Locals merge based on specific pattern

Let's assume we have two locals:

    "azure_projects": {
      "value": [
        {
          "original": "MY-SUB-EX-P-SEQ00480-PRD-PUB-LCFS-DEF",
          "project_name": "AS-EX-P-SEQ00480-PRD-PUB-LCFS-DEF"
        },
        {
          "original": "MY-SUB-EX-P-SEQ00482-PRD-PUB-PAMP-TA1",
          "project_name": "AS-EX-P-SEQ00482-PRD-PUB-PAMP-TA1"
        },
        {
          "original": "MY-SUB-EX-P-SEQ00484-SEE-DP-Fundamentals",
          "project_name": "AS-EX-P-SEQ00484-SEE-DP-Fundamentals"
        }
      ]
    }

    "azure_contributors": {
      "value": [
        "MY-GRP-EX-P-SEQ00480-Contributor",
        "MY-GRP-EX-P-SEQ00481-Contributor",
        "MY-GRP-EX-P-SEQ00482-Contributor",
        "MY-GRP-EX-P-SEQ00483-Contributor",
        "MY-GRP-EX-P-SEQ00484-Contributor"
      ]
    }

My goal is to marry up these tuples where the sequence number extracted (i.e. (SEQ\\d{5,})) from azure_contributors matches the sequence number of the project_name in azure_projects. The conditional logic in Terraform looks a bit anaemic: if-type clauses only return booleans, and the challenge is that once you use if in a for expression, the result cannot be a tuple but is instead a map, which to my knowledge makes further for expressions impossible. Further, neither substr nor split will work, as the strings outside of this example vary in length and delimiter (-) count.

The desired result would look something like below:

    "new_local": {
      "value": [
        {
          "contributor": "MY-GRP-EX-P-SEQ00480-Contributor",
          "project_name": "AS-EX-P-SEQ00480-PRD-PUB-LCFS-DEF"
        },
        {
          "contributor": "MY-GRP-EX-P-SEQ00482-Contributor",
          "project_name": "AS-EX-P-SEQ00482-PRD-PUB-PAMP-TA1"
        },
        {
          "contributor": "MY-GRP-EX-P-SEQ00484-Contributor",
          "project_name": "AS-EX-P-SEQ00484-SEE-DP-Fundamentals"
        }
      ]
    }

I've tried a bit already; maybe something like merge can help. I'm going to test that out and report back.

Getting closer

  azure_projects_extended = merge([
    for ap in local.azure_projects : {
      for c in local.azure_contributors : c => {
        project_tocreate = ap.project_name
      }
      # Hard-coded test: keep only pairs where both names contain SEQ00484.
      if length(regexall(".*SEQ00484.*", ap.project_name)) > 0 && length(regexall(".*SEQ00484.*", c)) > 0
    }
  ]...)

produces:

   "azure_projects_extended": {
      "value": {
        "MY-GRP-EX-P-SEQ00484-Contributor": {
          "project_tocreate": "AS-EX-P-SEQ00484-SEE-DP-Fundamentals"
        }
      },

It was helpful to know that I could do a bit more with an if statement on the outer block than previously thought. The last thing is getting the sequence extracted instead of the hard-coded test of .*SEQ00484.*; hopefully I can do that 'in-line'...

OK, nice, looks like the below works:

locals {
  azure_projects_extended = merge([
    for ap in local.azure_projects : {
      for c in local.azure_contributors : c => {
        project_tocreate = ap.project_name
      }
      # Pair a contributor with a project only when their extracted sequence numbers match.
      if try(regex("(SEQ\\d{2,})", c), [c])[0] == try(regex("(SEQ\\d{2,})", ap.project_name), [ap])[0]
    }
  ]...)
}
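
For reference, an if clause can also be used in a for expression that produces a tuple, so the desired list of objects can be built directly. The following is only a sketch, assuming local.azure_projects and local.azure_contributors hold the values shown in the question; the name new_local is borrowed from the desired result, and the try(..., false) wrapper is an assumption made here so that names without a sequence number never match.

```hcl
locals {
  new_local = flatten([
    for ap in local.azure_projects : [
      for c in local.azure_contributors : {
        contributor  = c
        project_name = ap.project_name
      }
      # try() yields false if either name has no sequence number,
      # because regex() raises an error when nothing matches.
      if try(regex("SEQ\\d{5,}", c) == regex("SEQ\\d{5,}", ap.project_name), false)
    ]
  ])
}
```

This still compares every project with every contributor, so it is likely to get slow as the inputs grow.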

Hi @jollyranger! It sounds like you already found a working solution, so my reply here is just to share another possible way to do it. I don't think either of these is necessarily better than the other; this is just the way I would approach it.

I tend to like to approach problems like this by breaking them down into smaller steps that are easier to express as a hopefully-intelligible smaller expression. In this case I see three potential smaller steps:

  1. Project the "Azure projects" so that they are in a map data structure with the sequence numbers as keys.
  2. Project the "Azure contributors" so that they are in a map data structure with the sequence numbers as keys.
  3. Zip the two maps together by those common keys to produce a single data structure.

I'm assuming from your example that for the "Azure projects" either the "original" and "project name" will always have the same sequence number, or the "project name"'s sequence number is the important one. I'm going to implement with that assumption in mind, but hopefully you can see how to adapt this if that isn't a correct assumption.

locals {
  projects_by_seq = tomap({
    for proj in local.azure_projects :
    regex("SEQ\\d{5,}", proj.project_name) => proj
  })
  contributors_by_seq = tomap({
    for name in local.azure_contributors :
    regex("SEQ\\d{5,}", name) => {
      name = name
    }
  })
}

The above completes the first two steps, giving data structures like this:

projects_by_seq = tomap({
  "SEQ00480" = {
    original     = "MY-SUB-EX-P-SEQ00480-PRD-PUB-LCFS-DEF"
    project_name = "AS-EX-P-SEQ00480-PRD-PUB-LCFS-DEF"
  }
  "SEQ00482" = {
    original     = "MY-SUB-EX-P-SEQ00482-PRD-PUB-PAMP-TA1"
    project_name = "AS-EX-P-SEQ00482-PRD-PUB-PAMP-TA1"
  }
  "SEQ00484" = {
    original     = "MY-SUB-EX-P-SEQ00484-SEE-DP-Fundamentals"
    project_name = "AS-EX-P-SEQ00484-SEE-DP-Fundamentals"
  }
})
contributors_by_seq = tomap({
  "SEQ00480" = {
    name = "MY-GRP-EX-P-SEQ00480-Contributor"
  }
  "SEQ00481" = {
    name = "MY-GRP-EX-P-SEQ00481-Contributor"
  }
  "SEQ00482" = {
    name = "MY-GRP-EX-P-SEQ00482-Contributor"
  }
  "SEQ00483" = {
    name = "MY-GRP-EX-P-SEQ00483-Contributor"
  }
  "SEQ00484" = {
    name = "MY-GRP-EX-P-SEQ00484-Contributor"
  }
})
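
One caveat with the two projections above: a for expression that builds a map reports an error if two elements produce the same key, and regex raises an error when a name contains no match. If several contributors could share one sequence number, grouping mode (the ... after the value expression) collects them into lists instead. This is a sketch only; contributors_grouped_by_seq is a hypothetical name, and the shared-key scenario is an assumption rather than something shown in the question.

```hcl
locals {
  # Grouping mode: each sequence key maps to a list of every
  # contributor name that contains that sequence number.
  contributors_grouped_by_seq = {
    for name in local.azure_contributors :
    regex("SEQ\\d{5,}", name) => name...
  }
}
```

With the example data each list would hold a single name, so the non-grouped form is enough there.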

Another assumption I've made from your example is that we should ignore any contributors whose sequence key does not appear in any project, and that if a project has no contributor then we should set the contributor name to null. With those assumptions in mind, here's step 3:

locals {
  project_contributors = toset([
    for k, proj in local.projects_by_seq : {
      project_name = proj.project_name
      contributor  = try(local.contributors_by_seq[k].name, null)
    }
  ])
}

I expect that this would produce a data structure like the one you showed in your example:

project_contributors = toset([
  {
    contributor  = "MY-GRP-EX-P-SEQ00480-Contributor"
    project_name = "AS-EX-P-SEQ00480-PRD-PUB-LCFS-DEF"
  },
  {
    contributor  = "MY-GRP-EX-P-SEQ00482-Contributor"
    project_name = "AS-EX-P-SEQ00482-PRD-PUB-PAMP-TA1"
  },
  {
    contributor  = "MY-GRP-EX-P-SEQ00484-Contributor"
    project_name = "AS-EX-P-SEQ00484-SEE-DP-Fundamentals"
  },
])
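
If projects without a contributor should be left out instead of getting a null contributor, an if clause on the same for expression can filter them. This is a sketch under that different assumption; project_contributors_matched is a hypothetical name.

```hcl
locals {
  project_contributors_matched = toset([
    for k, proj in local.projects_by_seq : {
      project_name = proj.project_name
      contributor  = local.contributors_by_seq[k].name
    }
    # Keep only projects whose sequence key also has a contributor.
    if contains(keys(local.contributors_by_seq), k)
  ])
}
```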

A lot of solutions in Terraform come down to choosing the most appropriate data structure for the work you want to do, projecting the data into that structure, and then using the intermediate data structure to get the final result. I chose to use maps for the intermediate data structures here because your requirement was to group things together by strings and that seems like a "map-type problem".

Continuing the theme of selecting the most appropriate data type, I also made the final data structure be a set of objects rather than a list as you illustrated, because this process of first grouping by sequence key and then zipping together has effectively lost the original order of projects, and a set data type communicates that these items are not in any particular order, whereas a list implies that the order is meaningful in some way. (If you did use a list here then they'd be ordered by the map keys, meaning that they'd be ordered by the sequence key. If that's a suitable order then you could use tolist instead of toset to get that result.)
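
As a sketch of that list variant, assuming ordering by sequence key is acceptable (project_contributors_list is a hypothetical name):

```hcl
locals {
  # Iterating over a map in a for expression visits the keys in
  # lexical order, so the list is ordered by sequence key.
  project_contributors_list = tolist([
    for k, proj in local.projects_by_seq : {
      project_name = proj.project_name
      contributor  = try(local.contributors_by_seq[k].name, null)
    }
  ])
}
```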

Thanks! I think your approach is very sensible. One thing I anticipated is that the final expression in my example would be very expensive. With the number of objects I have, it alone takes 200 seconds out of a total run of 300 seconds.

I appreciate the time you took to respond. If the common keys are first zipped into a single data structure, then index lookups will be possible (and should be much, much faster too).

I'll give this a go and share my findings. It has been fun figuring this out so far, and your original post also nudged me in the right direction.

Just to say that with your help I reduced the run time from 300 seconds to about 60. So far so good, and it is processing many objects. Will share once it's all done. Thank you!