Locals merge based on specific pattern

jollyranger · February 23, 2023, 2:53pm

Let's assume we have two locals:

    "azure_projects": {
      "value": [
        {
          "original": "MY-SUB-EX-P-SEQ00480-PRD-PUB-LCFS-DEF",
          "project_name": "AS-EX-P-SEQ00480-PRD-PUB-LCFS-DEF"
        },
        {
          "original": "MY-SUB-EX-P-SEQ00482-PRD-PUB-PAMP-TA1",
          "project_name": "AS-EX-P-SEQ00482-PRD-PUB-PAMP-TA1"
        },
        {
          "original": "MY-SUB-EX-P-SEQ00484-SEE-DP-Fundamentals",
          "project_name": "AS-EX-P-SEQ00484-SEE-DP-Fundamentals"
        }
      ]
    }

    "azure_contributors": {
      "value": [
        "MY-GRP-EX-P-SEQ00480-Contributor",
        "MY-GRP-EX-P-SEQ00481-Contributor",
        "MY-GRP-EX-P-SEQ00482-Contributor",
        "MY-GRP-EX-P-SEQ00483-Contributor",
        "MY-GRP-EX-P-SEQ00484-Contributor"
      ]
    }

My goal is to pair up these tuples wherever the sequence number (i.e. SEQ\\d{5,}) extracted from an azure_contributors entry matches the sequence number in the project_name of an azure_projects entry. The conditional logic in Terraform looks a bit anaemic, since if clauses only evaluate boolean expressions. The challenge is that, to my knowledge, once you use if in a for expression the result cannot be a tuple but is instead a map, which makes further for expressions impossible. Furthermore, neither substr nor split will work, because the strings outside of this example vary in length and in the number of delimiters (-).

The desired result should look something like this:

    "new_local": {
      "value": [
        {
          "contributor": "MY-GRP-EX-P-SEQ00480-Contributor",
          "project_name": "AS-EX-P-SEQ00480-PRD-PUB-LCFS-DEF"
        },
        {
          "contributor": "MY-GRP-EX-P-SEQ00482-Contributor",
          "project_name": "AS-EX-P-SEQ00482-PRD-PUB-PAMP-TA1"
        },
        {
          "contributor": "MY-GRP-EX-P-SEQ00484-Contributor",
          "project_name": "AS-EX-P-SEQ00484-SEE-DP-Fundamentals"
        }
      ]
    }

I’ve tried a few things already. Maybe something like merge can help; I'm going to test that out and report back.

jollyranger · February 23, 2023, 5:32pm

Getting closer

  azure_projects_extended = merge([
    for ap in local.azure_projects : {
      for c in local.azure_contributors : c => {
        project_tocreate = ap.project_name
      }
      # Hard-coded test: keep the pair only if both names contain SEQ00484.
      if length(regexall(".*SEQ00484.*", ap.project_name)) > 0 && length(regexall(".*SEQ00484.*", c)) > 0
    }
  ]...)

produces:

   "azure_projects_extended": {
      "value": {
        "MY-GRP-EX-P-SEQ00484-Contributor": {
          "project_tocreate": "AS-EX-P-SEQ00484-SEE-DP-Fundamentals"
        }
      },

It was helpful to learn that I could do a bit more with an if clause inside a for expression than I previously thought. The last thing is extracting the sequence number instead of using the hard-coded test of .*SEQ00484.*; hopefully I can do that ‘in-line’…

jollyranger · February 23, 2023, 5:54pm

OK, nice, it looks like the following works:

locals {
  azure_projects_extended = merge([
    for ap in local.azure_projects : {
      for c in local.azure_contributors : c => {
        project_tocreate = ap.project_name
      }
      # Keep only the pairs whose extracted sequence numbers match.
      if try(regex("(SEQ\\d{2,})", c), [c])[0] == try(regex("(SEQ\\d{2,})", ap.project_name), [ap])[0]
    }
  ]...)
}

apparentlymart · February 24, 2023, 10:53pm

Hi @jollyranger! It sounds like you already found a working solution so my reply here is just to share another possible way to do it. I don’t think either of these is necessarily better than the other but this way is just the way I would approach it.

I tend to like to approach problems like this by breaking them down into smaller steps that are easier to express as a hopefully-intelligible smaller expression. In this case I see three potential smaller steps:

1. Project the “Azure projects” so that they are in a map data structure with the sequence numbers as keys.
2. Project the “Azure contributors” so that they are in a map data structure with the sequence numbers as keys.
3. Zip the two maps together by those common keys to produce a single data structure.

I’m assuming from your example that for the “Azure projects” either the “original” and “project name” will always have the same sequence number or the “project name”'s sequence number is the important one; I’m going to implement with that assumption in mind but hopefully you can see how to adapt this if that isn’t a correct assumption.

locals {
  projects_by_seq = tomap({
    for proj in local.azure_projects :
    regex("SEQ\\d{5,}", proj.project_name) => proj
  })
  contributors_by_seq = tomap({
    for name in local.azure_contributors :
    regex("SEQ\\d{5,}", name) => {
      name = name
    }
  })
}

The above completes the first two steps, giving data structures like this:

projects_by_seq = tomap({
  "SEQ00480" = {
    original     = "MY-SUB-EX-P-SEQ00480-PRD-PUB-LCFS-DEF"
    project_name = "AS-EX-P-SEQ00480-PRD-PUB-LCFS-DEF"
  }
  "SEQ00482" = {
    original     = "MY-SUB-EX-P-SEQ00482-PRD-PUB-PAMP-TA1"
    project_name = "AS-EX-P-SEQ00482-PRD-PUB-PAMP-TA1"
  }
  "SEQ00484" = {
    original     = "MY-SUB-EX-P-SEQ00484-SEE-DP-Fundamentals"
    project_name = "AS-EX-P-SEQ00484-SEE-DP-Fundamentals"
  }
})
contributors_by_seq = tomap({
  "SEQ00480" = {
    name = "MY-GRP-EX-P-SEQ00480-Contributor"
  }
  "SEQ00481" = {
    name = "MY-GRP-EX-P-SEQ00481-Contributor"
  }
  "SEQ00482" = {
    name = "MY-GRP-EX-P-SEQ00482-Contributor"
  }
  "SEQ00483" = {
    name = "MY-GRP-EX-P-SEQ00483-Contributor"
  }
  "SEQ00484" = {
    name = "MY-GRP-EX-P-SEQ00484-Contributor"
  }
})
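
One caveat with the two projections above: regex raises an error when a string contains no match, and a map for expression rejects duplicate keys. Below is a minimal sketch of a more defensive variant, assuming some names might lack a sequence number and that several projects might share one. Neither situation appears in the thread's example data, and the local names ending in _safe and _grouped are hypothetical.

```hcl
locals {
  # Hypothetical variant: skip contributor names that contain no sequence
  # number instead of failing on them.
  contributors_by_seq_safe = tomap({
    for name in local.azure_contributors :
    regex("SEQ\\d{5,}", name) => {
      name = name
    }
    if can(regex("SEQ\\d{5,}", name))
  })

  # Hypothetical variant: grouping mode (the "..." after the value) collects
  # projects that share a sequence number into a list per key, rather than
  # raising a duplicate-key error.
  projects_by_seq_grouped = {
    for proj in local.azure_projects :
    regex("SEQ\\d{5,}", proj.project_name) => proj...
    if can(regex("SEQ\\d{5,}", proj.project_name))
  }
}
```

With grouping mode each map value becomes a list of projects, so a later zip step would need an inner for expression over that list.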

Another assumption I’ve made from your example is that we should ignore any contributors whose sequence key does not appear in any project, and that if a project has no contributor then we should set the contributor name to null. With those assumptions in mind, here’s step 3:

locals {
  project_contributors = toset([
    for k, proj in local.projects_by_seq : {
      project_name = proj.project_name
      contributor  = try(local.contributors_by_seq[k].name, null)
    }
  ])
}

I expect that this would produce a data structure like the one you showed in your example:

project_contributors = toset([
  {
    contributor  = "MY-GRP-EX-P-SEQ00480-Contributor"
    project_name = "AS-EX-P-SEQ00480-PRD-PUB-LCFS-DEF"
  },
  {
    contributor  = "MY-GRP-EX-P-SEQ00482-Contributor"
    project_name = "AS-EX-P-SEQ00482-PRD-PUB-PAMP-TA1"
  },
  {
    contributor  = "MY-GRP-EX-P-SEQ00484-Contributor"
    project_name = "AS-EX-P-SEQ00484-SEE-DP-Fundamentals"
  },
])
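
If the opposite assumption holds and projects without a matching contributor should be dropped rather than given a null contributor, the same expression could filter on key membership. This is only a sketch of that variation, and the name project_contributors_matched is hypothetical:

```hcl
locals {
  # Hypothetical variant: include a project only when a contributor with the
  # same sequence key exists, so no null fallback is needed.
  project_contributors_matched = toset([
    for k, proj in local.projects_by_seq : {
      project_name = proj.project_name
      contributor  = local.contributors_by_seq[k].name
    }
    if contains(keys(local.contributors_by_seq), k)
  ])
}
```

With the example data in this thread every project has a matching contributor, so both versions should contain the same three objects.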

A lot of solutions in Terraform come down to choosing the most appropriate data structure for the work you want to do, projecting the data into that structure, and then using the intermediate data structure to get the final result. I chose to use maps for the intermediate data structures here because your requirement was to group things together by strings and that seems like a “map-type problem”.

Continuing the theme of selecting the most appropriate data type, I also made the final data structure be a set of objects rather than a list as you illustrated, because this process of first grouping by sequence key and then zipping together has effectively lost the original order of projects, and a set data type communicates that these items are not in any particular order, whereas a list implies that the order is meaningful in some way. (If you did use a list here then they’d be ordered by the map keys, meaning that they’d be ordered by the sequence key. If that’s a suitable order then you could use tolist instead of toset to get that result.)
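
As a sketch of the tolist alternative mentioned above, assuming ordering by sequence key is acceptable (the name project_contributors_list is hypothetical):

```hcl
locals {
  # Hypothetical variant: a for expression over a map visits keys in
  # lexical order, so the list is ordered by sequence key.
  project_contributors_list = tolist([
    for k, proj in local.projects_by_seq : {
      project_name = proj.project_name
      contributor  = try(local.contributors_by_seq[k].name, null)
    }
  ])
}
```

A list also allows index-based access such as local.project_contributors_list[0], which a set does not.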

jollyranger · February 25, 2023, 3:31am

Thanks! I think your approach is very sensible. One thing I anticipated is that the final expression in my example would be very expensive. With the number of objects I have, that expression alone takes 200 seconds out of a total run of 300 seconds.

I appreciate the time you took to respond. If the common keys are first zipped into a single data structure, then index lookups will be possible (and should be much, much faster too).

I’ll give this a go and share my findings. This has been fun so far figuring this out and your original post also nudged me in the right direction.

jollyranger · March 1, 2023, 7:44am

Just to say that with your help I reduced the run time from 300 seconds to about 60. So far so good, and it is processing many objects. I will share once it's all done. Thank you!
