Regex - need some help

Hi,
Long story short, I have output from terraform console like this:

> toset([for key, value in var.xxx : join(", ", value.users)])
toset([
  "aaaa, bbbb, cccc",
  "dddd, eeee, ffff",
  "gggg, hhhh",
])

and I want to use a regex to convert it to this:

toset([
  "aaaa",
  "bbbb",
  "cccc",
  "dddd",
  "eeee",
  "ffff",
  "gggg",
  "hhhh",
])

Can someone help me create a regex rule to achieve the desired state? Here is what I have tried so far:

> toset([ for key, value in var.xxx : regex("[a-z]+", format("%s", join(", ", value.users))) ])

Thanks in advance.

Hi @wu_j,

With the requirement you stated, my first instinct would be to use split instead of regex, like this:

toset(flatten([for s in var.xxx : split(", ", s)]))

However, if your goal is just to extract any contiguous sequence of lowercase ASCII letters, regardless of which exact separator characters sit between them, then a regular expression is probably the best way to achieve that. For situations like this, where you want potentially many matches per string, regexall is a better fit than regex, because regex returns only the first match while regexall returns a list of all of them:

toset(flatten([for s in var.xxx : regexall("[a-z]+", s)]))

For the example input you shared, both of these approaches should be equivalent. The regexall approach will differ only if your input includes separators other than ", ", or if the names themselves contain characters outside a-z, since the pattern matches the lowercase letter sequences themselves rather than the separators between them.
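To make that difference concrete, here is a minimal sketch using a made-up input string with inconsistent separators. The local names and the sample string are hypothetical, not taken from the original configuration:

```hcl
locals {
  # Hypothetical input: separators are ",", ";" and spaces, never ", "
  mixed = "aaaa,bbbb;  cccc"

  # split only cuts on the exact separator ", ". It does not occur here,
  # so the result is a single-element list holding the whole string.
  by_split = split(", ", local.mixed)

  # regexall returns every match of the pattern:
  # ["aaaa", "bbbb", "cccc"]
  by_regexall = regexall("[a-z]+", local.mixed)

  # regex returns only the first match: "aaaa"
  first_only = regex("[a-z]+", local.mixed)
}

output "separator_comparison" {
  value = {
    by_split    = local.by_split
    by_regexall = local.by_regexall
    first_only  = local.first_only
  }
}
```

The first_only value also shows why the original attempt with regex produced just one name per element.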

In both of these cases I’ve used flatten because otherwise the for expression would cause the result to be a list of lists rather than just a single list. The flatten function concisely expands any nested lists into the top-level list, so that the result is a list of all of the non-list values (strings, in this case) that were reachable by walking through the nested lists.
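As a rough illustration of what flatten contributes, the sketch below builds the intermediate list of lists and then flattens it. The local.joined value is a hypothetical stand-in for the joined strings from the question:

```hcl
locals {
  # Hypothetical stand-in for the joined strings shown in the question
  joined = [
    "aaaa, bbbb, cccc",
    "dddd, eeee, ffff",
    "gggg, hhhh",
  ]

  # Without flatten: a list of lists,
  # [["aaaa", "bbbb", "cccc"], ["dddd", "eeee", "ffff"], ["gggg", "hhhh"]]
  nested = [for s in local.joined : split(", ", s)]

  # With flatten: one flat list of eight strings
  flat = flatten(local.nested)

  # toset then removes duplicates and discards ordering
  unique = toset(local.flat)
}

output "flatten_steps" {
  value = {
    nested = local.nested
    flat   = local.flat
    unique = local.unique
  }
}
```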

Hi @apparentlymart

You’re absolutely right, it works perfectly, exactly as you wrote.

> toset(flatten([for key, value in var.xxx : regexall("[a-z]+", format("%s", join(", ", value.users)))]))
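For anyone who wants to try this outside the console, here is a self-contained sketch. The shape of var.xxx is an assumption (a map of objects, each with a users list), inferred from the value.users reference above; the key names and default values are invented for illustration:

```hcl
# Assumed shape of var.xxx; the real variable may differ.
variable "xxx" {
  type = map(object({
    users = list(string)
  }))
  default = {
    team_a = { users = ["aaaa", "bbbb", "cccc"] }
    team_b = { users = ["dddd", "eeee", "ffff"] }
    team_c = { users = ["gggg", "hhhh"] }
  }
}

# Same idea as the expression above. format("%s", ...) is left out here,
# since formatting a string with "%s" returns the same string.
output "users_via_regexall" {
  value = toset(flatten([
    for key, value in var.xxx : regexall("[a-z]+", join(", ", value.users))
  ]))
}

# If the names are already separate list elements, as assumed here,
# joining and re-splitting may be unnecessary: flatten the lists directly.
output "users_direct" {
  value = toset(flatten([for value in var.xxx : value.users]))
}
```

With these assumed defaults, both outputs should contain the same eight names. They would differ if a name contained anything other than lowercase letters, because the regex variant would split or trim such names.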

Many thanks for your help, much appreciated.
Cheers!