Terraform wants to re-read all instances of an iterated data resource, even if only one key gets added

camilo-s · April 15, 2024, 3:55pm

I understand that Terraform needs to re-read data sources whenever they depend on a resource or a module with changes pending.

But is there a way to let Terraform know whether there are changes pending at a granular key-value level when data sources are iterated with a for_each argument?

Here’s a rough description of my use case:

1. I need to read a few Microsoft Entra ID groups in my configuration to use them in a few permission assignments.
2. For this, I use a module that reads the groups with an azuread_group data resource, which I iterate over a map of the following form, where each key is the group’s name and the value is an object describing the permissions that should be assigned to it:
   ```hcl
   groups = {
     group_name_1 = {
       foo = "bar"
       # ...
     }
     # ...
   }
   ```
3. The module’s output is essentially the groups map enriched with each group’s read properties (e.g. its Azure ID) and is used downstream for the actual permission assignments.
4. To assign permissions in one downstream module, I need to read the groups again with a different provider (databricks_group, FWIW), referencing a few of the enriched attributes. This is iterated with a for_each in the same way as in 2. above.
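A minimal sketch of the setup in steps 2 to 4 might look like the following. The module path `./modules/entra_groups`, the output name `groups`, the `object_id` enrichment, and the use of the map key as `display_name` are assumptions for illustration; the actual configuration isn’t shown in the thread.

```hcl
# --- modules/entra_groups/main.tf (hypothetical module layout) ---
variable "groups" {
  # Key: Entra ID group display name; value: permission settings for that group.
  type = map(any)
}

# Step 2: one azuread_group read per key.
data "azuread_group" "this" {
  for_each     = var.groups
  display_name = each.key
}

# Step 3: the input map, enriched with attributes read from Entra ID.
output "groups" {
  value = {
    for name, cfg in var.groups :
    name => merge(cfg, { object_id = data.azuread_group.this[name].object_id })
  }
}

# --- Root module (assumes a root-level var.groups with the same shape) ---
module "entra_groups" {
  source = "./modules/entra_groups"
  groups = var.groups
}

# Step 4: read the same groups again with the Databricks provider,
# iterating over the module output.
data "databricks_group" "this" {
  for_each     = module.entra_groups.groups
  display_name = each.key
}
```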

My problem: whenever I add a key-value pair to my groups map, Terraform wants to re-read all iterated data sources in 4., not just the instances for the newly added keys. As a result, a few resources that depend on attributes read from the groups have to be redeployed, even when they are completely independent of the new key-value pair.

In a sense, the for_each meta-argument couples all iterated instances, although they should be independent of each other.

My questions:

1. Is this perhaps because the first data read in 2. produces the module’s output (essentially the same input, enriched with the read group attributes) as one monolithic value? In that case Terraform wouldn’t know about the fine-grained, per-key structure of this output when it’s iterated over in 3., and would therefore treat any change to the module output as a potentially complete change rather than just the addition of a key-value pair.
2. If so, could Terraform at some point propagate the map structure of the output and determine which key-value pairs will be modified, so that the unaffected ones don’t trigger modifications downstream?
3. Or is this perhaps an issue specific to the Databricks provider? I ask because only those data resources are re-read, whereas other references to the module’s output in 2. that use the azurerm provider never cause a full re-read of all iterated data sources.

jbardin · April 15, 2024, 5:08pm

Hi @camilo-s,

When planning, Terraform sets configuration dependencies at the resource-level rather than per instance, so using something like depends_on to force a data source to be deferred until apply can only operate at the level of the resource as a whole.
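As a minimal illustration of that resource-level dependency (the resource names and the group name are hypothetical), a `depends_on` like the one below applies to the data resource as a whole rather than to individual keys:

```hcl
variable "groups" {
  type = map(any)
}

resource "databricks_group" "managed" {
  display_name = "example-group" # hypothetical
}

data "databricks_group" "this" {
  for_each     = var.groups
  display_name = each.key

  # depends_on is recorded for the data resource as a whole: while
  # databricks_group.managed has pending changes, every instance of this
  # data resource is read during apply instead of plan, not just new keys.
  depends_on = [databricks_group.managed]
}
```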

Another reason for a data source to be deferred is if a portion of its configuration is unknown. It may be possible to restructure the configuration so that only the data for individual instances is unknown, without the use of depends_on. I’d have to see the configuration to know if that’s possible, but the built-in terraform_data resource has the triggers_replace, input and output attributes to assist with difficult cases like this.
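One possible shape for such a restructuring is sketched below, assuming the goal is to defer only the instances for newly added keys; the name `group_gate` is hypothetical. Instead of `depends_on`, the data source references a per-key `terraform_data` output. For keys already in state that output is known at plan time, while for a key whose `terraform_data` instance is being created in this plan it is unknown, so only that instance’s read should be deferred.

```hcl
variable "groups" {
  type = map(any)
}

# One gate instance per key. For keys already in state, output is known at
# plan time; for a newly added key, the instance is created in this plan and
# its output stays unknown until apply.
resource "terraform_data" "group_gate" {
  for_each = var.groups
  input    = each.key
}

data "databricks_group" "this" {
  for_each = var.groups

  # No depends_on: referencing the per-key output means only the instance
  # whose gate output is still unknown (a newly added key) is deferred.
  display_name = terraform_data.group_gate[each.key].output
}
```

To tie each gate to an upstream per-key value, `triggers_replace` could additionally be set on `terraform_data`, so that a change to that value replaces, and thus re-gates, only the corresponding key. Whether that fits depends on the real configuration.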

camilo-s · April 16, 2024, 7:56am

Hi @jbardin,

thanks for the thoughtful response.

I do have a depends_on in my configuration, which is likely the root of my issue, but unfortunately I can’t remove it at the moment (short of splitting the Terraform state file, which I’d like to avoid). So the workaround I currently have to use is adding ignore_changes to some downstream resources to avoid redeployment (and manually removing it when changes should go through).
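For reference, the `ignore_changes` workaround might look roughly like this on a downstream resource. `databricks_group_member`, `var.member_id`, and the ignored `group_id` attribute are hypothetical stand-ins for whichever resource actually gets redeployed:

```hcl
variable "groups" {
  type = map(any)
}

variable "member_id" {
  type = string # hypothetical: ID of a user or service principal
}

data "databricks_group" "this" {
  for_each     = var.groups
  display_name = each.key
}

resource "databricks_group_member" "this" {
  for_each  = var.groups
  group_id  = data.databricks_group.this[each.key].id
  member_id = var.member_id

  lifecycle {
    # Keep the group_id from state even if the re-read group ID is unknown
    # at plan time. This entry has to be removed manually whenever a real
    # change to group_id must be applied.
    ignore_changes = [group_id]
  }
}
```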

I have a particular case where a certain resource (a databricks_group, to be precise) in a module only needs to be created in one environment but read in the other environments. To have uniform references to this resource, I funnel them through a data resource, which requires the depends_on. I’d have to think harder, but it seems no amount of duck-typing would let me get around this without splitting the state file, because that would only move the explicit depends_on further upstream.
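A minimal sketch of that create-or-read pattern, with hypothetical names (`create_admin_group`, the group name `admins`), shows where the `depends_on` comes from:

```hcl
variable "create_admin_group" {
  type = bool # true only in the environment that owns the group (hypothetical)
}

# Created only in the owning environment.
resource "databricks_group" "admins" {
  count        = var.create_admin_group ? 1 : 0
  display_name = "admins"
}

# Every environment reads the group through the data resource so that
# downstream references are uniform. Where the group is created in this
# configuration, depends_on makes the read wait for the creation.
data "databricks_group" "admins" {
  display_name = "admins"
  depends_on   = [databricks_group.admins]
}

output "admin_group_id" {
  value = data.databricks_group.admins.id
}
```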

I’ll try to distill a minimal working example (MWE) from my configuration to figure out whether I’m overlooking something.

apparentlymart · April 16, 2024, 3:50pm

Hi @camilo-s,

What you’ve described seems similar to the situation described in the Module Composition guide’s section Conditional Creation of Objects.

The approach used in that example is to write a module that just always expects the needed object to exist and makes it the module caller’s responsibility to decide whether to provide that information from a managed resource (a resource block), a data resource (a data block), or some other strategy entirely.
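Applied to the Databricks group, that approach might look roughly like the following sketch. The module path `./modules/workspace_permissions`, the variable names, and the use of `databricks_group_member` inside the module are hypothetical, and each root configuration is assumed to declare its own `var.member_id`. The two root configurations are shown together only for comparison.

```hcl
# --- modules/workspace_permissions/main.tf (hypothetical module) ---
# The module always expects the group to exist; it only needs its ID.
variable "admin_group_id" {
  type = string
}

variable "member_id" {
  type = string
}

resource "databricks_group_member" "this" {
  group_id  = var.admin_group_id
  member_id = var.member_id
}

# --- Root configuration of the environment that owns the group ---
resource "databricks_group" "admins" {
  display_name = "admins"
}

module "permissions" {
  source         = "./modules/workspace_permissions"
  admin_group_id = databricks_group.admins.id
  member_id      = var.member_id
}

# --- Root configuration of every other environment ---
data "databricks_group" "admins" {
  display_name = "admins"
}

module "permissions" {
  source         = "./modules/workspace_permissions"
  admin_group_id = data.databricks_group.admins.id
  member_id      = var.member_id
}
```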

That approach then avoids the troublesome pattern of having a configuration treat the same object both as a managed object and an external dependency at the same time, and thus avoids making Terraform re-read data that it already has in memory elsewhere anyway, and makes the dependency graph more precise.
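If both cases have to live in a single configuration, one variation (an assumption layered on top of the guide, not something it prescribes) is to give the managed and data resources mutually exclusive `count` values and use whichever one exists. Because the data resource is only present where the group is not managed by this configuration, it needs no `depends_on`. The `one()` function requires Terraform v0.15 or later; the variable and group names are hypothetical.

```hcl
variable "create_admin_group" {
  type = bool
}

# Managed only in the owning environment.
resource "databricks_group" "admins" {
  count        = var.create_admin_group ? 1 : 0
  display_name = "admins"
}

# Read only where this configuration does not manage the group,
# so there is nothing to wait for and no depends_on is needed.
data "databricks_group" "admins" {
  count        = var.create_admin_group ? 0 : 1
  display_name = "admins"
}

locals {
  # Exactly one of the two lists has an element; one() returns it.
  admin_group_id = one(concat(
    databricks_group.admins[*].id,
    data.databricks_group.admins[*].id
  ))
}
```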