Dependency model, `depends_on` vs data reference

Hello,

I hope this is the right channel. I'm looking into how "dependencies" behave in Terraform and have a few questions.

Consider two resources A and B:

resource "docker_image" "B" {
  name = "postgres:15-alpine"
}
resource "docker_container" "A" {
  name       = "db"
  image      = docker_image.db.name
  depends_on = [docker_image.B]
}

As far as I can tell, Terraform doesn't differentiate between the two ways of expressing the A -> B dependency:

  • as a data reference to an attribute’s value with docker_image.db.name
  • as a depends_on

That is, if we used only one of the two methods, the order of creation/update/deletion would be the same.

My first question: is that correct?

My second question: what was the reason for treating references and depends_on identically?

The reason for asking is that Terraform could perhaps achieve more parallelism if, at the planning phase, references to known values didn't introduce lifecycle dependencies. Here, the image and container would be deployed in parallel if we only had docker_image.db.name.
Moreover, references to unknown values would still play the intuitive "create that before this" role. Here, that would mean using the read-only image_id attribute instead of name.

Third question, answered after actually testing:

Another question on a variation of the program: remove the depends_on and apply, then remove everything and apply again. How does Terraform know it must destroy A before B? Does it? It seems that data references are not kept in the state.

Answer: they are kept in the state, under dependencies.

Assuming you meant docker_image.db to be docker_image.B in your example, yes, those set up the same dependencies in the configuration. The depends_on meta-argument is there to allow you to add references to ensure a dependency exists when there is no natural attribute which would otherwise create that reference.
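As a rough sketch of that last case: below, the app container uses no attribute of the db container, so only depends_on creates the ordering. This assumes the kreuzwerker/docker provider; the "app" resources and the example/app:latest image name are hypothetical and not from your configuration.

```hcl
resource "docker_image" "postgres" {
  name = "postgres:15-alpine"
}

resource "docker_container" "db" {
  name  = "db"
  image = docker_image.postgres.image_id # natural reference: image before container
}

# Hypothetical application image, for illustration only.
resource "docker_image" "app" {
  name = "example/app:latest"
}

resource "docker_container" "app" {
  name  = "app"
  image = docker_image.app.image_id

  # No argument here reads anything from docker_container.db,
  # so the ordering has to be stated explicitly.
  depends_on = [docker_container.db]
}
```

Removing the depends_on line would leave the two containers unordered relative to each other, while each would still wait for its own image.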

There is no distinction between them for managed resources, because that is the design of the language. Terraform must determine the dependency order for evaluation before it begins evaluation, so all references are technically “unknown” at that point. It would also be far more difficult for users to anticipate what will happen, and to debug what may have gone wrong if references worked differently at different times in different contexts.

In practice there’s no real reason to reference a statically known attribute of another resource, except to establish this exact dependency order. If that ordering is not desired, put the name string literal in a local value and reference it separately from both resources to avoid the dependency.
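A minimal sketch of that local-value approach, reusing the resource names from the question (the local name image_name is an assumption for illustration):

```hcl
locals {
  # Single source of truth for the image name.
  image_name = "postgres:15-alpine"
}

resource "docker_image" "B" {
  name = local.image_name
}

resource "docker_container" "A" {
  name  = "db"
  image = local.image_name # refers to the local, not to docker_image.B
}
```

Both resources now depend only on the local value, so there is no edge between A and B and Terraform is free to handle them concurrently. Whether the container can actually start without the image having been pulled first is up to the provider and the Docker daemon, which is the kind of real-world coupling Terraform cannot infer.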

If the ordering is required sometimes, e.g. on the first apply with a previously unknown value, then that ordering must be maintained perpetually. Terraform cannot make any assumptions about the semantics and interdependencies of the real remote resources.

Data resources are different, however, because they have a different lifecycle which doesn't coordinate with managed resources. A data source is intended to be read during plan, so even with the correct dependency ordering, upstream managed resources may not be ready yet. For a data source, depends_on therefore indicates that any pending change in something it refers to means reading the data source must be deferred until after those changes have been applied. Data resource dependencies are a little cumbersome due to some historical context and accidents which need to be carried forward for compatibility.
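To illustrate the data source case, here is a small sketch. It assumes the kreuzwerker/docker provider's docker_network resource and data source; the network name private_net and the label lookup are made up for the example.

```hcl
resource "docker_network" "private" {
  name = "private_net"
}

data "docker_network" "lookup" {
  # The name is a plain literal, so nothing here refers to the managed resource.
  name = "private_net"

  # With this in place, a pending change to docker_network.private
  # defers the read until apply, after that change has been made.
  depends_on = [docker_network.private]
}
```

Without depends_on, Terraform would try to read the network during plan, which on a first run would happen before the network exists. With it, the data source's attributes show up as unknown in the plan whenever the network has pending changes.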
