Question about module and resource dependency

Hi everyone!

I'm seeing some weird behavior between a module and a resource. I don’t know if it’s a bug or expected, but consider the following code:

module "my_module" {
  source = "./my_module"

  path = local_file.foo.filename
}

resource "local_file" "foo" {
  filename = "foo.json"
  content  = jsonencode({ foo : "bar" })
}

And the module:

variable "path" {
  type = string
}

data "local_file" "this" {
  filename = var.path
}

output "test" {
  value = data.local_file.this.content
}

The code creates a file, passes its path to the module, and reads its content with a data source. You’d think a terraform apply would work, but I get the following error:

Terraform planned the following actions, but then encountered a problem:

  # local_file.foo will be created
  + resource "local_file" "foo" {
      + content              = jsonencode(
            {
              + foo = "bar"
            }
        )
      + content_base64sha256 = (known after apply)
      + content_base64sha512 = (known after apply)
      + content_md5          = (known after apply)
      + content_sha1         = (known after apply)
      + content_sha256       = (known after apply)
      + content_sha512       = (known after apply)
      + directory_permission = "0777"
      + file_permission      = "0777"
      + filename             = "foo.json"
      + id                   = (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.
╷
│ Error: Read local file data source error
│ 
│   with module.my_module.data.local_file.this,
│   on my_module/main.tf line 3, in data "local_file" "this":
│    3: data "local_file" "this" {
│ 
│ The file at given path cannot be read.
│ 
│ +Original Error: open foo.json: no such file or directory
╵

Basically, Terraform tries to read the file before the resource is created, even though the dependency is explicit in the path field.

If I add a depends_on on the module, no issue; I get the expected behavior:

Terraform will perform the following actions:

  # local_file.foo will be created
  + resource "local_file" "foo" {
      + content              = jsonencode(
            {
              + foo = "bar"
            }
        )
      + content_base64sha256 = (known after apply)
      + content_base64sha512 = (known after apply)
      + content_md5          = (known after apply)
      + content_sha1         = (known after apply)
      + content_sha256       = (known after apply)
      + content_sha512       = (known after apply)
      + directory_permission = "0777"
      + file_permission      = "0777"
      + filename             = "foo.json"
      + id                   = (known after apply)
    }

  # module.my_module.data.local_file.this will be read during apply
  # (depends on a resource or a module with changes pending)
 <= data "local_file" "this" {
      + content              = (known after apply)
      + content_base64       = (known after apply)
      + content_base64sha256 = (known after apply)
      + content_base64sha512 = (known after apply)
      + content_md5          = (known after apply)
      + content_sha1         = (known after apply)
      + content_sha256       = (known after apply)
      + content_sha512       = (known after apply)
      + filename             = "foo.json"
      + id                   = (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.
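
For reference, here is a sketch of the module call with the explicit dependency. The exact block was not shown above, so this assumes depends_on simply points at local_file.foo:

```hcl
module "my_module" {
  source = "./my_module"

  path = local_file.foo.filename

  # Assumed form of the workaround: defer everything in the module,
  # including the data source read, until local_file.foo has been applied.
  depends_on = [local_file.foo]
}
```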

I know the use case is kinda weird (I just simplified it to share something simple with you), but I feel Terraform should recognize the dependency without the explicit depends_on. What do you think?

Cheers,
Antoine Rouaze

PS: My Terraform and providers versions are:

Terraform v1.6.4
on linux_amd64
+ provider registry.terraform.io/hashicorp/local v2.4.0
+ provider registry.terraform.io/hashicorp/null v3.2.1
+ provider registry.terraform.io/hashicorp/random v3.5.1

Hi @Erouan50,

In your configuration, Terraform has no way to determine that the managed resource local_file.foo and data.local_file.this are related. There is no data flow between these resources, so there won’t be any dependencies unless you add them explicitly with depends_on. Data sources are intended to be read as early as possible so their data can be used during the plan, and the only way to defer that is to add depends_on or ensure part of the data source configuration is unknown.

This, however, can lead to other issues, because you should not have both a managed resource and a data source representing the same logical resource in your configuration. Rather than passing some metadata about the resource into the module to be looked up with a data source, you should pass the actual resource object itself (or specific attributes) into the module.
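
As a rough sketch of that suggestion applied to the simplified example: the module takes the value it needs directly instead of a path to look up. The variable name content is hypothetical and not part of the original module:

```hcl
# Root module: pass the attribute the module needs, not a path to read back.
resource "local_file" "foo" {
  filename = "foo.json"
  content  = jsonencode({ foo : "bar" })
}

module "my_module" {
  source = "./my_module"

  # Hypothetical input replacing "path".
  content = local_file.foo.content
}

# my_module/main.tf: no data source is needed anymore.
variable "content" {
  type = string
}

output "test" {
  value = var.content
}
```

With this shape there is only one representation of the file in the configuration, so no ordering between a read and a create has to be resolved.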

Hi @jbardin,

Thank you for your detailed answer!

When you say Terraform has no way to determine that local_file.foo and data.local_file.this are related, I would argue the opposite :slight_smile:. The dependency is explicit when the module receives the filename from the resource in the path field:

module "my_module" {
  source = "./my_module"

  path = local_file.foo.filename  # <- The module should depend on the local_file resource?
}

Note that without a module, Terraform will create the resource and then read the data source:

resource "local_file" "foo" {
  filename = "foo.json"
  content  = jsonencode({ foo : "bar" })
}

data "local_file" "this" {
  filename = local_file.foo.filename
}

output "test" {
  value = data.local_file.this.content
}

Here is the plan:

Terraform will perform the following actions:

  # data.local_file.this will be read during apply
  # (depends on a resource or a module with changes pending)
 <= data "local_file" "this" {
      + content              = (known after apply)
      + content_base64       = (known after apply)
      + content_base64sha256 = (known after apply)
      + content_base64sha512 = (known after apply)
      + content_md5          = (known after apply)
      + content_sha1         = (known after apply)
      + content_sha256       = (known after apply)
      + content_sha512       = (known after apply)
      + filename             = "foo.json"
      + id                   = (known after apply)
    }

  # local_file.foo will be created
  + resource "local_file" "foo" {
      + content              = jsonencode(
            {
              + foo = "bar"
            }
        )
      + content_base64sha256 = (known after apply)
      + content_base64sha512 = (known after apply)
      + content_md5          = (known after apply)
      + content_sha1         = (known after apply)
      + content_sha256       = (known after apply)
      + content_sha512       = (known after apply)
      + directory_permission = "0777"
      + file_permission      = "0777"
      + filename             = "foo.json"
      + id                   = (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.

And I didn’t have to add depends_on to the data source. What makes me think it’s a bug is that the behavior in the two examples (with and without a module) should be the same, shouldn’t it?

I know the use case in the example is kinda weird, so let me share a real one. It might make more sense :slight_smile:. I wrote a module to abstract the creation of a GitHub repository. The module accepts a map of teams where each key is a team name and each value is the associated permission. To set the permission on the repository, the resource that does so needs the team ID, and the module uses a data source to look it up. It is only when I test the module that I use a resource to create the team and then pass its name to the module. Something like this:

# example/basic/main.tf
module "basic_test" {
  source = "../.."

  repository_name = var.repository_name
  teams = {
    (github_team.this.name) = "admin"
  }

  depends_on = [github_team.this] # IMO, I shouldn't need to specify it as the dependency is explicit in team field.
}

resource "github_team" "this" {
  name        = var.team_name
  description = "${var.team_name} team"
}

I know I won’t have to do that with the new test framework anymore, but I haven’t migrated yet :slight_smile:.

Cheers,
Antoine Rouaze

Yes, I can see how that could be confusing. When we fixed a lot of the problems with data source handling in earlier versions of Terraform by including their read directly in the plan, we had to make an exception for data sources that directly reference managed resources, for backwards compatibility. This means that your single-module example falls under “Data Resource Dependencies”, which behaves as if you had added depends_on to the data resource block.

The general guidance however is that a single configuration should not have the same resource represented by both a managed resource and a data source, avoiding the inherent problem with order of operations.
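
To illustrate the exception described above: in the single-module example, the direct reference to the managed resource is treated roughly as if the dependency had been written out by hand. The depends_on line below is for illustration only and is not needed in practice:

```hcl
resource "local_file" "foo" {
  filename = "foo.json"
  content  = jsonencode({ foo : "bar" })
}

data "local_file" "this" {
  # Direct reference to a managed resource in the same module.
  filename = local_file.foo.filename

  # Implied by the reference above; shown only to make the behavior visible.
  depends_on = [local_file.foo]
}
```

When the filename travels through a module input variable instead, the data source no longer references the managed resource directly, so this implicit deferral does not apply.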

I thought I’d drop in another trick here too, just to show all possibilities. It’s a little contrived with this example, but so is the example :wink:

If you have some static data which you don’t want to act on during the plan, you can use terraform_data to effectively make that unknown until apply.

module "my_module" {
  source = "./my_module"
  path   = terraform_data.foo_json.output.path
}

resource "terraform_data" "foo_json" {
  input = {
    # Literal path; referencing local_file.foo here would create a cycle.
    path    = "foo.json"
    content = jsonencode({ foo : "bar" })
  }
}

resource "local_file" "foo" {
  filename = "foo.json"
  content  = terraform_data.foo_json.output.content
}

Now anytime there is a change in either the path or content of the json file, terraform_data.foo_json.output will be unknown during plan and any data source using that value won’t be read until apply.

Nice! Good to know! I wasn’t aware of this resource, thank you!

And I do agree with you; I shouldn’t represent the same thing with both a managed resource and a data source. It’s just the way I wrote my tests. To simplify everything, I create the resources that the module depends on in the same root module. I was just surprised by the difference in behavior, but I get why now :slight_smile:.