What is wrong with this trivial usage of a local variable?

I have posted this question on SO - https://stackoverflow.com/questions/61218173/why-terraform-is-unable-to-compute-a-local-variable-correctly-in-the-following-t

Here it is:

Given is the following configuration (main.tf):

locals {
    locations = toset(["a", "b"])
}

resource "local_file" "instance" {
    for_each = local.locations

    content  = each.value
    filename = "${path.module}/${each.value}.txt"
}

output "primary_filename" {
    value = local_file.instance["a"].filename
}

And it seems to work fine:

C:\work\test> dir


    Directory: C:\work\test


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
-a----        4/15/2020  11:47 PM            280 main.tf


C:\work\test> terraform init

Initializing the backend...

Initializing provider plugins...
- Checking for available provider plugins...
- Downloading plugin for provider "local" (hashicorp/local) 1.4.0...

...
C:\work\test> terraform apply -auto-approve
local_file.instance["b"]: Creating...
local_file.instance["a"]: Creating...
local_file.instance["a"]: Creation complete after 0s [id=86f7e437faa5a7fce15d1ddcb9eaeaea377667b8]
local_file.instance["b"]: Creation complete after 0s [id=e9d71f5ee7c92d6dc9e92ffdad17b8bd49418f98]

Apply complete! Resources: 2 added, 0 changed, 0 destroyed.

Outputs:

primary_filename = ./a.txt
C:\work\test>

Now I delete the file a.txt and rerun:

C:\work\test> del .\a.txt
C:\work\test> terraform apply -auto-approve
local_file.instance["a"]: Refreshing state... [id=86f7e437faa5a7fce15d1ddcb9eaeaea377667b8]
local_file.instance["b"]: Refreshing state... [id=e9d71f5ee7c92d6dc9e92ffdad17b8bd49418f98]

Error: Invalid index

  on main.tf line 13, in output "primary_filename":
  13:     value = local_file.instance["a"].filename
    |----------------
    | local_file.instance is object with 1 attribute "b"

The given key does not identify an element in this collection value.

It can be fixed by using the try function:

    value = try(local_file.instance["a"].filename, "")

Which does make it work:

C:\work\test> terraform apply -auto-approve
local_file.instance["b"]: Refreshing state... [id=e9d71f5ee7c92d6dc9e92ffdad17b8bd49418f98]
local_file.instance["a"]: Refreshing state... [id=86f7e437faa5a7fce15d1ddcb9eaeaea377667b8]
local_file.instance["a"]: Creating...
local_file.instance["a"]: Creation complete after 0s [id=86f7e437faa5a7fce15d1ddcb9eaeaea377667b8]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

Outputs:

primary_filename = ./a.txt
C:\work\test>
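An alternative to try(), for anyone who finds the silent empty-string fallback too forgiving, is an explicit key check (a sketch only; it evaluates to null instead of failing when the instance is absent from state):

```hcl
output "primary_filename" {
  # Test for the key explicitly so the intent is visible in the config;
  # yields null rather than an "Invalid index" error when the "a"
  # instance has been dropped from state during refresh.
  value = contains(keys(local_file.instance), "a") ? local_file.instance["a"].filename : null
}
```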

Now I know we are not supposed to delete resources outside of Terraform, but things happen, and my expectation is that Terraform handles it gracefully. And it does, except for this local variable behavior.

I do not like using the try function, because it would hide a real problem. Ideally, it should behave like try during the plan phase and without try during the apply phase.

Anyway, I have a feeling I am missing something important here, like I am not using the local variables correctly or something else. So, what am I missing?

Hi @MarkKharitonov,

I think the problem here is that Terraform is refreshing local_file.instance["a"] (as we can see in the terraform apply output) and finding that it no longer exists. When it then tries to update all of the outputs to match (which it does in case an output refers to something that changed during refresh), it fails because that instance is no longer present. This seems to be an edge case in Terraform’s attempt to make the refresh result consistent; I don’t think it has anything to do with local values in particular.

If the cause is what I think it is, I think you could make it work by using terraform state rm to tell Terraform explicitly that this object is gone and that it should not try to refresh it:

terraform state rm local_file.instance["a"]

I can do that or I can use the try function. Both are bad. The first one is bad, because it interrupts the flow of the automated deployment. The second - because it hides potentially real issues.

From your description I have difficulty figuring out whether this is a bug in Terraform or legitimate behaviour.

Terraform seems to be overly sensitive when resources are deleted/modified outside of it (I know this should not happen, but sometimes it does). I was under the impression that it would auto-adjust the resources to the desired state. But this does not happen in reality, at least not with the Azure provider.

I reported this as a bug - https://github.com/hashicorp/terraform/issues/24883

Hi @MarkKharitonov,

Sorry I didn’t see your previous reply before; I was out sick when you replied.

This seems like a set of unusual situations combining to produce an annoying result: as far as I can tell, all of the individual pieces of this are working correctly in isolation but this particular combination doesn’t work as expected.

I don’t know that I would necessarily classify it as a bug: by modifying an object outside of Terraform you’ve entered a state where Terraform does make a best effort to handle it (by refreshing the objects) but will never be able to do that 100% effectively, because it can’t infer the intent of those actions taken outside of Terraform or their full impact.

My main answer to this is that in normal Terraform usage the objects Terraform is managing should not be altered by anything outside of Terraform. While Terraform may be able to recover sometimes, that is always a best-effort thing and will not always succeed. Therefore odd behavior related to drift detection is something we do try to fix when possible, but isn’t something we can officially support or guarantee will always work.

I realize that TF-managed infrastructure should not be touched outside of Terraform. But in reality this happens. For instance, we modified a role used by the deployment Service Principal and wish to test whether it is able to configure the SSL bindings in an App Service. How would we do it?

The simplest approach: delete the SSL bindings in the deployed App Service and redeploy. Bang - it blows up. Why? Because we have a map of SSL bindings in the TF configuration, with just one of them deleted in the real world. But because of the way TF resolves the maps, it fails.

It means that in order to test what we want to test, not only must we delete the SSL binding in the real world, but we also need to manipulate the TF state. Which is cumbersome. It is also unintuitive: TF refreshes the state, so it knows the entry disappeared. Yet it fails.
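To sketch the scenario (the resource type and attributes here are made up for illustration; the real azurerm binding resource has a different schema):

```hcl
variable "ssl_bindings" {
  # hostname => certificate thumbprint
  type = map(string)
}

# Hypothetical resource type standing in for the real App Service SSL binding.
resource "example_ssl_binding" "this" {
  for_each   = var.ssl_bindings
  hostname   = each.key
  thumbprint = each.value
}

output "primary_binding" {
  # After one binding is deleted in the portal, refresh drops it from
  # state, and this reference fails with "Invalid index" - just like
  # local_file.instance["a"] in the minimal repro above.
  value = example_ssl_binding.this["www.example.com"].id
}
```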

It feels very much like a bug.

I think debating whether or not we call it a bug is not really the point here. The main issue at hand is that a problem which arises only when you use Terraform outside of its intended usage will tend to have a lower priority for being fixed than one that arises in normal use.

Commands like terraform state rm exist in recognition of the fact that sometimes you need to step outside of standard Terraform usage and do something unusual. If you delete things outside of Terraform then you will sometimes need to inform Terraform you did that using terraform state rm. Future changes to Terraform may reduce the number of situations where that is needed, but it’s unlikely that such changes will be prioritized over issues in the main workflow.

With that said, I do appreciate you reporting the issue (I didn’t see that comment the first time I looked). The team will take a look at that when they’re able, and if there is a straightforward fix then it may be fixed sooner rather than later.