Resource destroy design flaw?

When making edits to .tf files, I encounter problems when the latest .tf files don’t align well with the applied environment. I have seen errors such as detection of a circular reference when there really isn’t one in the latest .tf files, or failure to find a provider to destroy a resource that was removed from the latest .tf files. Usually this requires stashing changes or doing manual cleanup and hand-editing of .tfstate.

Would this work better? Suppose terraform used the latest .tf files to identify what needs to be done, then made a removal pass before updating/adding, and when removing, used a backup copy of the previously applied .tf files instead of the latest ones.

Hi @jimsnab,

Without seeing some more specific information I can only really guess what’s going on here, but my best guess is that, because you have resources that are no longer present in the configuration, Terraform is instead using information from the state to recover the original dependency information. I can imagine some trickier cases, where the configuration has changed a lot, in which the partial information in the configuration combines with the partial information in the state to produce an invalid dependency graph.

The Terraform team at HashiCorp is currently working on a new feature for documenting historical moves/renames of resources and modules, one motivation for which is to help Terraform understand the historical context when making a plan. This is similar in principle to your idea of referring to historical .tf files, but with the information recorded in the current .tf files instead (because Terraform typically doesn’t have access to a historical configuration).

# NOT YET RELEASED
moved {
  from = aws_instance.a
  to   = aws_instance.b
}

When making a plan for a configuration containing a statement like the above, Terraform would first check to see if there’s an aws_instance.a recorded in the state, and if so pretend that it had originally been created as aws_instance.b instead. The main motivation for this is to avoid replacing the object when moving it, but it can also help with the situation you encountered, because Terraform will be able to see that it should use the current configuration of aws_instance.b in order to understand the dependencies for what was formerly recorded as aws_instance.a.

Although out of scope for the initial work we’re currently doing, we’re also considering some similar annotations for other sorts of changes that Terraform might need to take into account when planning. One candidate is a removed block which records that a resource used to exist and gives a place to capture some metadata about it that can help Terraform understand how that object was originally declared:

# NOT YET IMPLEMENTED: Final design might look different
removed {
  from = aws_instance.a

  depends_on = [aws_instance.c]
}

For the initial release Terraform will require writing moved blocks manually in order to get the benefit from them, but we’re also considering later additions of new commands to help coordinate changes to the current module:

  • terraform move aws_instance.a aws_instance.b might potentially relabel the resource "aws_instance" "a" as resource "aws_instance" "b" and generate the moved block shown above in a single action.
  • terraform remove aws_instance.a might potentially remove the resource "aws_instance" "a" block and generate a removed block including a depends_on that covers the same dependencies that the resource configuration previously implied, along with any other information that can help Terraform plan to destroy that object.

(Again, these are just hypothetical examples to show what I mean, and the final design might be quite different.)
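
To make that more concrete, the net effect of the first command on a configuration might look roughly like this (hypothetical, with the resource arguments elided):

# HYPOTHETICAL: possible result of "terraform move aws_instance.a aws_instance.b"

# The block previously labeled "a" gets relabeled...
resource "aws_instance" "b" {
  # ...arguments unchanged...
}

# ...and a moved block is generated to record the rename, so the existing
# object is not destroyed and recreated.
moved {
  from = aws_instance.a
  to   = aws_instance.b
}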

Do you think that the functionality I’ve described above might’ve helped with the problem you encountered?

It would also be interesting to think about, as you suggested, having a way to provide a historical version of the configuration so that Terraform could infer information like what might appear in a removed block. That would be a very significant change to Terraform’s model, though, with various implications to consider while designing it. Therefore I think implementing something like the removed block I showed above, potentially accompanied by a command to automate generating it, is the more practical solution in the medium term, and the functionality it implies would likely end up being part of any automatic solution based on analyzing an old configuration anyway.

Hello - thanks for the reply and great detail!

This sounds like what Git has to deal with, and I wonder how its approach relates to terraform’s needs. Could terraform devise a canonical identity and content hash for the resource, and maintain it in a history file? Maybe this could facilitate automatic move or remove detection, or detection of array position reordering.

One of the ways I came across problems was refactoring my layout. The number of directories got too high and I wanted to clean it up, and then the cleanup process led to renames because the original names made less sense in the refactored layout. I’m not thrilled with having to manually declare a move or remove, though I could see real cases where that helps. Hopefully this would not require all of the old folder structure in order for terraform to understand the original names?

One other use case came to mind. I had this in a project:

# pseudo
module "cluster1" { ... }
module "cluster2" { ... }

Each invokes a whole tree of .tf modules that sets up a cluster. I wanted to comment out one cluster, run apply, and have it destroy half of the resources. Terraform couldn’t do it, because it required the providers declared in the commented-out path. I ended up having to comment things out deeper in the tree, with multiple applies, before I could comment out the top level.

@jimsnab,

Unfortunately it’s not the identity of the resource in the state that terraform is unable to determine; the problem is almost always caused by declaring dependencies in one direction, which is stored in the state, then altering the configuration in such a way as to reverse those dependencies. In most cases this is actually an error in the configuration, as the logical flow of operations should stay in the same order for each tree of resources. Occasionally, when a user does legitimately want to re-order resources in this way, it requires applying in multiple steps to first remove the dependencies and avoid the contradiction between the configuration and the state.
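
As a contrived sketch of the kind of reversal I mean (resource types and attributes chosen only for illustration):

# Version 1, applied: "b" references "a", so the state records b -> a
resource "aws_instance" "a" {
  # ...
}

resource "aws_instance" "b" {
  subnet_id = aws_instance.a.subnet_id # b depends on a
}

# Version 2, edited: the reference now runs the other way (a -> b), but the
# state still records b -> a, so the combined graph can contain a cycle even
# though the new configuration on its own does not.
resource "aws_instance" "a" {
  subnet_id = aws_instance.b.subnet_id # a depends on b
}

resource "aws_instance" "b" {
  # ...
}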

Tracking the resource through historical configuration changes as noted above is one possible solution, and the method we are experimenting with is annotating the configuration to do so.

The last error you are referring to is a different situation. In order to destroy resources, terraform must have the provider configuration which created them. If you have the provider configured in the module source and then remove the module, terraform has no way to know how to configure the provider for those resources. This is the primary reason we do not recommend putting provider configuration within modules at all; providers should be passed down explicitly from the root module. See here for more details on provider configuration within modules: Providers Within Modules - Configuration Language - Terraform by HashiCorp
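
A minimal sketch of the recommended shape (module path and names are just placeholders):

# Root module: the provider configuration lives here...
provider "helm" {
  # ...connection settings...
}

# ...and is passed into the child module explicitly. Because the
# configuration stays in the root, terraform can still configure the
# provider to destroy the module's resources after the module call
# itself has been removed.
module "cluster_services" {
  source = "./modules/cluster-services"

  providers = {
    helm = helm
  }
}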

Hi jbardin,

Thanks for jumping in. It’s a great thing to know something is coming that will make this better.

On the “comment out” thread - it’s not always possible to put providers at the root. Example:

provider "helm" {
  kubernetes {
    host                   = module.main_cluster.host
    cluster_ca_certificate = module.main_cluster.ca_certificate
    client_certificate     = module.main_cluster.client_certificate
    client_key             = module.main_cluster.client_key
  }
}

In the above provider, there is a dependency on an upstream .tf that creates the cluster and generates the cluster cert.

And philosophically, I think the provider should be coupled to its usage. Why should the root have to know what an inner module needs?

I will admit that the refactoring scenario is a development use case where a lot is getting changed and not always right. Still, I think the point is strong: the terraform destroy steps of version 2 ought to be based on the version 1 scripts that match the current state of the environment.

Ignoring the philosophy, there shouldn’t be any reason you can’t put the provider in the root; you just need to ensure things are being passed around as needed. We have helm/kubernetes providers similar to your example, used with modules, and we just need to ensure that the right outputs are set in the module to allow it all to work.
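
For example (just a sketch; the exact attribute names depend on whichever resource actually creates the cluster, here assumed to be GKE via google_container_cluster), the cluster module only needs to export the connection details:

# Inside the cluster module, e.g. outputs.tf
output "host" {
  value = "https://${google_container_cluster.main.endpoint}"
}

output "ca_certificate" {
  value = base64decode(google_container_cluster.main.master_auth[0].cluster_ca_certificate)
}

The root-level provider "helm" block can then reference module.<module name>.host and module.<module name>.ca_certificate, as in your earlier example.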

Maybe I just don’t know how it is done. When there are two clusters, how does the root declare a “helm” provider for each, and how does the module know which one to use?

You want provider aliases: Provider Configuration - Configuration Language - Terraform by HashiCorp
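
Roughly like this (module names and sources invented for illustration):

# Root module: one aliased "helm" configuration per cluster
provider "helm" {
  alias = "cluster1"
  kubernetes {
    host                   = module.cluster1.host
    cluster_ca_certificate = module.cluster1.ca_certificate
  }
}

provider "helm" {
  alias = "cluster2"
  kubernetes {
    host                   = module.cluster2.host
    cluster_ca_certificate = module.cluster2.ca_certificate
  }
}

# Each module call chooses which configuration its own (unaliased) "helm"
# requirement maps to.
module "cluster1_services" {
  source = "../modules/cluster-services"
  providers = {
    helm = helm.cluster1
  }
}

module "cluster2_services" {
  source = "../modules/cluster-services"
  providers = {
    helm = helm.cluster2
  }
}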

Ah, I forgot about aliases. I decided against them a while ago, because they don’t work for the reusability I am after.

In root, this alone does not work:

provider "helm" {
  alias = "main"
  kubernetes {
    host                   = module.main_cluster.host
    cluster_ca_certificate = module.main_cluster.ca_certificate
    client_certificate     = module.main_cluster.client_certificate
    client_key             = module.main_cluster.client_key
  }
}

# pseudo
module "example_main_cluster" {
  source = "../../clusters/main"
  providers = {
    helm = helm.main
  }
}
│ Error: Reference to undeclared module
│ 
│   on main.tf line 6, in provider "kubernetes":
│    6:   host                   = module.main_cluster.host
│ 
│ No module call named "main_cluster" is declared in the root module.

Even though the module reference eventually declares module.main_cluster, terraform errors before it gets there.

Behind the door of creating main_cluster is a tree of dozens of resources, some with complex dependencies. Moving the invocation of main_cluster into the root would then require moving other things into the root as well.

Alternatively, relying only on the default provider and the module scope allows me to reuse the cluster logic very cleanly.

After moving providers to the root, it’s no longer a clean, reusable module structure, and logic must be duplicated.

So I decided against aliases, and I am able to have two separate environments that are constructed exactly the same, with minor differences such as the GKE project ID. But that exposes the problem of terraform trying to use version 2 .tf files against a version 1 environment.

I also tried specifying it as, for example, host = module.example_main_cluster.main_cluster.host

╷
│ Error: Unsupported attribute
│ 
│   on main.tf line 10, in provider "helm":
│   10:     host                   = module.example_main_cluster.main_cluster.host
│     ├────────────────
│     │ module.example_main_cluster is a object, known only after apply
│ 
│ This object does not have an attribute named "main_cluster".

When referencing things from modules you need to expose them via outputs. What is your output code within the module like?

Thanks for the guidance! Not declaring main_cluster as an output was indeed the issue with declaring a provider at the root. The ‘known only after apply’ error was deceptive.

I still won’t do that at the top level because it duplicates providers into every environment .tf, but I can create a global .tf that they all include. Will it hurt to declare providers that don’t get used in a particular environment? I guess not.

This at least is a better workaround for version 2 not using the version 1 .tf files: I would remove the resources but keep the retired provider(s), then remove the unused provider(s) as step 2.

Thanks!

(but please keep thinking about having terraform use version 1 .tf files when environment has version 1 deployed)