This is absolutely true. Nevertheless, I’ve had to disobey this intent, and directly interface with the state JSON to get things done in some circumstances, so I thought it might be interesting to talk about those circumstances:
1) Splitting state files
If your company has ended up with a single Terraform workspace that provisions, for example, every Git repository in the company, there comes a time when the number of resources challenges Terraform’s ability to scale. (It scales a lot worse than linearly with the number of resources: in some tests, measured performance degraded in proportion to the cube of the resource count.)
So, what do you do? Well, the expedient solution is to cut your workspace up into multiple shards. But how do you deal with the existing resources?
Using terraform import is out of the question. Firstly, it’s way too slow, being able to process only a single resource at once.
Secondly, if your resources are defined via moderately complex modules, so each “user-level” resource is actually a variable number of Terraform resources internally, calculating all the necessary imports to run would be really really fragile.
Thirdly, providers are… mixed… when it comes to implementing import correctly, or at all.
Fortunately, there is a relatively easy way to deal with this: just copy the entire state file multiple times, and use terraform state rm to remove everything you don’t want in each instance. The terraform state rm command is capable of accepting multiple resource addresses in one batch, and recursively removing all resources in a module, so this is quite easy.
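As a rough illustration of working out what to remove from each copy, here is a small Python sketch. It assumes you already have the full address list (for example from terraform state list) and that each "user-level" resource lives in its own top-level module; the function names and the example addresses are hypothetical.

```python
import re


def top_level_unit(address: str) -> str:
    """Return the top-level module of an address (e.g. 'module.repo_a'),
    or the address itself for resources in the root module.

    Assumes module index keys, if any, do not contain a ']' character.
    """
    match = re.match(r'^(module\.[^.\[]+(?:\[[^\]]*\])?)', address)
    return match.group(1) if match else address


def units_to_remove(all_addresses, keep_units):
    """List the top-level units that a shard should NOT keep, in first-seen order.

    Removing a whole module address lets 'terraform state rm' drop every
    resource inside it, so inner resources never need to be enumerated.
    """
    keep = set(keep_units)
    result = []
    for address in all_addresses:
        unit = top_level_unit(address)
        if unit not in keep and unit not in result:
            result.append(unit)
    return result
```

The returned list can then be passed as the arguments of a single terraform state rm invocation against that shard's copy of the state.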
But, one last thing: as @apparentlymart explained, “lineage” is an important protection against accidental screwups in the future. To benefit from this protection, it was necessary to manually reset the lineage to a freshly generated UUID in each split state, via direct JSON manipulation.
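A minimal sketch of that lineage reset, assuming a local copy of a state file with a top-level "lineage" field; the function names and the file path are hypothetical, and the serial is deliberately left untouched here.

```python
import json
import uuid


def reset_lineage(state: dict) -> dict:
    """Return a shallow copy of the state with a freshly generated lineage."""
    new_state = dict(state)
    new_state["lineage"] = str(uuid.uuid4())
    return new_state


def reset_lineage_in_file(path: str) -> None:
    """Rewrite the state file at 'path' in place with a new lineage.

    Keep a backup of the original file before running anything like this.
    """
    with open(path, "r", encoding="utf-8") as handle:
        state = json.load(handle)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(reset_lineage(state), handle, indent=2)
        handle.write("\n")
```

Each split copy gets its own call, so no two shards end up sharing a lineage.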
2) Recovering from a bug in terraform-provider-vault
Vault has something called a KV secrets engine. It has multiple versions. The Vault API allows two alternative expressions of requesting a version 2 KV secrets engine:
type="kv" options={"version": "2"}
or
type="kv-v2"
Vault itself will convert the second form to the canonical first form. terraform-provider-vault knows about this, and implements a special workaround for comparing the second form (in Terraform state) to the first form (retrieved from the Vault API during refresh) as equal.
However, terraform-provider-vault neglects to handle the canonicalisation correctly in the import operation.
So there I was, needing to import one of these resources defined in the second (non-canonical) form, and stuck with a buggy import.
I wanted to just update my configuration to use the canonical form everywhere… but my configuration was in a module that was used hundreds of times already, managing existing resources, and Terraform would plan to destroy and recreate them if I just did that.
To solve this, it was necessary to build a custom script that would rewrite the attributes of existing resources in the state file to the canonical form via raw JSON processing.
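To give a flavour of that kind of rewrite (this is not the actual script), here is a Python sketch. It assumes the state JSON has a top-level "resources" list whose entries carry "instances" with an "attributes" object, and that the affected resources are of type vault_mount with "type" and "options" attributes; treat those names as assumptions to check against your own state file.

```python
def canonicalise_kv_mounts(state: dict) -> int:
    """Rewrite type="kv-v2" mounts to type="kv" with options.version="2".

    Mutates 'state' in place and returns the number of instances changed.
    The resource type and attribute names are assumptions, not verified
    against any particular provider version.
    """
    changed = 0
    for resource in state.get("resources", []):
        if resource.get("mode") != "managed":
            continue
        if resource.get("type") != "vault_mount":
            continue
        for instance in resource.get("instances", []):
            attributes = instance.get("attributes") or {}
            if attributes.get("type") != "kv-v2":
                continue
            attributes["type"] = "kv"
            # Preserve any other options already recorded for the mount.
            options = dict(attributes.get("options") or {})
            options["version"] = "2"
            attributes["options"] = options
            changed += 1
    return changed
```

After a rewrite like this, the configuration can be switched to the canonical form as well, and a plan should be checked for no changes before anything is applied.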
3) Handling a transition between providers implementing the same resources
GitHub has two different APIs - REST and GraphQL. Some functionality is only available in the GraphQL API.
The main terraform-provider-github was taking a while to implement some of that. The community responded with terraform-provider-github-v4, which supplemented the main provider with additional resources. We used it.
Then eventually those resources got replicated in the main provider… but not with exactly the same resource schema!
Oh dear… without doing something fairly unusual, we were now stuck using the extra, now deprecated, provider forever.
I created a script to use direct JSON manipulation to rewrite the provider addresses and perform algorithmic transformations on resource attributes, so we could migrate back to the main provider.
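The general shape of such a migration can be sketched in Python as follows. This is an illustration rather than the script itself: it assumes each entry in the state's "resources" list records its provider address in a "provider" field, and every provider address, resource type and attribute name passed in is a hypothetical placeholder. It also leaves any per-instance schema version alone, which may need adjusting for the target provider.

```python
from typing import Callable, Dict


def migrate_provider(
    state: dict,
    old_provider: str,
    new_provider: str,
    type_map: Dict[str, str],
    transform: Callable[[str, dict], dict],
) -> int:
    """Move matching resources from one provider address to another.

    'type_map' maps old resource types to new ones; 'transform' receives the
    old resource type and attributes, and returns the new attributes.
    Mutates 'state' in place and returns the number of resources migrated.
    """
    migrated = 0
    for resource in state.get("resources", []):
        if resource.get("provider") != old_provider:
            continue
        old_type = resource.get("type")
        if old_type not in type_map:
            continue
        resource["provider"] = new_provider
        resource["type"] = type_map[old_type]
        for instance in resource.get("instances", []):
            instance["attributes"] = transform(old_type, instance.get("attributes") or {})
        migrated += 1
    return migrated
```

Keeping the attribute transformation as a separate function makes it easy to test each schema difference in isolation before touching a real state file.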
In conclusion
The state file format is a bit like what’s under the bonnet of a car… with the right knowledge, you can do some very useful things, but it shouldn’t be fiddled with casually, nor should you assume that your knowledge is still valid, after an upgrade!