Terraform state schema

Greetings,

I was wondering if there’s a schema definition for a terraform.tfstate file anywhere? I was especially wondering how things like lineage, serial, and version are obtained / calculated. In general, I’m curious how to tell if a state file is correct from a syntax / data types point of view.

Thank you!

Hi @sc250024,

The state snapshot format is an implementation detail of Terraform and not a public interface intended for external integration. Because Terraform needs to be able to round-trip all of the information it produces from one run to the next, the exact details of these snapshots tend to change slightly between Terraform releases as new features require tracking new data or tracking existing data in a different way.

There is a different JSON format which exposes a subset of the state data in a manner that is suitable for public consumption and subject to the v1.x compatibility promises. You can find the documentation about that format in JSON Output Format; you can run terraform show -json to produce the structure described under “State Representation”.


I can still say a little about the specific details you were wondering about, in case this helps satisfy your curiosity:

Terraform uses “version” to recognize when the current state snapshot is in an earlier version of the format and therefore might need upgrading, or is in a later version of the format and therefore isn’t safe for an older version of Terraform to read (because it might contain information the older version doesn’t understand).

Terraform uses “lineage” and “serial” together as a way to catch situations where two parties are trying to make changes with Terraform in an order that could potentially cause problems.

For example:

  • Party A: terraform plan -out=tfplan
  • Party B: terraform plan -out=tfplan
  • Party A: terraform apply tfplan
  • Party B: terraform apply tfplan

When Party B runs the final command, Terraform will notice that the serial in the latest state snapshot is different from the one that was used to create tfplan, and so will refuse to apply that stale plan.

“Lineage” deals with a different situation of trying to apply a saved plan to the wrong state altogether. Whenever Terraform creates the first state snapshot for a new configuration or workspace it will generate a new random lineage and record it in that first snapshot. Any subsequent snapshots in the same configuration/workspace will preserve the same lineage and increment the serial.
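
To make that concrete: in the current snapshot format these three values are just top-level keys of the JSON document, so (purely as an illustration, and subject to the caveat below that none of this is a stable interface) they can be read like this:

    import json

    # Purely illustrative: read a local state snapshot and print the
    # bookkeeping fields described above. The exact layout is an
    # implementation detail and differs between state format versions.
    with open("terraform.tfstate") as f:
        state = json.load(f)

    print(state["version"])   # format version, used to decide whether an upgrade is needed
    print(state["serial"])    # increases with each new snapshot in the same lineage
    print(state["lineage"])   # random ID generated along with the first snapshot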

Terraform uses these two fields in conjunction with the remote state backends’ locking mechanisms to make sure that a correctly-configured Terraform can’t race against another one running on another system and update the serial at the same time, or write a snapshot intended for some other configuration/workspace.

What I’ve described here is an implementation detail of current versions of Terraform and not guaranteed to remain true in future versions of Terraform.


This is absolutely true. Nevertheless, I’ve had to disobey this intent, and directly interface with the state JSON to get things done in some circumstances, so I thought it might be interesting to talk about those circumstances:

1) Splitting state files

If you are at a company which has ended up with one Terraform workspace provisioning, for example, all the Git repositories in the company, there comes a time when the number of resources challenges Terraform’s ability to scale. (It scales a lot worse than linearly with an increasing number of resources - measured performance degrading in proportion to the cube of the resource count in some tests.)

So, what do you do? Well, the expedient solution is to cut your workspace up into multiple shards. But how do you deal with the existing resources?

Using terraform import is out of the question. Firstly, it’s way too slow, being able to process only a single resource at a time.

Secondly, if your resources are defined via moderately complex modules, so that each “user-level” resource is actually a variable number of Terraform resources internally, calculating all the necessary import commands to run would be really, really fragile.

Thirdly, providers are… mixed… when it comes to implementing import correctly, or at all.

Fortunately, there is a relatively easy way to deal with this: just copy the entire state file multiple times, and use terraform state rm to remove everything you don’t want in each instance. The terraform state rm command is capable of accepting multiple resource addresses in one batch, and recursively removing all resources in a module, so this is quite easy.

But, one last thing: as @apparentlymart explained, “lineage” is an important protection against accidental screwups in the future. To benefit from this protection, it was necessary to manually reset the lineage to a freshly generated UUID in each split state, via direct JSON manipulation.
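
To make that concrete, the lineage reset amounts to something like this minimal sketch (the file name is made up, and it assumes the current v4 snapshot layout, where lineage is a top-level key):

    import json
    import uuid

    # Minimal sketch: after copying the original state into a per-shard file
    # and pruning it with `terraform state rm`, give the shard its own
    # identity by assigning a freshly generated lineage UUID.
    path = "shard-repos.tfstate"  # hypothetical file name

    with open(path) as f:
        state = json.load(f)

    state["lineage"] = str(uuid.uuid4())  # new random lineage for this shard

    with open(path, "w") as f:
        json.dump(state, f, indent=2)

Each shard’s file can then be placed in (or pushed to) the backend of its new workspace.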

2) Recovering from a bug in terraform-provider-vault

Vault has something called a KV secrets engine, which comes in multiple versions. The Vault API allows two alternative ways of requesting a version 2 KV secrets engine:

type="kv" options={"version": "2"}

or

type="kv-v2"

Vault itself will convert the second form to the canonical first form. terraform-provider-vault knows about this, and implements a special workaround that treats the second form (in Terraform state) and the first form (retrieved from the Vault API during refresh) as equal.

However, terraform-provider-vault neglects to handle the canonicalisation correctly in the import operation.

So there I was, needing to import one of these resources defined in the second (non-canonical) form, and stuck with a buggy import.

I wanted to just update my configuration to use the canonical form everywhere… but my configuration was in a module that was used hundreds of times already, managing existing resources, and Terraform would plan to destroy and recreate them if I just did that.

To solve this, it was necessary to build a custom script that would rewrite the attributes of existing resources in the state file to the canonical form via raw JSON processing.
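
The script was specific to our modules, but its core was a pass over the state JSON roughly like the following sketch (it assumes the affected resources are vault_mount resources and the current v4 snapshot layout of resources → instances → attributes; treat the details as illustrative):

    import json

    # Sketch: rewrite resources recorded in the non-canonical "kv-v2" form
    # into the canonical "kv" + options form, so the canonical form can then
    # be used in the configuration without Terraform planning to replace them.
    with open("terraform.tfstate") as f:
        state = json.load(f)

    for resource in state.get("resources", []):
        if resource.get("type") != "vault_mount":
            continue
        for instance in resource.get("instances", []):
            attrs = instance.get("attributes", {})
            if attrs.get("type") == "kv-v2":
                attrs["type"] = "kv"
                options = attrs.get("options") or {}
                options["version"] = "2"
                attrs["options"] = options

    with open("terraform.tfstate", "w") as f:
        json.dump(state, f, indent=2)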

3) Handling a transition between providers implementing the same resources

GitHub has two different APIs - REST and GraphQL. Some functionality is only available in the GraphQL API.

The main terraform-provider-github was taking a while to implement some of that. The community responded with terraform-provider-github-v4, which supplemented the main provider with additional resources. We used it.

Then eventually those resources got replicated in the main provider… but not with exactly the same resource schema!

Oh dear… without doing something fairly unusual, we were now stuck using the extra, now deprecated, provider forever.

I created a script to use direct JSON manipulation to rewrite the provider addresses and perform algorithmic transformations on resource attributes, so we could migrate back to the main provider.
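
The real script knew about the actual schema differences, but in outline it looked something like this (the old provider source address below is a placeholder, not the real one):

    import json

    # Sketch: move resources tracked by the deprecated supplemental provider
    # over to the main GitHub provider. Assumes the current v4 snapshot layout.
    OLD = 'provider["registry.terraform.io/examplecorp/github-v4"]'  # placeholder address
    NEW = 'provider["registry.terraform.io/integrations/github"]'

    with open("terraform.tfstate") as f:
        state = json.load(f)

    for resource in state.get("resources", []):
        if resource.get("provider") == OLD:
            resource["provider"] = NEW
            # Per-resource-type attribute transformations (for the places
            # where the two providers' schemas differ) would go here.

    with open("terraform.tfstate", "w") as f:
        json.dump(state, f, indent=2)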

In conclusion

The state file format is a bit like what’s under the bonnet of a car… with the right knowledge, you can do some very useful things, but it shouldn’t be fiddled with casually, nor should you assume that your knowledge is still valid after an upgrade!


Thank you @apparentlymart and @maxb for the very thoughtful, well-written replies. This is a great help. I wanted to add a couple of things for posterity:

  • In my own tinkering, I discovered terraform providers schema -json, which shows at least the schema of each provider’s resources - that is, the shape of what ends up inside the .resources key in the Terraform state file.

  • I agree with the parts below from @maxb:

I’ve had to disobey this intent, and directly interface with the state JSON to get things done in some circumstances

There comes a time when the number of resources challenges Terraform’s ability to scale.

I’m hitting both of these use cases currently, which is why I opened this thread. I also appreciate the recommendation of the “freshly generated UUID in each split state”. @apparentlymart Do you have any comments about this part by chance?

For one-off fixes in exceptional cases, my warning about the snapshot format not being a stable interface doesn’t apply, because in that case you only need to worry about the state snapshot format of whatever version of Terraform you are currently using.

Whatever direct changes you made might not work the same if you were to repeat them with a later version of Terraform, but as long as you keep that in mind while planning, I think it’s pragmatic to make manual changes or to automate some careful surgical changes, as long as you don’t expect to take that process with you to newer versions of Terraform.

With that in mind, you can see here the current implementation of how Terraform generates entirely new “lineage” strings when initializing the first state snapshot for a new configuration/workspace:

As you can see, Terraform uses standard UUID syntax for the lineage, so if you generate a new standard UUID using some other software and format it in the canonical UUID syntax, Terraform will accept that as a valid lineage string. Technically, Terraform doesn’t currently verify the syntax of the lineage at all and will accept any string, but a UUID string is the most likely to be forward-compatible with future changes to the snapshot format, since the automatic upgrade logic in Terraform should accept anything equivalent to what Terraform itself would’ve generated.

Although the above is true for all versions of Terraform that have so far included this concept of “lineage” – Terraform v0.9 through v1.3 – I cannot promise it will remain true for all future versions of Terraform. If you find this comment long after I posted it and are using a later version of Terraform, this information might no longer be applicable.
