State backend benchmarks

Hey there,

Has anybody tried to benchmark different kinds of backends?

For example, could pg be significantly faster than s3+dynamodb?
By faster, I mean a full plan that currently takes 15 minutes taking, say, 13 or 14 instead.

I haven’t, but I wouldn’t expect the backend to make much difference. Terraform reads the state from the backend at the start of the run, works on its own copy, and writes it back only at the end. (With a remote backend, .terraform/terraform.tfstate only records the backend configuration, not the state itself.)

This is likely to take only a few seconds even with the slowest choice.

Equally, state locking and unlocking happen only at the start and end.

The majority of the time for a plan goes to two things. The first is the refresh, where the Terraform providers call APIs to update the status of every resource. The second is the graph walk and diff, where the code is compared with reality and the required changes are calculated.

For the first, speed really depends on the APIs, but you could make improvements by, for example, being “closer” in network terms to the API endpoints. The second is generally CPU heavy, so a faster processor would help.

Alternatively, we split things into multiple state files where it makes sense (for example, low-level networking separate from applications), partly to keep plan times down.
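
If you want numbers for your own setup rather than my guess, a rough timing harness along these lines might help. It is only a sketch: the helper names (time_command, benchmark, COMMANDS) are made up for illustration, and it assumes terraform is on your PATH and the working directory has already been initialised. It compares `terraform state pull` (roughly the cost of one backend read), a plan with `-refresh=false`, and a full plan.

```python
import subprocess
import time
from statistics import median


def time_command(cmd, runs=3, cwd=None):
    """Run cmd `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so a broken run is not timed silently.
        subprocess.run(
            cmd,
            cwd=cwd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


# Commands to compare. The labels are just descriptions for the printed report.
COMMANDS = {
    "state pull (backend read only)": ["terraform", "state", "pull"],
    "plan without refresh": ["terraform", "plan", "-refresh=false", "-input=false"],
    "full plan": ["terraform", "plan", "-input=false"],
}


def benchmark(workdir=".", runs=3):
    """Time each command in an initialised Terraform directory and print medians."""
    results = {}
    for label, cmd in COMMANDS.items():
        durations = time_command(cmd, runs=runs, cwd=workdir)
        results[label] = median(durations)
        print(f"{label}: median {results[label]:.1f}s over {runs} runs")
    return results
```

Call benchmark("path/to/your/config") once per backend. If the state pull figure is a few seconds and the full plan is many minutes, switching backends is unlikely to buy you much, and the gap between the two plan timings shows how much of the run is the refresh.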