Terraform state file not getting created/updated on GitLab runner crash

mysandy603 · October 18, 2022, 12:13pm

Hi,

We are using a GitLab pipeline to run terraform init, plan and apply to create multiple resources on AWS. We are also using S3 as the remote backend to store the Terraform state file. Normally everything works fine.

But if the runner is killed or crashes partway through terraform apply, we find that some resources have been created in AWS and others have not, yet the Terraform state file on S3 has not been updated. This leaves the state file out of sync with the resources actually created.
Can you please advise on how to deal with this scenario?

Thanks,
Sandeep

stuart-c · October 18, 2022, 1:35pm

If a run doesn’t complete correctly, you may indeed find that the state file and reality are out of sync. You will need to investigate manually what is missing or extra, and then use terraform import, terraform state rm and/or manual removal of resources from AWS to get things back in line.
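To make the investigation step more concrete, here is a minimal sketch of the comparison part only. It assumes you have exported the state with terraform show -json and have separately collected the IDs of the EC2 instances that actually exist (for example with aws ec2 describe-instances). The sample JSON and instance IDs are made up, and only aws_instance resources are handled. It can list discrepancies, but it cannot tell you which untracked instance belongs to which resource address; that part stays manual.

```python
import json

# Made-up, trimmed example of `terraform show -json` output.
# Real output contains many more fields per resource.
STATE_JSON = """
{
  "format_version": "1.0",
  "values": {
    "root_module": {
      "resources": [
        {"address": "aws_instance.web[0]", "mode": "managed",
         "type": "aws_instance", "values": {"id": "i-0aaa"}},
        {"address": "aws_instance.web[1]", "mode": "managed",
         "type": "aws_instance", "values": {"id": "i-0bbb"}}
      ],
      "child_modules": [
        {"address": "module.db", "resources": [
          {"address": "module.db.aws_instance.replica", "mode": "managed",
           "type": "aws_instance", "values": {"id": "i-0ccc"}}
        ]}
      ]
    }
  }
}
"""


def iter_resources(module):
    """Yield every resource in a module and, recursively, in its child modules."""
    yield from module.get("resources", [])
    for child in module.get("child_modules", []):
        yield from iter_resources(child)


def state_ids(state, resource_type):
    """Map cloud ID -> Terraform address for managed resources of one type."""
    root = state.get("values", {}).get("root_module", {})
    return {
        r["values"]["id"]: r["address"]
        for r in iter_resources(root)
        if r.get("type") == resource_type and r.get("mode", "managed") == "managed"
    }


def compare(tracked, live_ids):
    """Return (live IDs absent from state, state addresses whose ID no longer exists)."""
    untracked = sorted(set(live_ids) - set(tracked))
    missing = sorted(addr for rid, addr in tracked.items() if rid not in live_ids)
    return untracked, missing


if __name__ == "__main__":
    state = json.loads(STATE_JSON)
    tracked = state_ids(state, "aws_instance")
    # Hypothetical IDs of instances that currently exist in the account.
    live = {"i-0aaa", "i-0ccc", "i-0ddd", "i-0eee"}
    untracked, missing = compare(tracked, live)
    print("In AWS but not in state (investigate, maybe terraform import):", untracked)
    print("In state but gone from AWS (maybe terraform state rm):", missing)
```

Each untracked ID still has to be matched to a resource address by a person before running terraform import, or deleted from AWS if it should not exist.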

mysandy603 · October 27, 2022, 7:16am

Thanks stuart-c for your reply.
Actually, we are looking for a solution in which the Terraform state file can be restored if the runner crashes, or in which all resources created by the pipeline are destroyed unless the run fully succeeds.
Any ideas are welcome.

stuart-c · October 27, 2022, 11:35am

The issue is that nobody knows what needs doing without investigation.

API calls have been sent to create, delete or modify resources, but the Terraform process is terminated before the state file is updated. Therefore there is no record of those changes, for example of new resources having been created.

You could revert the state file to a previous version (assuming you are using something like an S3 bucket with versioning), but you still wouldn’t know about resources that are in the state but have been removed, or that were created but never recorded.
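If you do want to fall back to an earlier version, a minimal sketch of finding and downloading it is below. It assumes a versioned bucket, a boto3 S3 client created elsewhere, and a made-up state key; large buckets would need the list_object_versions paginator instead of a single response. The restored file still knows nothing about resources created by the aborted run.

```python
from datetime import datetime, timezone


def previous_state_version(response, key):
    """Return the VersionId of the second-newest version of `key`, or None.

    `response` is assumed to have the shape returned by boto3's
    s3_client.list_object_versions(Bucket=..., Prefix=key).
    """
    # Prefix matching can return other keys, so keep exact matches only.
    versions = [v for v in response.get("Versions", []) if v["Key"] == key]
    versions.sort(key=lambda v: v["LastModified"], reverse=True)
    return versions[1]["VersionId"] if len(versions) >= 2 else None


def download_version(s3_client, bucket, key, version_id, dest_path):
    """Save one specific version of the state object to a local file."""
    obj = s3_client.get_object(Bucket=bucket, Key=key, VersionId=version_id)
    with open(dest_path, "wb") as fh:
        fh.write(obj["Body"].read())


if __name__ == "__main__":
    key = "env/prod/terraform.tfstate"  # hypothetical state key
    fake_response = {"Versions": [
        {"Key": key, "VersionId": "v2",
         "LastModified": datetime(2022, 10, 18, 12, 0, tzinfo=timezone.utc)},
        {"Key": key, "VersionId": "v1",
         "LastModified": datetime(2022, 10, 17, 9, 0, tzinfo=timezone.utc)},
    ]}
    print(previous_state_version(fake_response, key))  # v1
```

One way to put the downloaded file back is terraform state push, which refuses an older serial unless given -force. If the S3 backend uses a DynamoDB lock table, editing the S3 object directly can leave its stored digest out of date, so going through Terraform is probably the safer route.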

In general it needs some level of intelligence (i.e. not something that is easy to do programmatically) to figure out which resources map to what inside Terraform.

For example, before you run Terraform there are 10 EC2 instances, all of the same size. You run Terraform and it starts creating some new instances, but it is terminated before the state file is updated. There are now 20 instances. Some of those could be from API calls Terraform sent before being aborted, but how do you know which instance maps to which Terraform resource? There might also be instances created by an autoscaling group (and therefore not something you should map to Terraform state), things created by a separate Terraform root module, or things which are managed entirely outside of Terraform. You need to look at those instances and the code and try to figure it all out, which is not something that can easily be done by a generic program.
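Tags can narrow the 20 instances down, but not resolve them. Below is a rough triage sketch, assuming instance dicts shaped like the Instances entries returned by EC2 DescribeInstances. aws:autoscaling:groupName is the tag EC2 Auto Scaling puts on instances it launches; terraform-stack is a hypothetical convention that your own code would have to apply beforehand (for example through default_tags in the AWS provider).

```python
ASG_TAG = "aws:autoscaling:groupName"  # set by EC2 Auto Scaling on its instances
STACK_TAG = "terraform-stack"          # hypothetical tag convention, not a default


def tags_of(instance):
    """Turn the EC2 [{"Key": ..., "Value": ...}] tag list into a dict."""
    return {t["Key"]: t["Value"] for t in instance.get("Tags", [])}


def triage(instances, tracked_ids, this_stack):
    """Sort instance IDs into buckets; every bucket but 'tracked' needs a human."""
    buckets = {"tracked": [], "autoscaling": [], "this_stack_untracked": [],
               "other_stack": [], "needs_review": []}
    for inst in instances:
        iid = inst["InstanceId"]
        tags = tags_of(inst)
        stack = tags.get(STACK_TAG)
        if iid in tracked_ids:
            buckets["tracked"].append(iid)
        elif ASG_TAG in tags:
            # Owned by an autoscaling group: should not go into Terraform state.
            buckets["autoscaling"].append(iid)
        elif stack == this_stack:
            # Probably from the aborted run, but which resource address is unknown.
            buckets["this_stack_untracked"].append(iid)
        elif stack is not None:
            # Tagged by another root module.
            buckets["other_stack"].append(iid)
        else:
            # No useful tags: could be anything, including manual resources.
            buckets["needs_review"].append(iid)
    return buckets


if __name__ == "__main__":
    sample = [
        {"InstanceId": "i-01", "Tags": [{"Key": STACK_TAG, "Value": "app"}]},
        {"InstanceId": "i-02", "Tags": [{"Key": ASG_TAG, "Value": "web-asg"}]},
        {"InstanceId": "i-03", "Tags": [{"Key": STACK_TAG, "Value": "network"}]},
        {"InstanceId": "i-04", "Tags": [{"Key": STACK_TAG, "Value": "app"}]},
        {"InstanceId": "i-05"},
    ]
    print(triage(sample, {"i-01"}, "app"))
```

Even the this_stack_untracked bucket only says "probably ours"; deciding which resource address each instance should be imported as still means reading the code.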

mysandy603 · October 27, 2022, 11:55am

Thanks stuart-c,

I was thinking of using a PVC (PersistentVolumeClaim) with the GitLab runner, so that if the runner crashes, a new runner can pick up the state file stored on the PVC. Terraform could then use it to create the remaining resources, and we would have a persistent state file on the runner.
Does this make sense? What do you think of this?

stuart-c · October 27, 2022, 1:35pm

The state is held in memory before being sent to the remote state location (assuming you are using a remote backend), so a PVC wouldn’t help: nothing would have been written to disk.
