Using a PVC bound to the runner to preserve the Terraform state file during a crash

Hello,

When the Terraform runner pod is killed or crashes on our Kubernetes cluster, the Terraform state file is deleted along with the pod, which leaves the state out of sync with the real resources.
I am working on a solution that uses a PVC to store our Terraform state file by binding the PVC to the runner pod.
I am making the change in the config.toml file by adding:

[[runners.kubernetes.volumes.pvc]]
  name = "runner-claim"
  mount_path = "/builds"
  driver = "kubernetes.io/aws-ebs"

With this change, I am able to bind the PVC to the runner. But when I tested it by killing the runner pod and rerunning the GitLab job (which creates a new runner bound to the same PVC) to create the infrastructure again, duplicate resources were created. This means the new run is not picking up the Terraform state file written by the pod that was killed.
Can anyone advise on whether Terraform state can be saved on a disk that is re-bound to a new pod using a PVC?

Thanks,
Sandeep

Are you using local state files or a remote state location (e.g. S3 buckets)?

Or is it that you are killing a Terraform run before it has completed? If so, it is expected that this would cause issues. Terraform should not be killed forcefully and needs to be allowed to finish the process, at which time the state file is updated - you want to be using remote state (S3, database, etc.) rather than a local disk.

If Terraform is killed without being allowed to complete, any changes that have actually happened will not be recorded in the state file. As a result you’d need to manually fix things - importing resources that were created or using terraform state rm to remove what has been destroyed.
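
For illustration, the manual fix-up usually comes down to two commands. This is only a rough sketch: the helper names adopt_resource and forget_resource are made up for this example, and the resource addresses and IDs are placeholders that you would replace with whatever your own configuration and provider use.

```shell
# Hypothetical helpers; the addresses and IDs in the example are placeholders.

# Bring a resource that exists in the cloud, but not in state, under
# Terraform's management.
adopt_resource() {
  # usage: adopt_resource <resource address> <provider-specific ID>
  terraform import "$1" "$2"
}

# Remove a state entry whose real resource no longer exists.
forget_resource() {
  # usage: forget_resource <resource address>
  terraform state rm "$1"
}

# Example (placeholder values):
#   adopt_resource aws_instance.web i-0123456789abcdef0
#   forget_resource aws_instance.old
#   terraform plan   # check that state and reality agree again
```

Running terraform plan afterwards is a reasonable way to confirm that nothing unexpected is still pending.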

Hi stuart-c,
Thanks for your reply. We are storing the Terraform state file in an S3 backend, and the S3 backend files are updated after Terraform completes successfully.
Our problem is that the GitLab runner sometimes crashes by itself while the terraform apply command is running. This causes issues with the resources that have already been created, because they are not recorded in any Terraform state file uploaded to S3.
For this reason, we are trying to attach a PVC to the GitLab runner pod where the terraform apply script is executed, and store the Terraform state file there. We expect it to hold at least part of the state file, which could then be used and updated on the next run by attaching the same PVC to the new runner after the earlier one crashed.
We want to avoid manual work such as terraform import or terraform state rm, and want our pipelines to handle any kind of runner crash during terraform apply by themselves. Does our solution look feasible to you? Does Terraform update the state file on the runner's disk, so that we could use it after attaching the PVC?
Thanks,

With remote state I don’t think there is anything on disk at all - the state downloaded from S3 is held in memory until it is sent back to S3 once the run finishes.

You will need to either work on ensuring your runner doesn’t get killed or accept that you will occasionally need to use import and state rm to fix things - we very rarely see issues like this as Terraform runs don’t get killed.

Hi Stuart-c,
Do you mean that no temporary file holding the partial data is created on disk when we use S3 as a backend?
If we don’t use the remote backend and instead use the local backend, so that the GitLab runner holds the Terraform state file, will Terraform then create temporary files on disk that could be stored on a PVC and re-bound to a new runner when the old one crashes and the job is restarted? What do you think?

Thanks,

I would imagine it still might hold things in memory.

Whatever you try, killing Terraform is likely to leave the state file out of sync with reality in many situations, and manual fixing will then be needed.

I would suggest focussing your efforts on stopping Terraform being killed in the first place.
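
As one possible starting point, here is a rough sketch of a wrapper for the job script. It assumes the pod is stopped with SIGTERM and given enough of a grace period, that the signal actually reaches the job script, and that Terraform treats SIGTERM as a request to stop gracefully; none of that is confirmed in this thread, and a SIGKILL, an out-of-memory kill or a node failure cannot be caught this way. The function name run_with_graceful_stop is made up for this example.

```shell
# Hypothetical wrapper: run a command in the background and, if this
# script receives SIGTERM, pass the signal on and wait for the command
# to exit instead of letting the script die straight away.
run_with_graceful_stop() {
  "$@" &
  child=$!
  got_term=0
  trap 'got_term=1; kill -TERM "$child" 2>/dev/null' TERM

  status=0
  wait "$child" || status=$?

  # A trapped signal makes wait return early, so wait once more to
  # collect the child's real exit status.
  if [ "$got_term" -eq 1 ]; then
    status=0
    wait "$child" || status=$?
  fi

  trap - TERM
  return "$status"
}

# Example (not executed here):
#   run_with_graceful_stop terraform apply -auto-approve
```

This only gives Terraform a chance to finish what it is doing and write its state. It does not replace making the runner itself more stable.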