Terraform state issue when "terraform apply" errors out in the middle of the run

Hi,
We are creating our entire AWS infrastructure using Terraform, around 500 resources in total. We use a GitLab pipeline to run a script that does "terraform apply", which starts creating the required AWS resources.

In some cases the GitLab runner dies in the middle of the run and the script ends abruptly. In another case, an ECS service deployment gets stuck and the GitLab pipeline times out after a certain time, which also ends the run abruptly. In these and a few other cases, the pipeline running the script ends abruptly and leaves the Terraform state without the resources that were already created.

When I run the pipeline again to continue applying from where it left off, it fails with errors saying that the resources already exist. The errors list many resources that were already created but are not managed by the Terraform state the pipeline was running against. If it were only one or two resources, I could use terraform import to bring them into the state. But since there are many, it is getting difficult to add them all and bring the state up to date.

The help I am looking for is:

  1. How can this situation be handled, where terraform apply is interrupted in the middle of a run and the state is not updated properly?
  2. Has anyone come across any code or script that can read the "already exists" errors from the pipeline output and add those resources to the state, to recover the state of the infrastructure?

Please let me know if you need any other details.

Thanks,
Dwarkesh Marakna

Hi @dwarkesh,

It’s generally recommended to run Terraform on “reliable” machines, as you would a stateful service like a database. Running the apply process on an ephemeral instance which may not last for the duration of the apply operation can lead to problems like the ones you have described.

Any time you are creating new resources and the Terraform process fails entirely, those individual resources will need to be imported into the Terraform state. How one goes about that likely depends on what those resources are. If the state of those resources is not important, it may be more efficient to run a cleanup script outside of Terraform to remove them, and then use Terraform again to create all new instances.
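
For the import route, here is a rough Python sketch of how the bookkeeping could be scripted. It is not an existing tool, and it rests on a few assumptions: that the pipeline log contains Terraform 1.x style error blocks where an "Error:" line is followed by a "with <resource address>," line, that the provider's message contains the words "already exists", and that you are on Terraform 1.5 or later so that import blocks are available. The function names are made up for illustration. The import ID format differs per resource type, so the script does not try to guess IDs; you supply them yourself.

```python
import json
import re

# Assumed log shape (Terraform 1.x diagnostics); adjust to what your logs really contain.
# Lines may carry a leading "│" border, so it is tolerated along with whitespace.
ERROR_START = re.compile(r"(?m)^(?=[ \t│]*Error:)")
WITH_LINE = re.compile(r"^[ \t│]*with\s+(?P<addr>.+?),\s*$")


def addresses_already_existing(log_text):
    """Return resource addresses from error blocks that mention 'already exists'."""
    found = []
    # Cut the log into chunks, each beginning at an "Error:" line.
    for block in ERROR_START.split(log_text):
        if "already exist" not in block.lower():
            continue  # some other kind of failure, e.g. a timeout
        for line in block.splitlines():
            match = WITH_LINE.match(line)
            if match and match.group("addr") not in found:
                found.append(match.group("addr"))
    return found


def import_blocks(ids_by_address):
    """Render one import block per (resource address, import ID) pair."""
    blocks = []
    for address, resource_id in ids_by_address.items():
        # json.dumps is used only to produce a double-quoted, escaped string.
        blocks.append(
            "import {\n  to = %s\n  id = %s\n}\n" % (address, json.dumps(resource_id))
        )
    return "\n".join(blocks)
```

The idea would be to feed the failed job's log to addresses_already_existing, look up the correct import ID for each reported address in the provider documentation, write the output of import_blocks to a .tf file next to the configuration, and review it with terraform plan before applying.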

Hi @jbardin ,
Thank you for your response. As you suggested, I am looking into the GitLab runner issues in parallel to get stable runs for terraform apply.
But if the issue I described does happen, there seem to be only a few options to sync the state: either import the resources into the state or recreate them.
Are you aware of any open-source code that can automate importing the multiple resources that failed in my pipeline run?

Thanks