I’m fairly new to Terraform and have been dabbling with the CAF-Enterprise-Scale templates.
One very frustrating situation I’ve run into: if I don’t occasionally poke at the Azure CLI window while a Terraform apply is running, I get booted/disconnected from the Cloud Shell, and Terraform throws errors whenever I try to continue the plan or generate a new plan and apply it.
Here’s an example.
- I generate a TFPLAN
- I apply the plan
- The plan takes 30+ minutes to run, and I forget to occasionally poke at the cloud shell window.
- Azure disconnects me due to timeout, leaving the apply in a transient state
- If I try to run the apply again, Terraform throws errors about the state being different and needing to generate a new plan.
- I generate and then apply the updated plan
- Terraform throws a bunch of errors due to the objects that were created during the interrupted plan, stating that I need to import them.
What’s the easiest way to recover from this scenario without hosing and then rebuilding the environment? I read through the terraform import documentation, but it looked like I’d probably have to do it on a resource-by-resource basis, and I couldn’t find any good examples for the situation I’m dealing with.
Any help is appreciated! Thank you
30 minutes is a pretty long time for an apply to take. I’d suggest looking at that and maybe splitting the root module into multiple separate ones to make things smaller/quicker to plan & apply.
If a terraform apply gets interrupted then your state file may no longer be in sync with reality. For things which exist in the state but have changed in reality, a terraform refresh will get things back in order. But if something has been created that isn’t in the state file (and can’t just be created again without clashes or ending up with duplicates), the only way to recover is to use terraform import, which as you say requires you to import each of the resources. For some things you can get away with not importing everything and then allowing Terraform to recreate things.
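As a rough illustration of the resource-by-resource import, here is a minimal Python sketch that turns a hand-written mapping of Terraform addresses to Azure resource IDs into terraform import commands. The addresses, subscription ID and resource group names below are made up for the example; the real ones have to come from your own configuration and the portal. The sketch only prints the commands so they can be reviewed before anything is run.

```python
import shlex

# Hypothetical mapping: Terraform resource address -> Azure resource ID.
# None of these values come from a real environment.
RESOURCES_TO_IMPORT = {
    "azurerm_resource_group.example": (
        "/subscriptions/00000000-0000-0000-0000-000000000000"
        "/resourceGroups/example-rg"
    ),
    # Addresses with an index key contain quotes and brackets,
    # so they need shell quoting.
    'module.core.azurerm_resource_group.rg["mgmt"]': (
        "/subscriptions/00000000-0000-0000-0000-000000000000"
        "/resourceGroups/mgmt-rg"
    ),
}


def build_import_commands(mapping):
    """Return one shell-quoted `terraform import ADDRESS ID` line per entry."""
    return [
        shlex.join(["terraform", "import", address, resource_id])
        for address, resource_id in mapping.items()
    ]


if __name__ == "__main__":
    # Print only; run the commands by hand after checking each one.
    for command in build_import_commands(RESOURCES_TO_IMPORT):
        print(command)
```

Printing rather than executing is deliberate: a wrong address or ID would attach the wrong object to the state, so each line is worth a look before it is pasted into the shell.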
In general I would say that it is very important that Terraform always completes its run without timeouts that prevent it from saving the state file. I would split things that are overly large & slow, and move the apply to a CI system instead of running it manually (that makes it less likely that network/laptop issues break things, as well as being more controlled in general). You may also be able to tweak timeouts for some systems.
Is there a way to evaluate each resource in a state file against a plan and do a forced import en masse for clashing objects?
You would need to write a script that interrogates the remote API, fetches data from Terraform (which can be output as JSON for script usage) and then figures out what is missing and needs importing. As there may not be an easy way to map existing resources to what is needed programmatically, you would need to be careful to check what the script comes up with.
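A minimal sketch of that kind of script, in Python. It assumes the plan has been exported with terraform show -json tfplan > plan.json, and that you have separately built a dictionary of existing Azure resources keyed by name (for example from the output of az resource list --output json). Matching on the name attribute is a simplifying assumption of this sketch, not something Terraform provides: names are not unique across resource types, and not every resource has one. The function name and the sample data are hypothetical.

```python
import json


def find_import_candidates(plan, existing_ids_by_name):
    """Pair planned creations with already-existing remote objects.

    plan: parsed output of `terraform show -json <planfile>`.
    existing_ids_by_name: {resource name: remote resource ID}, built
        separately from the cloud API (an assumption of this sketch).
    Returns a list of (terraform_address, remote_id) tuples.
    """
    candidates = []
    for resource_change in plan.get("resource_changes", []):
        change = resource_change.get("change", {})
        # Only plain creations can clash with something that already exists.
        if change.get("actions") != ["create"]:
            continue
        planned = change.get("after") or {}
        name = planned.get("name")
        if name in existing_ids_by_name:
            candidates.append(
                (resource_change["address"], existing_ids_by_name[name])
            )
    return candidates


def load_plan(path):
    """Read a plan that was exported as JSON."""
    with open(path) as handle:
        return json.load(handle)
```

Each pair it returns is only a candidate: check that the address and the remote ID really refer to the same object before running terraform import for it.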