Terraform Destroyed Resources when doing batch deployments

Hello, first time posting.

Recently we saw an interesting issue during a Terraform-Jenkins deployment.

So we have the following commits:

Base Commit: (Version that our TF state is on before deployment)
Commit A: Adds a new AWS region to our provider as part of region expansion
Commit B: Spins up autoscaling cache
Commit C: Adds env variables to an ECS cluster’s tasks

When we went directly from Base Commit to Commit C, we observed that Terraform attempted to replace (delete and create) certain resources, such as our ECS ALB Target Group, alongside the additions we did expect. This is definitely not what we want, as it would knock out traffic to our containers.

Instead, when we went from Base Commit to Commit A, then from Commit A to Commit C, we ONLY saw the additions (no +/- or -/+ actions).

We are using Terraform version 0.13.7. Has anyone seen this occur, or have an explanation for why Terraform might choose to delete resources when the commits are batched? Our expectation was that whether we went Base → C or Base → A → C, Terraform should have done the same thing (strictly add resources).

Hi @kzhou-chwy,

It’s hard to get into specifics here because we’re only talking at a high level about services I’m not super familiar with, but a general rule about Terraform is that each plan will propose at most one action per resource instance. Because of that, it is in principle possible to construct a series of steps that moves more gradually between two situations than what would happen if you tried to skip directly to the final state.

One example of how this can occur relates to the fact that many resource types have attributes that the provider can’t actually know until the final apply step, and so the provider will report that they are “unknown values” (which appear as (known after apply) in the plan).
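As a hypothetical illustration (the resource names, the `random_id` suffix, and `var.vpc_id` here are my own invention, not taken from your configuration), here is one way an unknown value can force a replacement in a single-step plan:

```hcl
resource "random_id" "suffix" {
  byte_length = 4
}

resource "aws_lb_target_group" "app" {
  # random_id.suffix.hex is unknown until random_id is actually created,
  # so in a plan that creates both resources this name shows up as
  # (known after apply). Since "name" forces replacement when it changes,
  # Terraform must conservatively plan to replace an existing target
  # group -- even if the final value turns out to be identical.
  name     = "app-${random_id.suffix.hex}"
  port     = 80
  protocol = "HTTP"
  vpc_id   = var.vpc_id
}
```

If `random_id.suffix` already existed in state, its `hex` value would be known at plan time and no replacement would be proposed.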

Terraform and Terraform providers make no particular assumptions about what final value an unknown value will take except tracking what data type it’ll have. It’s possible that the final value will actually end up being something that doesn’t lead to the need to replace another object, but Terraform and the providers must conservatively assume that won’t be true.

However, if you apply in two steps then any objects created by the first step will have exact, known values for the second plan, and so Terraform and the providers have more information and can thus produce a more precise plan, potentially avoiding the pessimistic assumption that something will need to be replaced (because the provider can see that the final value actually matches).
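One way to approximate that two-step process without splitting your commits is a targeted apply. This is only a sketch, and the resource address is hypothetical — substitute whichever new object's attributes are showing up as (known after apply):

```shell
# Step 1: create just the new object whose attributes are unknown,
# so its real values land in state. (Address is hypothetical.)
terraform apply -target=aws_elasticache_replication_group.cache

# Step 2: a full plan now sees concrete values instead of
# (known after apply), so it can avoid the pessimistic replacements.
terraform plan
terraform apply
```

Note that `-target` is intended as an escape hatch rather than routine workflow, so I'd treat this as a one-off mitigation.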

I don’t know if that’s what’s going on for the situation you’re asking about, but I’m sharing it as an example of a way in which small steps can produce a different result than big steps. This is a consequence of Terraform’s model of planning a sequence of actions where some actions depend on the outcome of others, and so we can mitigate it only by making that sequence shorter and thus reducing the opportunities for ambiguity about what the result will be.
