Terraform wants to recreate the entire infrastructure

Hello everyone.
I’ve run into a bit of a problem and can’t find answers online.
We used to develop and deploy with Terraform locally on a Windows 10 computer. We are now moving the pipeline to Jenkins (running on CentOS), which sits between our GitLab repository and our AWS environment.

When we execute the plan in Jenkins, Terraform wants to destroy and recreate everything, even though everything is reported as up to date when I run the same plan on my local computer.

Terraform correctly finds the tfstate in our backend (terraform show returns identical results); we are using an S3 bucket as the backend.
terraform init has been run correctly and the Terraform provider version is identical as well.

Terraform Version

Terraform 0.12.28


Terraform environment

Jenkins 2.249.2
Terraform plugin 1.0.10
Pipeline: AWS Steps 1.43

Jenkins machine: CentOS 7.7.1908
Local computer: Windows 10

I’m currently out of ideas about what could cause this difference in behavior between local runs and Jenkins. If you have any idea what might be causing it, any suggestion is welcome.

The plan should indicate what it wants to change or recreate, and why.

What does that show?

The plan is to update everything in scope or, where an in-place update is impossible, to replace it (around 550-600 resources).



Once again, this behavior only occurs in Jenkins and not on our computers.

Hi @lucbrug,

The annotations like # forces replacement are the best place to focus when debugging this. Try to look for one where the new value is a known value, rather than (known after apply) in order to find the root change that prompted all of the other changes.
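
With a plan this large it can help to dump it to a file and search for those annotations, along these lines (the file name is arbitrary):

    # Capture the plan as plain text, then list every "forces replacement"
    # annotation with a couple of lines of context before it:
    terraform plan -no-color > plan.txt
    grep -B 2 'forces replacement' plan.txt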

Hi,
What is the Jenkins output? Is your Jenkins job fetching the Terraform state correctly?

Good luck
Regards,

Thanks a lot for the very good idea. Unfortunately for us (mostly me), all 400 "forces replacement" annotations lead to an unknown value.

Thanks a lot for your time.
I think the tfstate is fetched correctly: the backend is successfully configured in the init stage, and when I use terraform show to fetch the state I get the same result on my computer and in Jenkins, which leads me to believe the state is correct.

I cannot copy the entire Jenkins output for obvious size reasons, but I can tell you that after 4 minutes Jenkins finishes successfully.
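
For reference, one way to double-check that (the file names here are illustrative) is to dump the rendered state on both sides and compare the results:

    # Run after terraform init, on the Windows machine and on the Jenkins agent:
    terraform show -no-color > state-dump.txt
    # Then put both dumps next to each other and diff them:
    diff state-dump-local.txt state-dump-jenkins.txt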

If all of the “forces replacement” changes are showing as (known after apply) then I think the most likely explanation would be that somewhere else in the plan there will be an action to create something that ought to already exist. You mentioned that there are “317 to add” so I understand that it’ll probably be challenging to review all of them, but if you can identify at least one that is describing a plan to create something that already exists in the remote system then that at least leads to a likely explanation…

If you are using different remote API credentials in your automation environment than in your local environment, then those credentials may have different access than the credentials you are using locally. If those credentials lack access to one or more objects, the remote API would return an error when Terraform tries to "refresh" them.

Unfortunately a typical API design is to represent “access denied” via the same 404 Not Found response code as would signal that the object has been deleted, and so in that case the Terraform provider can’t tell the difference and will assume that the object has been deleted and therefore needs to be recreated.

Another reason this can happen for AWS in particular is if you are choosing a region dynamically somehow – rather than by having the region name hard-coded in your configuration – and your automation disagrees with your local system about which region to use. Because AWS regions have (for most services) entirely separate namespaces, refreshing an object created in one region against another region will typically cause the AWS API to return a “not found” error, with the same consequence as I described above for access denied.

In both cases you’d need to address that by making sure that the execution environment for your automation matches the execution environment for your local computers.
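
As a quick sanity check along those lines (nothing here is specific to your setup), you can compare what identity and region each environment actually resolves to:

    # Run this in both environments (local shell and a Jenkins sh step).
    # Which AWS account/role do the credentials belong to?
    aws sts get-caller-identity
    # Which region does the AWS CLI resolve? (Terraform's provider region is
    # configured separately, so also check the provider block and env vars.)
    aws configure get region
    echo "$AWS_DEFAULT_REGION"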


Hi, thanks for the answer and sorry for the delayed response.
I think I wasn’t very clear, so let me try to correct that to avoid more confusion: Terraform does not plan to create anything, only to replace everything, so it is not a problem of finding all the resources.
The plan output may have been confusing, since a replacement shows up as a destroy plus a create, but the whole infrastructure is found.

The API credentials could have been the explanation; unfortunately I tried with both sets and there was no change in the result.

And lastly, the region is hard-coded.

Once again, thanks for your input.

Could you show some of the plan output?

Is this enough? It shows the two types of actions Terraform wants to perform: update if possible, replace otherwise.

OK. From those two we can see that the DNS record is being updated because the record value is different. Since it is an A record, it will be pointing at the public/private IP address of something (a load balancer, an EC2 instance, etc.), and presumably that underlying resource is being replaced, which is why there is a change.

For the volume attachment, as it says, the change in both the instance_id and volume_id will cause that resource to be destroyed and recreated. The instance_id is presumably linked to something like an EC2 instance resource which is down to be replaced.

So that explains why those two are being updated, but you’d need to then track back to see why the resources they depend upon are also being changed. Could you show the plan output for those linked resources?
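
For example (the resource address below is only a placeholder), you could pull those sections out of a saved plan:

    # Save the full plan once, then extract the section for a suspect resource,
    # e.g. the EC2 instance that the volume attachment and DNS record point at:
    terraform plan -no-color > plan.txt
    grep -A 40 'aws_instance.example' plan.txt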

Thanks to everyone, and especially @stuart-c and @apparentlymart.
As you suspected, something had changed in the instance and I didn’t see it: we use a bootstrap script, and because we went from Windows to Linux the line endings in the script were interpreted differently, causing the whole thing to be updated.

I just added a step in my pipeline to convert the files back to their "normal" state and everything is working perfectly fine.
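
A conversion step like that can be as simple as a dos2unix pass over the scripts before Terraform runs; the tool choice and file patterns below are illustrative rather than my exact pipeline:

    # Run in the Jenkins workspace before terraform plan / terraform apply,
    # so the bootstrap scripts embedded in user_data keep LF line endings:
    find . -type f \( -name '*.sh' -o -name '*.tpl' \) -print0 | xargs -0 dos2unix

An alternative is to pin the line endings in the repository itself with a .gitattributes rule (for example *.sh text eol=lf), so that Windows and Linux checkouts produce the same bytes.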

thanks again for the help