Terraform not detecting drift on resources. Intended or not?

Can someone point me to documentation that confirms how Terraform detects drift and what it determines is a change or not? I already know about the lifecycle{} meta-argument, but I just found behavior that surprised me and I need confirmation if this is intended or not, so we can make business decisions based on this behavior.

Here is the situation: I created an azurerm_app_service with various configuration settings. In this azurerm_app_service resource, I did not specify an ip_restriction{} block inside the site_config{} block, but I do have a site_config{} block with other settings.

In the Azure Portal UI, I defined a few IP restrictions. As a test, I ran terraform apply to see if Terraform would detect these IP restrictions and remove them, since I didn’t define an ip_restriction{} block in site_config{}. To my surprise, Terraform did not revert the IP restrictions I set in the Azure Portal UI. Now, for me, this might work out quite nicely because our teams can manage certain configuration settings in Azure Portal without worrying about Terraform resetting it all. But on the other hand, I’m afraid this might be some kind of “bug feature” about which HashiCorp might later say “oops, we fixed that”.

My understanding is that the purpose of the lifecycle.ignore_changes meta-argument is to tell Terraform not to change certain configurations. So, why does Terraform decide to automatically ignore changes to configuration settings that are omitted from the Terraform configuration script? Is this intended or not?

So, I found some documentation that goes into detail about detecting drift. Then I looked at the app_service.go in the terraform-provider-azurerm repo in GitHub. The schemaAppServiceDataSourceIpRestriction() function says it computes these settings, but Terraform doesn’t detect it…so I’m not sure I really understand how drift actually works.

What is further confusing is that if I add tags to the web app in the Azure Portal UI and run terraform apply, the tags are reverted, even though I didn’t include a tags block in the Terraform configuration. This drift detection seems very inconsistent.

Hi @Terraform-man,

As I think you’ve already learned from the documentation, Terraform’s “drift detection” is highly dependent on how a provider is implemented, and so unfortunately there cannot be a consistent general answer for how it will behave in all cases.

However, it might help to know how Terraform defines “drift”:

During the plan step, Terraform asks the providers to read the remote object bound to each of the resource instances and generate a new state object based on the current remote object state. Terraform then saves that updated object. Next, it sends the updated state object and the current configuration object to the provider so the provider can compare the two and determine whether the configuration and state are equal. If not, one of the following is presumably true:

  • The remote system was unchanged but the configuration has changed since the last apply. (The normal case: updating the configuration)
  • The configuration is unchanged but the remote system has changed since the last apply. (This case is what you might call “drift”.)

This operation only considers the current state and the current configuration, so it can’t actually distinguish between the two cases above and the result is the same in either case: the provider tells Terraform to take some sort of action that will produce a new state object.

Under this model, there are a number of different reasons why a particular sort of “drift” might not be detected, including but not limited to:

  • It might be a change to some aspect of the remote object that the current provider version isn’t aware of at all, so the change is invisible to the provider.
  • It might be a change to something that was not specified in the configuration at all but instead selected dynamically by the remote API. In that case, the provider has no option but to assume the remote API is the “correct” value, because there’s no configuration to compare it with.
  • It might be a change to some value that the underlying API considers to be “write-only”, and so the provider can’t read back the updated value to compare it with the configuration. This is essentially the opposite of the previous case: the provider assumes the configuration is correct because it has no remote API value to compare with.
  • You might have created something that the provider considers to be an entirely new object rather than a change to your existing object, in which case the new object gets ignored so that multiple Terraform configurations and potentially other systems can all coexist in the same account. (The provider assumes that the new thing you created is managed by some other Terraform configuration or some other system.)
  • The provider might consider the change you made to have a “functionally equivalent” meaning to what you originally configured, and so decline to perform an update for it. One example of this is if a particular value is a JSON string and you just changed some space characters between tokens, so it still represents the same JSON value even though the source text is different.
  • There’s just a bug in the provider that causes it to fail to report a change that it ought to have reported.

I’m not familiar enough with the specific Azure features you mentioned to know whether what you saw is caused by one of the situations I described above, or by some other similar situation I didn’t think of while I was writing this out.

However, your note that “it computes these settings” sounds like the terminology provider developers typically use to talk about values that are unstated in the configuration and chosen by the remote system, and so my guess would be that this is a situation where the configuration doesn’t say anything about IP restrictions, and so the provider is just assuming that the objects chosen by the remote API must be correct.
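To make that guess concrete, here is a minimal sketch of the two cases. It assumes, based on my reading of the azurerm provider documentation, that site_config.ip_restriction is an optional attribute whose value the provider takes from the remote API when it is unset, and that assigning an empty list is how you state "no restrictions" explicitly. The resource names, location, and variable are hypothetical.

```hcl
# Hypothetical input; replace with your real App Service plan ID.
variable "app_service_plan_id" {
  type = string
}

# Case 1: ip_restriction is omitted. The configuration says nothing about it,
# so (assumption) the provider accepts whatever the remote API reports and
# restrictions added in the portal produce no diff.
resource "azurerm_app_service" "unmanaged_restrictions" {
  name                = "example-app-a"         # hypothetical
  location            = "westeurope"            # hypothetical
  resource_group_name = "example-rg"            # hypothetical
  app_service_plan_id = var.app_service_plan_id

  site_config {
    always_on = true
  }
}

# Case 2: ip_restriction is stated explicitly as empty. Now there is a
# configured value to compare against, so restrictions added in the portal
# should show up as a change in the next plan.
resource "azurerm_app_service" "managed_restrictions" {
  name                = "example-app-b"         # hypothetical
  location            = "westeurope"            # hypothetical
  resource_group_name = "example-rg"            # hypothetical
  app_service_plan_id = var.app_service_plan_id

  site_config {
    always_on      = true
    ip_restriction = []
  }
}
```

If the plan for the second resource proposes removing the portal-defined restrictions while the first shows no changes, that would be consistent with the "unstated in the configuration" explanation rather than with a provider bug.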

To detect drift in resources created by Terraform, you need to run terraform plan with the -detailed-exitcode flag (https://www.terraform.io/docs/commands/plan.html#detailed-exitcode).

-detailed-exitcode - Return a detailed exit code when the command exits. When provided, this argument changes the exit codes and their meanings to provide more granular information about what the resulting plan contains:

  • 0 = Succeeded with empty diff (no changes)
  • 1 = Error
  • 2 = Succeeded with non-empty diff (changes present)

If there is drift, the exit code will be 2. If no drift, exit code will be 0.

Thanks,

In case the above doesn’t work out, you could try the following.
(The above approach will not work for any resource that was not originally created by Terraform. To make it work for resources created outside of Terraform as well, do the following.)
Save an initial plan from a configuration that queries the real-world infrastructure, usually through a data source (data.#provider#.#resource#).
After changing the resource in the UI, save a plan again. You can then diff the two plans, and the difference will show up.
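As a rough sketch of what such a query could look like for the App Service case in this thread: the azurerm_app_service data source reads the live object, and an output exposes the part you want to compare between runs. The app name, resource group, and output name are hypothetical, and the attribute path is my assumption about how the data source exposes site_config.

```hcl
# Read the live App Service, regardless of how its settings were changed.
data "azurerm_app_service" "current" {
  name                = "example-app"  # hypothetical
  resource_group_name = "example-rg"   # hypothetical
}

# Expose the IP restrictions as reported by Azure so that two saved plans
# (or two output snapshots) can be diffed.
# Assumption: site_config is exported as a single-element list.
output "live_ip_restrictions" {
  value = data.azurerm_app_service.current.site_config[0].ip_restriction
}
```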

Thanks,

@apparentlymart Thanks for the detailed explanation. This is very helpful. So, it seems that the safest thing to do in my case is to declare lifecycle.ignore_changes explicitly no matter what, so that if the provider later changes its schema, I won’t get unintended reverts of the settings we manage in the portal.

Declaring ignore_changes explicitly is indeed the safe option, and I’d also suggest it’s a helpful thing to do because it’ll make your intent more clear to future readers of your configuration, so they can be aware that something unusual is happening with the IP restrictions in your particular system. (I’d probably also include a comment above the ignore_changes entry describing what is unusual about it, to help that future reader with further understanding.)
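A minimal sketch of what that could look like, assuming the same azurerm_app_service resource discussed in this thread; the names, location, and variable are hypothetical, and the attribute path assumes site_config is a single nested block:

```hcl
# Hypothetical input; replace with your real App Service plan ID.
variable "app_service_plan_id" {
  type = string
}

resource "azurerm_app_service" "example" {
  name                = "example-app"           # hypothetical
  location            = "westeurope"            # hypothetical
  resource_group_name = "example-rg"            # hypothetical
  app_service_plan_id = var.app_service_plan_id

  site_config {
    always_on = true
  }

  lifecycle {
    # IP restrictions for this app are managed by the teams directly in the
    # Azure Portal, so Terraform must never revert them, even if a future
    # provider version starts treating an omitted ip_restriction as "none".
    ignore_changes = [
      site_config[0].ip_restriction,
    ]
  }
}
```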
