Module_X, among many other resources, has a null_resource.register_providers with a local-exec provisioner that runs a bash script. The script registers multiple Azure Resource Providers.
No idea why azurerm_resource_provider_registration wasn’t used. I joined this team only a week ago.
Module_X also has an output for null_resource.register_providers:
output "register_providers" {
  value = null_resource.register_providers
}
In the main TF folder we call all 4 modules, but null_resource.register_providers in Module_X has to complete before the other three modules run. Hence, the module blocks in the main TF folder that call Modules A, B and C each have a depends_on referencing that output.
This config works just fine on the first run. However, on subsequent runs TF wants to replace all resources in Modules A, B and C even though there are no changes to the code.
After experimenting a bit, I found that it all comes down to the null_resource. As a test, I replaced it with another resource in the output of Module_X, and TF stopped trying to replace all objects in Modules A, B and C.
The dependency between modules is actually required for the first run only. The current workaround is to remove this dependency from the code manually after all resources are provisioned, but I am after a proper solution.
Without a complete example I can’t explain exactly how the change is being triggered, but the common cause is the use of depends_on with a data source. If you specify that a data source depends_on a managed resource, that data source cannot be read until any pending changes in the managed resource have been resolved. Adding depends_on to a module means that everything within that module depends on the referenced value.
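To illustrate the pattern I mean, here is a rough sketch. The paths, the data source, the storage account and all names are hypothetical, since I haven't seen the real modules; only module_x's register_providers output comes from the question.

```hcl
# Root module: a blanket dependency on Module_X's output.
module "module_a" {
  source = "./modules/module_a" # hypothetical path

  # Everything inside module_a, data sources included, now depends on this value.
  depends_on = [module.module_x.register_providers]
}

# modules/module_a/main.tf (hypothetical contents)
data "azurerm_resource_group" "example" {
  name = "example-rg" # hypothetical name
}

resource "azurerm_storage_account" "example" {
  name                     = "examplestorageacct" # hypothetical name
  resource_group_name      = data.azurerm_resource_group.example.name
  # location forces replacement when it changes, so if the data source read is
  # deferred to apply time, this value is unknown during plan and the plan
  # shows a replacement.
  location                 = data.azurerm_resource_group.example.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}
```

If module_a contains anything shaped like this, the data source inherits the module-level dependency, and any attribute derived from it can end up "known after apply".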
The solution is to remove the blanket depends_on statement, and assign the dependencies only where they are required. If there is no explicit assignment possible, this may mean setting depends_on only on the specific resource which needs it within the module.
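One way to narrow the dependency, sketched under assumptions: the variable name providers_ready and the resource below are made up for illustration, and I am assuming the module can take an extra input.

```hcl
# Root module: no module-level depends_on; pass the value in instead.
module "module_a" {
  source          = "./modules/module_a" # hypothetical path
  providers_ready = module.module_x.register_providers
}

# modules/module_a/variables.tf
variable "providers_ready" {
  description = "Opaque value used only to order resources after provider registration."
  type        = any
  default     = null
}

# modules/module_a/main.tf
resource "azurerm_resource_group" "example" {
  name     = "example-rg" # hypothetical name
  location = "westeurope" # hypothetical location

  # Only this resource waits for the registration; data sources and other
  # resources in the module are unaffected unless they refer to this one.
  depends_on = [var.providers_ready]
}
```

Anything that refers to azurerm_resource_group.example is still ordered after the registration through ordinary references, so the explicit dependency is needed in only one place.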
Another guess here is from the clue that the depends_on is only required for the first apply. This usually hints at a managed resource and data source representing the same logical resource within a single configuration. If that’s the case the solution is to remove the data source and pass the managed resource value directly to the dependencies which need it.
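A sketch of that second case, again with hypothetical names, showing the data source removed and the managed resource referenced directly:

```hcl
resource "azurerm_resource_group" "main" {
  name     = "example-rg" # hypothetical name
  location = "westeurope" # hypothetical location
}

# Before: the same resource group was looked up again by name, which only
# works on the first apply if the lookup is forced to wait.
#
# data "azurerm_resource_group" "main" {
#   name       = "example-rg"
#   depends_on = [azurerm_resource_group.main]
# }

# After: refer to the managed resource itself. The reference creates the
# ordering, and the values are known at plan time once the group exists.
resource "azurerm_storage_account" "example" {
  name                     = "examplestorageacct" # hypothetical name
  resource_group_name      = azurerm_resource_group.main.name
  location                 = azurerm_resource_group.main.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}
```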
I think an extra clue here is the fact that this null_resource resource’s provisioner seems to be creating a remote object in a way that would more typically be done with a resource block. As @vmnomad noticed, it’s a little strange to be creating an object like that when there is a resource type available for that same object type in the provider; my guess would be that the resource type was added to the provider at some later point and this provisioner approach was a workaround to avoid waiting for a new provider release.
Given that, I wonder if there’s a data block inside that module which tries to look up the object that the provisioner created, in order to use it as if it was a normal Terraform resource. Unfortunately, that can then fall victim to the problem @jbardin described where the configuration is telling Terraform to defer reading the data resource until the apply step, which in turn causes downstream resources to need to be replaced.
My suggestion in that case would be to try to replace the provisioner here with a real azurerm_resource_provider_registration resource and then return whatever attributes of that resource are required by the other module, so that you can pass them across to achieve a module composition design:
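A minimal sketch of what that composition might look like. The provider namespace, output name, variable name and paths are all assumptions on my part, since I don't know which providers the script registers or what module_a needs from them.

```hcl
# modules/module_x/main.tf
resource "azurerm_resource_provider_registration" "example" {
  name = "Microsoft.PolicyInsights" # hypothetical namespace
}

output "registered_provider_name" {
  value = azurerm_resource_provider_registration.example.name
}

# Root module
module "module_x" {
  source = "./modules/module_x" # hypothetical path
}

module "module_a" {
  source = "./modules/module_a" # hypothetical path

  # The reference orders module_a's consumers after the registration,
  # with no depends_on and no data source lookup.
  provider_namespace = module.module_x.registered_provider_name
}

# modules/module_a/variables.tf
variable "provider_namespace" {
  description = "Name of the Azure Resource Provider registered by module_x."
  type        = string
}
```

One thing to check before trying this: the azurerm provider registers a default set of Resource Providers itself unless that behaviour is turned off in the provider block, so a namespace it already handles should not also be declared as a resource.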
The goal here then would be that this module_a just takes the provider name determined by the other module and uses it directly, rather than attempting to look it up again using a data resource. That should then allow the configuration to converge in a stable state, once that registration is created and its name attribute (used by other resources in the other module, presumably) remains stable.