The most annoying, painful part of Terraform is that data sources are not idempotent. If you fetch state from the cloud and that resource does not exist, Terraform exits and fails.
It should instead return an empty object that we can test for and use with the count meta-argument to deploy/provision resources based on the lookup result, not just raise an exception.
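To illustrate what is being asked for, here is a hypothetical sketch. This is NOT how Terraform behaves today, and the aws_security_group names are purely illustrative:

```hcl
# HYPOTHETICAL: today this lookup errors out if "my-sg" does not exist.
# The request is for it to return null/empty instead, so the result
# could drive count.
data "aws_security_group" "existing" {
  name = "my-sg"
}

resource "aws_security_group" "this" {
  # imagined: create only when the lookup found nothing
  count = data.aws_security_group.existing.id == null ? 1 : 0
  name  = "my-sg"
}
```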
Because of this, it is all too common to get code into a state where you can NEVER destroy the resource, because Terraform exits non-zero. So you end up having to purge the resources by other means and then purge the state as well.
When you have to constantly hand-hold the tool to get it to work properly, the platform is a FAIL. I hope this can be fixed in the future, e.g. in Terraform 2.0.
Terraform expects you to either be creating & managing a resource, or not creating it and just consuming that resource (via a data source).
Even if you did get an empty response from a data source when the underlying resource didn’t exist (which is possible for some data sources), that still wouldn’t help. You can’t conditionally create a resource based on a data source that fetches information about that same resource: if the lookup returned null and you conditionally created the resource, then on the next run the lookup would return data and you’d therefore destroy the resource again.
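The conventional workaround (a sketch, with illustrative names) is to have the caller state explicitly whether the module should create the resource or merely look it up, so the decision never depends on the lookup itself:

```hcl
variable "manage_sg" {
  description = "true = this module creates the SG, false = it already exists"
  type        = bool
  default     = true
}

resource "aws_security_group" "this" {
  count = var.manage_sg ? 1 : 0
  name  = "my-sg"
}

data "aws_security_group" "this" {
  count = var.manage_sg ? 0 : 1
  name  = "my-sg"
}

locals {
  # a single reference that works in either mode
  sg_id = var.manage_sg ? aws_security_group.this[0].id : data.aws_security_group.this[0].id
}
```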
If Terraform expects you to ONLY create a resource, then why have a data source concept in the first place? This is a lookup, and the platform supports lookups.
There are use cases where you may want to say “if the resource doesn’t exist, create it”. You cannot do this in Terraform, but you can on other platforms.
There are also scenarios where resources are created indirectly, such as with Kubernetes. If the process fails midway, you cannot delete the resources that were created, because the whole module will now fail. Thus you have to go around Terraform to fix this scenario, for example running terraform state rm after removing the resource by another means. Another pattern is to put guard rails in every data source: a count that can be overridden, so you can manually tell Terraform not to do the lookup and thereby delete the resource (sketch below).
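A sketch of that guard-rail pattern, assuming a Kubernetes lookup that would otherwise block destroy (provider and names are illustrative):

```hcl
variable "skip_lookups" {
  description = "Set to true to bypass lookups so destroy can proceed"
  type        = bool
  default     = false
}

data "kubernetes_service" "ingress" {
  count = var.skip_lookups ? 0 : 1
  metadata {
    name      = "ingress-nginx"
    namespace = "ingress"
  }
}
```

Then `terraform destroy -var='skip_lookups=true'` skips the lookup, and `terraform state rm <address>` drops an already-deleted resource from state.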
You don’t want this root module to manage a particular resource (it might be managed manually, automatically by your cloud provider, by a different IaC tool, or just in a different Terraform root module) but you still need to fetch some details about it. This is where you’d use a data source. There are other options too: for static resources you could just hardcode the details (possibly via a shared module), or for resources managed within another Terraform root module you could couple the two using remote state.
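For the remote state option, a minimal sketch (bucket, key, and output names are illustrative, and assume the other root module publishes a subnet_id output):

```hcl
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "my-tf-state"
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "app" {
  ami           = "ami-0123456789abcdef0"
  instance_type = "t3.micro"
  subnet_id     = data.terraform_remote_state.network.outputs.subnet_id
}
```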
You want Terraform to manage a resource. In this case you’d use a normal resource block. If the resource already exists (for example it was previously manually created) and you want Terraform to take over the management (and you’ve checked that nothing else will try to manage it) you would use terraform import to tell Terraform to start managing it. If other things in the same root module need details about that resource you’d just reference the resource - there would be no use of data sources at all.
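For example, taking over a manually created S3 bucket (bucket name illustrative):

```hcl
resource "aws_s3_bucket" "logs" {
  bucket = "my-existing-logs-bucket"
}

# One-time, from the CLI:
#   terraform import aws_s3_bucket.logs my-existing-logs-bucket
# Afterwards, other resources just reference aws_s3_bucket.logs.arn
# etc. directly; no data source involved.
```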
An example of where a data source might be useful: within AWS, getting the details of the latest AMI that has a particular set of tags. Some other process is actually creating the AMI, so a Terraform resource block isn’t appropriate, and hardcoding the value isn’t possible because it changes.
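Something like this (the tag key and values are illustrative):

```hcl
data "aws_ami" "app" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "tag:Role"
    values = ["app-server"]
  }
}

resource "aws_instance" "app" {
  ami           = data.aws_ami.app.id
  instance_type = "t3.micro"
}
```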
I’m not quite sure what you mean here. Could you give an example? If Terraform isn’t managing some resources (because they are created by something else indirectly) it shouldn’t know anything about them, and therefore nothing needs to be done within Terraform if things fail. I’m not sure if this is the situation you are thinking of, but if Terraform is used to install a Helm chart and that fails for some reason, it is up to Helm to handle any rollback/errors. Terraform would just know that the Helm deployment failed, rather than anything about the individual Kubernetes resources that might (or might not) have been created. If the Helm deploy did fail, on a subsequent Terraform run Helm would be asked to deploy again and would figure out how to achieve that.
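In other words, with something like the helm provider (a sketch; the chart details are illustrative), Terraform’s unit of management is the release, not the Kubernetes objects the chart creates:

```hcl
resource "helm_release" "ingress" {
  name       = "ingress-nginx"
  repository = "https://kubernetes.github.io/ingress-nginx"
  chart      = "ingress-nginx"
  namespace  = "ingress"
}
```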
I’m fully in agreement with @darkn3rd here; we had to disable state refreshes during destroy operations precisely because of this design issue in Terraform. Terraform is a workflow orchestration tool, similar to a workflow engine. We all know what happens when the tasks of our workflows are not idempotent: it renders the entire thing unreliable, and that’s precisely what Terraform with all its providers is, unreliable.
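For anyone in the same situation, the refresh can be skipped per run with the standard CLI flag, which stops Terraform from re-reading (and failing on) missing resources during that operation:

```
terraform destroy -refresh=false
```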