I’m using data sources to generate lists of VM IP addresses, which I then use as an Ansible inventory to provision the machines. I use the following code:
data "digitalocean_droplets" "all" {}

data "aws_instances" "all_aws" {}

locals {
  all_aws = data.aws_instances.all_aws.public_ips
  all_do  = data.digitalocean_droplets.all.droplets[*].ipv4_address
}

resource "local_file" "inventory" {
  filename = "inventory"
  content  = <<-EOF
    [do]
    %{for ip in local.all_do~}
    ${ip}
    %{endfor~}
    [aws]
    %{for ip in local.all_aws~}
    ${ip}
    %{endfor~}
  EOF
}
This works great for me, except for one annoying caveat. Every time I add or remove a VM I need to run terraform apply twice: the first run does the creation/removal, and the second run recreates the local inventory file. I tried forcing the ordering with depends_on, both on the local file and on the data sources, but it does not work: in the early apply phase the new machines do not exist yet, so the file is recreated with the old contents.
Is there any hack to force a refresh of the data sources after the machines are created? Running apply twice or using taint is frustrating.
So depends_on on the data sources actually fixes the issue; however, there are two problems with this approach:

I need to declare depends_on in every data source.

I have to modify the data sources to point at every resource with a machine I have, which is counterproductive to the aim of using data sources in this scenario (which was to consolidate information without additional manual copy-paste).
Is there any way to force a resource to be created last in the plan? I could just make a null resource and force the data sources to depend on it.
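For concreteness, the workaround I have in mind would look roughly like this (the resource names aws_instance.web and digitalocean_droplet.db are just placeholders for my actual machine resources):

resource "null_resource" "machines_ready" {
  depends_on = [aws_instance.web, digitalocean_droplet.db]
}

data "aws_instances" "all_aws" {
  depends_on = [null_resource.machines_ready]
}

data "digitalocean_droplets" "all" {
  depends_on = [null_resource.machines_ready]
}

That way only the null resource would need to list every machine, and the data sources would all wait on it.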
What you’ve described here is a typical problem that arises if you try to manage and read the same objects in the same configuration.
I typically recommend against doing this altogether. Instead, it’s often better to refer directly to the resource blocks that are managing these objects so that Terraform can automatically recognize that the local file depends on the EC2 instances and the DigitalOcean droplets, and therefore order the operations correctly without you needing to explicitly specify the hidden dependencies.
The snippet you shared doesn’t include the resource "aws_instance" "..." and resource "digitalocean_droplet" "..." blocks, so I can’t show a full example, but the general idea would be to remove the data "digitalocean_droplets" "all" block and the data "aws_instances" "all_aws" block and refer to one or more aws_instance and digitalocean_droplet resources elsewhere in the configuration. The resources to refer to will presumably be the same resources you previously specified in your depends_on argument.
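As a sketch, assuming hypothetical resources aws_instance.web and digitalocean_droplet.db that each use count, the direct-reference version would look something like this:

locals {
  all_aws = aws_instance.web[*].public_ip
  all_do  = digitalocean_droplet.db[*].ipv4_address
}

resource "local_file" "inventory" {
  filename = "inventory"
  content  = <<-EOF
    [do]
    %{for ip in local.all_do~}
    ${ip}
    %{endfor~}
    [aws]
    %{for ip in local.all_aws~}
    ${ip}
    %{endfor~}
  EOF
}

Because the locals now refer directly to managed resources, Terraform sees the dependency and writes the inventory file after creating or destroying instances, all within a single apply.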
I think I will abandon this approach altogether in favor of inventory plugins in Ansible. The approach I described above is really convoluted: I have several explicit resources and a dozen resources defined in my module hierarchy, and trying to consolidate them this way is asking for trouble. I’ll just tag the instances and use the tags to build the host hierarchy for provisioning.
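On the Terraform side, the tag-based direction would look roughly like this (the tag values are just examples); Ansible's dynamic inventory plugins for AWS and DigitalOcean can then group hosts by these tags instead of reading a generated file:

resource "aws_instance" "web" {
  # ... instance arguments ...
  tags = {
    ansible_group = "web"
  }
}

resource "digitalocean_droplet" "db" {
  # ... droplet arguments ...
  tags = ["db"]
}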