Hi,
My goal is to generate an Ansible inventory from AWS EC2 instances, both newly provisioned and existing. This project allocates EC2 instances and deploys the monitoring software Prometheus (tagged with “server”); however, the EC2 instances monitored by Prometheus (tagged with “node”) will be allocated by other projects.
To access the already-existing EC2 instances I’m using `data.aws_instances`, which only returns a list of ids that in turn have to be iterated over and resolved into actual instances with `data.aws_instance`. The problem is that the `for_each` expression in `data.aws_instance` errors because the result of `data.aws_instances` isn’t available until apply: ‘The “for_each” value depends on resource attributes that cannot be determined until apply…’.
Any ideas on how to make this work?
resource "aws_instance" "server" {
depends_on = [aws_security_group.allow-ssh-http]
count = 1
ami = data.aws_ami.ubuntu.id
instance_type = "t2.micro"
key_name = "foo"
associate_public_ip_address = true
security_groups = [
"allow-ssh-http"
]
tags = {
Name = "ops${count.index}"
Group = "ops"
Type = "server"
}
}
data "aws_instances" "instance_ids" {
depends_on = [aws_instance.server]
instance_tags = {
Group = "ops"
}
}
data "aws_instance" "ops_instances" {
depends_on = [aws_instance.server]
for_each = toset(data.aws_instances.instance_ids.ids)
instance_id = each.key
}
Hi @hansson.mattias,
Terraform’s `for_each` feature relies on having some stable identifier to use as part of the address of each instance of the resource, and so it isn’t valid to use any data that won’t be known until the apply step.
However, I think the root problem here is that you seem to be trying to both manage and read the same objects at the same time. A particular Terraform configuration should either manage an object (with a `resource` block) or read that object in (with a `data` block), but not both at the same time.
Your comment here suggests that you have two separate collections of instances, where one collection is managed directly by this `resource "aws_instance" "server"` block. I’m not sure I fully followed how your tagging scheme fits in here, but I would suggest setting up your tags so that `data "aws_instances" "instance_ids"` reads only the other instances that aren’t managed by this configuration. That way you can avoid declaring `depends_on = [aws_instance.server]`, which in turn allows Terraform to read that data source during the planning step, before the managed instances have been created.
I think you said that the instances managed elsewhere are tagged “node”, so I assume you mean `Type = "node"`. With that assumption in mind, I’d suggest adding `Type = "node"` to the `instance_tags` argument to filter out the “server” instances, and then you can use a separate expression elsewhere to merge those two collections together if you need to treat them all the same for inventory purposes:
resource "aws_instance" "server" {
count = 1
ami = data.aws_ami.ubuntu.id
instance_type = "t2.micro"
key_name = "foo"
associate_public_ip_address = true
vpc_security_group_ids = [aws_security_group.allow-ssh-http.id]
tags = {
Name = "ops${count.index}"
Group = "ops"
Type = "server"
}
}
data "aws_instances" "nodes" {
instance_tags = {
Group = "ops"
Type = "node"
}
}
data "aws_instance" "node" {
for_each = toset(data.aws_instances.nodes.ids)
instance_id = each.key
}
locals {
  # data.aws_instance.node uses for_each, so it is a map of objects; values()
  # converts it to a list so that the [*] splat can traverse it. (Splat
  # expressions don't work directly on maps.)
  all_instance_ip_addrs = setunion(
    aws_instance.server[*].private_ip,
    values(data.aws_instance.node)[*].private_ip,
  )

  instance_ip_addrs_by_type = tomap({
    server = toset(aws_instance.server[*].private_ip)
    node   = toset(values(data.aws_instance.node)[*].private_ip)
  })
}
That `locals` block at the end declares `local.all_instance_ip_addrs` as a set containing all of the private IP addresses across both your server instances and your “node” instances. I also included a bonus `local.instance_ip_addrs_by_type` example, which keeps the two separated by their types in case you want to differentiate them into different groups in your inventory.
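If you end up generating the inventory outside of Terraform, one option is to expose those locals as outputs and have a script read them with `terraform output -json`. A minimal sketch (the output names here are just illustrative):

output "all_instance_ip_addrs" {
  value = local.all_instance_ip_addrs
}

output "instance_ip_addrs_by_type" {
  value = local.instance_ip_addrs_by_type
}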
Thank you for the quick reply. Is there any way to work around empty results from data blocks? With the setup I showed you, I’m forced to allocate at least one node for the project to apply.
When I looked for answers about empty data blocks I stumbled on a discussion. Some of the answers said it’s not supported, and one answer mentioned a workaround without explaining how. What is the current state of the matter: still not supported? If so, is there any way to work around it in my case?
I have a `templatefile` output that takes `local.server` and `local.node` and generates an output in the form of an Ansible inventory with two host groups, server and node. The groups are allowed to be empty.
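Roughly like this, as a simplified sketch (the template file name and its exact contents are illustrative):

output "ansible_inventory" {
  value = templatefile("${path.module}/inventory.tftpl", {
    server = local.server
    node   = local.node
  })
}

with inventory.tftpl along the lines of:

[server]
%{ for ip in server ~}
${ip}
%{ endfor ~}

[node]
%{ for ip in node ~}
${ip}
%{ endfor ~}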
Hi @hansson.mattias,
It’s ultimately up to the implementation of each individual data source to decide what counts as a valid result.
Singleton data sources (with singular names, like `aws_instance`) typically fail when they don’t find exactly one match, because they are intended to represent dependencies on objects declared elsewhere, in which case it is an error to apply this configuration before that other object exists.
Multi-selection data sources (with plural names, like `aws_instances`) often have different rules, but it sounds like this particular one still requires that the query match at least one EC2 instance. If that’s how this data source is implemented then there is no way to subvert that behavior from within the Terraform language itself; it would require a change to the AWS provider.
It looks like AWS Provider PR #21219 changed the behavior of `aws_instances` (and various other “plural” data sources) to successfully return an empty set when the query matches nothing, but since that was merged only 12 days ago I expect it isn’t in any published release yet. If I’m understanding the changelog correctly, this change is planned for the forthcoming major release, v4.0.0, since it’s a change in behavior that may affect existing configurations.
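Once that release is out, you could opt in to the new behavior by constraining the provider version. A sketch, assuming the change does ship in v4.0.0 as the changelog suggests:

terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Assumption: the empty-result change from PR #21219 ships in v4.0.0.
      version = ">= 4.0.0"
    }
  }
}

With that in place, an empty match would simply produce an empty `ids` list, `for_each` would declare zero `data.aws_instance.node` instances, and your inventory groups would come out empty, which sounds like what you want.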