Hi,
My goal is to generate an Ansible inventory from AWS EC2 instances, both newly provisioned and existing. This project allocates EC2 instances and deploys the monitoring software Prometheus (tagged with “server”); however, the EC2 instances monitored by Prometheus (tagged with “node”) will be allocated by other projects.
To access the already-existing EC2 instances I’m using `data.aws_instances`, which only returns a list of ids that in turn have to be iterated over and resolved into actual instances with `data.aws_instance`. The problem is that the `for_each` expression in `data.aws_instance` errors because the result of `data.aws_instances` isn’t available until apply: ‘The “for_each” value depends on resource attributes that cannot be determined until apply…’.
Any ideas on how to make this work?
resource "aws_instance" "server" {
depends_on = [aws_security_group.allow-ssh-http]
count = 1
ami = data.aws_ami.ubuntu.id
instance_type = "t2.micro"
key_name = "foo"
associate_public_ip_address = true
security_groups = [
"allow-ssh-http"
]
tags = {
Name = "ops${count.index}"
Group = "ops"
Type = "server"
}
}
data "aws_instances" "instance_ids" {
depends_on = [aws_instance.server]
instance_tags = {
Group = "ops"
}
}
data "aws_instance" "ops_instances" {
depends_on = [aws_instance.server]
for_each = toset(data.aws_instances.instance_ids.ids)
instance_id = each.key
}
Hi @hansson.mattias,
Terraform’s `for_each` feature relies on having some stable identifier to use as part of the address of each instance of the resource, and so it isn’t valid to use any data that won’t be known until the apply step.
However, I think the root problem here is that you seem to be trying to both manage and read the same objects at the same time. A particular Terraform configuration should either manage an object (with a `resource` block) or read that object in (with a `data` block), but not both at the same time.
Your comment here suggests that you have two separate collections of instances, where one collection is managed directly by this `resource "aws_instance" "server"` block. I’m not sure I fully followed how your tagging scheme fits in here, but I would suggest setting up your tags so that `data "aws_instances" "instance_ids"` reads only the other instances that aren’t managed by this configuration. That way you can avoid declaring `depends_on = [aws_instance.server]`, which in turn allows Terraform to read that data source during the planning step, before the managed instances have been created.
I think you said that the instances managed elsewhere are tagged “node”, so I assume you mean `Type = "node"`. With that assumption in mind, I’d suggest adding `Type = "node"` to the `instance_tags` argument to filter out the “server” instances, and then you can use a separate expression elsewhere to merge those two collections together if you need to treat them all the same for inventory purposes:
resource "aws_instance" "server" {
count = 1
ami = data.aws_ami.ubuntu.id
instance_type = "t2.micro"
key_name = "foo"
associate_public_ip_address = true
vpc_security_group_ids = [aws_security_group.allow-ssh-http.id]
tags = {
Name = "ops${count.index}"
Group = "ops"
Type = "server"
}
}
data "aws_instances" "nodes" {
instance_tags = {
Group = "ops"
Type = "node"
}
}
data "aws_instance" "node" {
for_each = toset(data.aws_instances.nodes.ids)
instance_id = each.key
}
locals {
  # data.aws_instance.node uses for_each, so it is a map of objects; values()
  # converts it to a list so that the [*] splat can traverse it. (Splat
  # expressions don't work directly on maps.)
  all_instance_ip_addrs = setunion(
    aws_instance.server[*].private_ip,
    values(data.aws_instance.node)[*].private_ip,
  )

  instance_ip_addrs_by_type = tomap({
    server = toset(aws_instance.server[*].private_ip)
    node   = toset(values(data.aws_instance.node)[*].private_ip)
  })
}
That `locals` block at the end declares `local.all_instance_ip_addrs` as a set containing all of the private IP addresses across both your server instances and your “node” instances. I also included a bonus `local.instance_ip_addrs_by_type` example, which keeps the two separated by their types in case you want to differentiate them into different groups in your inventory.
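If you end up generating the inventory outside of Terraform, one option is to expose those locals as outputs and have a script read them with `terraform output -json`. A minimal sketch (the output names here are just illustrative):

output "all_instance_ip_addrs" {
  value = local.all_instance_ip_addrs
}

output "instance_ip_addrs_by_type" {
  value = local.instance_ip_addrs_by_type
}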
Thank you for the quick reply. Is there any way to work around empty results from data blocks? With the setup I showed you, I’m forced to allocate at least one node for the project to apply.
When I looked for answers about empty data blocks I stumbled on a discussion. Some of the answers said it’s not supported, and one answer mentioned a workaround without explaining how. What is the current state of the matter: still not supported? If so, is there any way to work around it in my case?
I have a `templatefile` output that takes `local.server` and `local.node` and generates an output in the form of an Ansible inventory with two host groups, server and node. The groups are allowed to be empty.
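Roughly like this, as a simplified sketch (the template file name and its exact contents are illustrative):

output "ansible_inventory" {
  value = templatefile("${path.module}/inventory.tftpl", {
    server = local.server
    node   = local.node
  })
}

with inventory.tftpl along the lines of:

[server]
%{ for ip in server ~}
${ip}
%{ endfor ~}

[node]
%{ for ip in node ~}
${ip}
%{ endfor ~}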
Hi @hansson.mattias,
It’s ultimately up to the implementation of each individual data source to decide what counts as a valid result.
Singleton data sources (with singular names, like `aws_instance`) typically fail when they don’t find exactly one match, because they are intended to represent dependencies on objects declared elsewhere, in which case it is an error to apply this configuration before that other object exists.
Multi-selection data sources (with plural names, like `aws_instances`) often have different rules, but it sounds like this particular one still requires that the query match at least one EC2 instance. If that’s how this data source is implemented then there is no way to subvert that behavior from within the Terraform language itself; it would require a change to the AWS provider.
It looks like AWS Provider PR #21219 changed the behavior of `aws_instances` (and various other “plural” data sources) to successfully return an empty set when the query matches nothing, but since that was merged only 12 days ago I expect it isn’t in any published release yet. If I’m understanding the changelog correctly, this change is planned for the forthcoming major release, v4.0.0, since it’s a change in behavior that may affect existing configurations.
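Once that release is out, you could opt in to the new behavior by constraining the provider version. A sketch, assuming the change does ship in v4.0.0 as the changelog suggests:

terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Assumption: the empty-result change from PR #21219 ships in v4.0.0.
      version = ">= 4.0.0"
    }
  }
}

With that in place, an empty match would simply produce an empty `ids` list, `for_each` would declare zero `data.aws_instance.node` instances, and your inventory groups would come out empty, which sounds like what you want.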