Using terraform_remote_state over common data filters

This question was originally asked on Stack Overflow.


I would like to understand when it is recommended to use terraform_remote_state over common data filter approaches.

I see cases like images, which are not managed by another Terraform state; in those cases the obvious (and only) choice is a data filter. In most other cases, however, I could choose between terraform_remote_state and a regular data filter. I could not find an official recommendation on this matter.

Let’s take an example (the following code does not run as is and is simplified to show only the main idea).

Let us assume we have a central component with its own state/workspace:

vault/main.tf:

terraform {
  backend "azurerm" {
    storage_account_name = "tfstates"
    container_name       = "tfstates"
    key                  = "vault/all.tfstate"
  }
}

provider "openstack" {
  version     = "1.19"
  cloud       = "openstack"
}

resource "openstack_networking_subnetpool_v2" "vault" {
  name              = "vault"
  prefixes          = ["10.1.0.0/16"]
  min_prefixlen     = 24
  default_prefixlen = 24
}

resource "openstack_networking_network_v2" "vault" {
  name           = "vault"
}

resource "openstack_networking_subnet_v2" "vault" {
  name            = "vault"
  network_id      = openstack_networking_network_v2.vault.id
  subnetpool_id   = openstack_networking_subnetpool_v2.vault.id
}

// Make the CIDR available for the terraform_remote_state approach
output "cidr" {
  value = openstack_networking_subnet_v2.vault.cidr
}

....

Option 1: Whitelist the vault CIDR in another Terraform workspace with data filters

postgres/main.tf:

terraform {
  backend "azurerm" {
    storage_account_name = "tfstates"
    container_name       = "tfstates"
    key                  = "postgres/all.tfstate"
  }
}

provider "openstack" {
  version     = "1.19"
  cloud       = "openstack"
}

data "openstack_identity_project_v3" "vault" {
  // assuming vault is setup in its own project
  name = "vault"
}

data "openstack_networking_network_v2" "vault" {
  name      = "vault"
  tenant_id = data.openstack_identity_project_v3.vault.id
}

data "openstack_networking_subnet_v2" "vault" {
  name      = "vault"
  tenant_id = data.openstack_identity_project_v3.vault.id
}

resource "openstack_networking_secgroup_v2" "postgres" {
  name        = "postgres"
  description = "Allow vault connection"
}

resource "openstack_networking_secgroup_rule_v2" "allow-vault" {
  direction         = "ingress"
  ethertype         = "IPv4"
  security_group_id = openstack_networking_secgroup_v2.postgres.id
  remote_ip_prefix  = data.openstack_networking_subnet_v2.vault.cidr 
}

Option 2: Whitelist the vault CIDR in another Terraform workspace with terraform_remote_state

postgres/main.tf:

terraform {
  backend "azurerm" {
    storage_account_name = "tfstates"
    container_name       = "tfstates"
    key                  = "postgres/all.tfstate"
  }
}

provider "openstack" {
  version     = "1.19"
  cloud       = "openstack"
}

data "terraform_remote_state" "vault" {
  backend "azurerm" {
    storage_account_name = "tfstates"
    container_name       = "tfstates"
    key                  = "vault/all.tfstate"
  }
}

resource "openstack_networking_secgroup_v2" "postgres" {
  name        = "postgres"
  description = "Allow vault connection"
}

resource "openstack_networking_secgroup_rule_v2" "allow-vault" {
  direction         = "ingress"
  ethertype         = "IPv4"
  security_group_id = openstack_networking_secgroup_v2.postgres.id
  remote_ip_prefix  = data.terraform_remote_state.vault.outputs.cidr
}

Personally, I prefer terraform_remote_state because it feels less ambiguous and more declarative from a module perspective (i.e., you consciously declare output values that are meant to be used by other workspaces). However, I’m interested in whether there are solid reasons against it, or best practices I’m not aware of.

Is there an officially recommended way for scenarios like that?


Hi @fishi0x01!

The approach we’ve generally recommended as the ideal is actually a third option: explicitly write results into a configuration store or data store, and then have other configurations read them from there.

Doing that requires having a suitable configuration store deployed, though. The requirement is to have some place to store data that Terraform can both write to and read from. I’m not familiar with OpenStack, so I’m not sure whether it has a suitable store, but some examples of such a store in other systems are AWS SSM Parameter Store and HashiCorp Consul. I’m going to use AWS SSM Parameter Store here just for the sake of example.

In the “producer” configuration, we can use the aws_ssm_parameter resource type to explicitly publish a specific value to be consumed elsewhere.

resource "aws_ssm_parameter" "example" {
  name  = "vpc_id"
  type  = "String"
  value = aws_vpc.example.id
}

Then in other “consumer” configurations, we can use the corresponding data source to retrieve that value:

data "aws_ssm_parameter" "example" {
  name  = "vpc_id"
}

# and then use data.aws_ssm_parameter.example.value
# somewhere else.
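
To make this concrete for the original question, HashiCorp Consul (mentioned above) could serve the same role next to the OpenStack resources. What follows is only a hypothetical sketch, assuming a reachable Consul cluster and the Consul provider configured in both workspaces; the address and key path are invented for illustration.

In the vault workspace, explicitly publish the subnet CIDR as a key:

provider "consul" {
  # Hypothetical Consul endpoint
  address = "consul.example.com:8500"
}

resource "consul_keys" "vault_network" {
  # Publish the CIDR for consumption by other workspaces
  key {
    path  = "terraform/vault/cidr"
    value = openstack_networking_subnet_v2.vault.cidr
  }
}

In the postgres workspace, read it back and use it in the security group rule:

data "consul_keys" "vault_network" {
  key {
    name = "cidr"
    path = "terraform/vault/cidr"
  }
}

resource "openstack_networking_secgroup_rule_v2" "allow-vault" {
  direction         = "ingress"
  ethertype         = "IPv4"
  security_group_id = openstack_networking_secgroup_v2.postgres.id
  remote_ip_prefix  = data.consul_keys.vault_network.var.cidr
}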

This approach has two nice characteristics:

  • The publishing of information is explicit, so the intent is clear that this is a value intended to be consumed elsewhere, vs. just an implementation detail.
  • The “consumer” configurations using this value are decoupled from the “producer” configuration because in principle that value could’ve been written by any Terraform configuration, or possibly not even by Terraform at all. If you change system architecture in future, you can potentially change how that value gets populated without requiring changes to the consumers, because the configuration/data store serves as an indirection.

The two options you listed here are alternatives to this ideal in situations where you don’t have access to a configuration store. Each of those choices meets one of the nice characteristics above, but cannot meet both:

  • Option 1 (directly retrieving objects from the target system) achieves decoupling, but it’s not explicit about which objects are intended for external consumption and which are not. In a system that supports publishing and querying custom tags, you can approximate explicit publishing through a standard tagging scheme (see the sketch after this list), but you then need to ensure that everything in your system follows the tagging scheme properly.
  • Option 2 (terraform_remote_state) achieves explicit publishing, but suffers from close coupling: if you want that value to be managed by some other Terraform configuration or by another system entirely in future then you’ll need to change the consumers. Conversely, it’ll be hard (though not impossible) to consume that value from any system other than Terraform itself; generic configuration stores can more easily serve non-Terraform clients too.
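
As a hypothetical illustration of such a tagging scheme applied to the original example: the tag name below is invented, and this assumes the OpenStack provider version in use supports the tags argument on these networking resources and data sources.

# Producer side: mark the subnet as intended for external consumption.
resource "openstack_networking_subnet_v2" "vault" {
  name          = "vault"
  network_id    = openstack_networking_network_v2.vault.id
  subnetpool_id = openstack_networking_subnetpool_v2.vault.id
  tags          = ["published"]
}

# Consumer side: query by the agreed-upon tag instead of relying on naming alone.
data "openstack_networking_subnet_v2" "vault" {
  name = "vault"
  tags = ["published"]
}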

If this so-called “ideal” approach isn’t viable in your environment then there is no single answer to which of the other options will be “best” in all cases; instead, you’ll need to make a tradeoff based on which of these two characteristics is more important in your situation. I hope that the above at least helps you consider the implications of each and come to a final decision!


Thanks for the great answer @apparentlymart!

I added the gist of your answer to the original Stack Overflow question for the sake of completeness - I hope you don’t mind.
