Using a data resource from another deployment

I know the title sounds simple, but here’s the complexity: I have two TF deployments, and one of them has the following resource defined in it:

resource "azurerm_log_analytics_workspace" "shared" {
  name                = "${local.resource_prefix}-log"
  location            = azurerm_resource_group.shared.location
  resource_group_name = azurerm_resource_group.shared.name
  sku                 = "PerGB2018"
  retention_in_days   = 30
}

In my other deployment, which needs to use this resource, the lookup would look like the following (i.e. if the name and resource group were static information).

data "azurerm_log_analytics_workspace" "diagnostic" {
  name                = "static_name_of_log_workspace"
  resource_group_name = "static_name_of_rg"
}

The problem/issue I am trying to solve is that I need to use the above resource, but I need to know the current resource name and resource group (as opposed to the static name and resource group shown above), since that information changes based on which environment we are deploying to (e.g. dev, test, prod).

Any assistance is greatly appreciated.

Karl

Hi @bingerk,

It sounds like your downstream configuration (the one with the data block) needs to be parameterized with an environment name, at least, in order to be able to represent the necessary variations between your environments.

The main building-block for that would be to declare a variable to represent the environment name, like this:

variable "environment" {
  type = string
}

The question then becomes how best to populate it. My preferred strategy is to put all of the objects whose configuration is the same across environments into a shared module and then create separate small root modules for each environment that only configure a suitable backend and call the module, like this:

# Configuration specific to the "prod" environment, for example

terraform {
  backend "azurerm" {
    key = "prod.terraform.tfstate"
    # and so on...
  }
}

module "main" {
  source = "../modules/main"

  environment = "prod"
}

The other way to do it is to have only a single root module and combine the -var-file argument to terraform apply with the -backend-config argument to terraform init to populate the environment-specific parts. I prefer the above, though, because it keeps everything together in one place and allows just running the normal terraform init and terraform apply commands with no special arguments.
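For comparison, a minimal sketch of that second approach might look like the following; the file names prod.tfvars and prod.backend.tfvars are just illustrative assumptions, not names Terraform requires.

# prod.tfvars (hypothetical file name)
environment = "prod"

# prod.backend.tfvars (hypothetical file name)
key = "prod.terraform.tfstate"

# then, when running Terraform for the prod environment:
#   terraform init -backend-config=prod.backend.tfvars
#   terraform apply -var-file=prod.tfvars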

Another pattern to consider is having a data-only module which takes an environment name and returns structured information about that environment. That avoids duplicating the details of how to fetch the various bits of information across every caller, and thus allows you to potentially switch to getting that information in other ways (like a dedicated configuration store) in the future.

# (this example is in the "main" module
# I declared in the above example)

variable "environment" {
  type = string
}

module "env" {
  source = "../join-environment"

  environment = var.environment
}

data "azurerm_log_analytics_workspace" "diagnostic" {
  name                = module.env.log_analytics_workspace_name
  resource_group_name = module.env.resource_group_name
}

If access to the “log analytics workspace” is a trait common to most or all of your different configurations then you could potentially put the data source in the join-environment module and export the information it returns as outputs, but exactly what to include in the “environment” vs. what to include in the downstream configurations is an engineering tradeoff rather than a straightforward rule.

For the sake of keeping this example relatively similar to what you shared, here I assumed that the environment module just contains the construction rule to statically produce a log analytics workspace name given an environment name, leaving the calling module to actually fetch the data from that object if needed.
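
To make that concrete, here’s a rough sketch of what such a join-environment module might contain, assuming the object names are derived directly from the environment name (the exact naming patterns here are assumptions for illustration):

# (hypothetical contents of the join-environment module)

variable "environment" {
  type = string
}

locals {
  # Assumed naming convention; adjust to match however your
  # upstream configuration actually names these objects.
  resource_group_name          = "${var.environment}_rg"
  log_analytics_workspace_name = "${var.environment}_log_workspace"
}

output "resource_group_name" {
  value = local.resource_group_name
}

output "log_analytics_workspace_name" {
  value = local.log_analytics_workspace_name
}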

Thanks @apparentlymart, I will admit what you shared is going to take some time to digest. But before I get too far, I wanted to clarify a couple of things, as I think it may change your response. The upstream config is what will change (e.g. dev_log_workspace and dev_rg vs. prod_log_workspace and prod_rg). So in the downstream config I need to know how to query for whatever those resource names might be.

And just to clarify, these are two separate deployments/configurations, each with its own environments. For example, the configuration where the resource “azurerm_log_analytics_workspace” “shared” is deployed has a dev and a prod environment, while the downstream config/deployment (which is independent of the upstream config/deploy) has dev, test, and prod environments.

In prose/story format what I am trying to do is: all the dev and test resources in the downstream config/deploy need to send their logs to the shared-dev-log-workspace and the prod resources in the downstream config/deploy need to send their logs to the shared-prod-workspace.

Does that change your response at all? In the meantime I’m going to read and re-read what you have suggested and make some attempts to test some of that out.

Thank you!

Hi @bingerk,

I think that matches what I expected. It seems like your object names are systematically generated from the environment names, like this:

"${var.environment}_rg"
"${var.environment}_log_workspace"

…and so your downstream modules also having access to var.environment should be sufficient for them to construct those same names. A data-only module could help to factor out those expressions into a reusable form, so that your other modules would just say something like module.env.resource_group instead of "${var.environment}_rg" every time.

Unfortunately, the Azure API design (unlike some others, e.g. AWS) doesn’t tend to offer ways to look up objects by various different criteria, so the data sources in the Azure provider are typically limited to looking up an object given its name. That means that while the azurerm_log_analytics_workspace data source can be useful if you want to, say, look up the portal_url for a workspace whose name you already know, it isn’t helpful if the name itself is what you don’t know.

For that reason, decomposed architectures on Azure will typically rely on systematic naming conventions to connect things together, whereas some other systems support other possibilities such as looking up items by specific tags (Environment = "prod", for example). A data-only module that takes an environment name as an input can, fortunately, centralize the details of how these names are determined so you don’t need to repeat the same name construction expression in many different places.
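
Given the mapping you described, where the downstream dev and test environments both send logs to the shared dev workspace and prod sends to the shared prod workspace, the centralized name construction inside such a data-only module could look roughly like this (the exact names and mapping are assumptions based on your description):

locals {
  # Which upstream "shared" environment each downstream
  # environment should use for its logs (assumed mapping).
  shared_environment = {
    dev  = "dev"
    test = "dev"
    prod = "prod"
  }

  # Assumed naming convention for the upstream objects.
  resource_group_name          = "${local.shared_environment[var.environment]}_rg"
  log_analytics_workspace_name = "${local.shared_environment[var.environment]}_log_workspace"
}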

I must be missing something here @apparentlymart . How can/does the downstream environment “know” about the var.environment? These are two separate deployments each with their own state managed in a different workspace (but under the same organization) all within Terraform Cloud.

BTW: You mentioned “…downstream modules…” but to me that implies all under one state and deployment config.

The downstream configuration knows the environment value either because it’s hard-coded in an environment-specific root module, like I showed in my first comment, or because when you run Terraform you provide a value for that variable on the command line.

There isn’t any way to avoid explicitly saying which environment you are deploying to, because only you (the person running Terraform) know your intent. But at least in the case where you write a small root module with the settings hard-coded, the correct values are recorded under version control, so someone working with that configuration needs only to cd into that directory and run terraform apply.

In case it helps put this into the bigger picture, a typical directory structure for the “one root module per environment” approach would be like this:

environments/
  prod/
    # contains the backend configuration and
    # the module "main" block with
    #     environment = "prod"
    main.tf
  test/
    # contains the backend configuration and
    # the module "main" block with
    #     environment = "test"
    main.tf
modules/
  main/
    # contains the resources that are common
    # to all environments
    main.tf

Both environments/prod and environments/test have module "main" blocks referring to ../modules/main, but they vary in what value of environment they set, and then whatever differences are important between environments must be decided in that shared module based on var.environment.
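
For example, environments/test/main.tf would be nearly identical to the prod example earlier, differing only in the environment-specific values (a sketch; the relative source path assumes the directory layout shown above):

# environments/test/main.tf (sketch)

terraform {
  backend "azurerm" {
    key = "test.terraform.tfstate"
    # and so on...
  }
}

module "main" {
  source = "../../modules/main"

  environment = "test"
}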

If you follow this structure for each of the decomposed components you want to manage separately with Terraform then they only need to agree on the environment names, and then (assuming you’ve constructed the object names consistently throughout, or used a separate shared data-only module to centralize that) you’d apply the objects for a particular environment by switching into that directory and running Terraform in the usual way:

# (log in to Azure CLI, or get your credentials
# configured in some other suitable way.)
cd environments/prod
terraform init
terraform apply

OK, I see where you were going with this now. I was viewing this through our current process (i.e. everything is set up in a release pipeline, so things deploy to whatever environment is set in terraform.workspace (e.g. dev, test or prod) based on which branch the PR is made to). The approach you are suggesting would require us to make a bunch of changes for one minor process improvement. I’ll roll up my sleeves and play around with this, but at this point I’m thinking it may just be cleaner to execute a script in the pipeline.

I really do appreciate your perspective, feedback and assistance @apparentlymart!

If you are using terraform.workspace to represent environments then you can indeed use that in place of var.environment in my examples, and then use the workspace switching features to get a similar result with only a single configuration.

I didn’t recommend that here because in my experience it’s typically been a requirement for folks to use an entirely separate backend configuration for each environment, which workspaces are not designed to allow. There are more details on the relevant tradeoffs in When to use Multiple Workspaces; if you review those tradeoffs and decide that workspaces meet all of your needs then you can by all means use them!