Trying to understand why Terraform destroys lots of IAM resources because members are provided by a data source

We have a Terraform configuration that deploys resources to GCP using the GCP provider. This configuration creates GCP projects and, within those projects, many Identity & Access Management (IAM) resources such as role grants.

We’ve witnessed some strange behaviour that we can’t wrap our heads around; it may take me a moment to explain.

Here’s the Terraform code that creates the projects:

locals {
  # the logic to derive the project_ids is actually a lot more complicated than this
  # but for demo purposes it doesn't really matter
  project_ids = toset(["project1", "project2", "project3"])
}
resource "google_project" "project" {
  # what's important here is that this resource is using a for_each
  for_each            = local.project_ids
  project_id          = each.key
  name                = each.key
  folder_id           = local.data_science_folder_id
  auto_create_network = false
  billing_account     = local.billing_account_id
}

We then have Terraform code which deploys IAM resources within those projects, e.g.:

resource "google_service_account_iam_member" "deployer_service_account_token_creator_test" {
  for_each           = local.deployer_test_access_users
  service_account_id = data.google_service_account.deployer_test.name
  role               = "roles/iam.serviceAccountTokenCreator"
  member             = each.value
}

Notice that it refers to a google_service_account data source. We think that is significant.

This causes the following changes at apply time:

  # module.admin-data-eng.google_service_account_iam_member.deployer_service_account_token_creator_test["user:name@example.com"] must be replaced
-/+ resource "google_service_account_iam_member" "deployer_service_account_token_creator_test" {
      ~ etag               = "BwXblGITbpw=" -> (known after apply)
      ~ id                 = "projects/project3/serviceAccounts/deployer-test@project1.iam.gserviceaccount.com/roles/iam.serviceAccountTokenCreator/user:name@example.com" -> (known after apply)
      ~ service_account_id = "projects/project3/serviceAccounts/deployer-test@project1.iam.gserviceaccount.com" -> (known after apply) # forces replacement
        # (2 unchanged attributes hidden)
    }

This is irritating because Terraform is destroying and recreating resources that don’t need to be destroyed and recreated. Even worse, by replacing IAM resources there will be a very short period of time where the IAM grant is not in place, and thus there is the potential for something failing in production while that IAM grant is being recreated (this hasn’t happened yet but we assume it is a matter of time).

What I’m failing to understand is why Terraform is determining that the service_account_id is being changed. It isn’t being changed; that is a service account that already exists, so why is Terraform determining that the value will be (known after apply)?

Some other significant information:

  • This only seems to occur when we add new projects to local.project_ids (i.e. when we’re creating new projects). Any other time we apply this configuration, Terraform does not determine that these objects need to be replaced.
  • There are many, many IAM resources in the configuration for which this is happening; I’ve picked out one as an example.
  • To reiterate, if it isn’t clear from above: this only seems to happen when resources depend upon a google_service_account data source.

If anyone can shed any light on why this is happening we’d be very grateful because we are nonplussed.

Hi @jamiekt,

From the information you shared I can’t be sure, but the behavior you’ve described sounds like what happens when a data resource’s configuration itself includes information that Terraform can’t know until after apply, and so Terraform is forced to wait until the apply step to actually read from the data resource. In that case, all of the result attributes of the data resource will be (known after apply) during planning, because Terraform hasn’t actually called into it yet.
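For example (an illustrative sketch only, not your configuration), a data resource like this one can’t be read during planning, because one of its arguments is an attribute that is only decided when the referenced object is created:

resource "google_service_account" "deployer" {
  account_id = "deployer-test"
}

# unique_id is chosen by the remote API during creation, so while the
# service account above is pending creation this argument is unknown and
# the read must be deferred until the apply step.
data "google_service_account" "example" {
  account_id = google_service_account.deployer.unique_id
}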

If that’s true then you should see another part of the plan describing the deferred read from the data resource, which should look something like this:

  # module.admin-data-eng.data.google_service_account.deployer_test will be read during apply
  # (config refers to values not yet known)
 <= data "google_service_account" "deployer_test" {
      (a description of the configuration)
    }

Do you see something like that in the plan? If so, perhaps you can share the configuration of that data resource and the corresponding part of the plan output and then I can hopefully explain why the data resource can’t read until the apply step and suggest what you might do differently to avoid that.


Hi @apparentlymart,
Thanks for (yet another) reply (I’m indebted to the help you’ve provided to me down the years).
Yes, you’re right; I do see output like that in the plan:

  # module.admin-data-eng.data.google_service_account.deployer_test will be read during apply
  # (config refers to values not yet known)
 <= data "google_service_account" "deployer_test" {
      ~ display_name = "Deployer test" -> (known after apply)
      ~ email        = "deployer-test@project3.iam.gserviceaccount.com" -> (known after apply)
      ~ id           = "projects/project3/serviceAccounts/deployer-test@project3.iam.gserviceaccount.com" -> (known after apply)
      ~ name         = "projects/project3/serviceAccounts/deployer-test@project3.iam.gserviceaccount.com" -> (known after apply)
      ~ unique_id    = "115191407279402577563" -> (known after apply)
        # (2 unchanged attributes hidden)
    }

As you can see, this resource is defined within a module.

Here is the configuration of that data (re)source:

data "google_service_account" "deployer_test" {
  account_id = "deployer-test"
  project    = var.project_id
}

project_id is a module input variable. Here is the configuration that calls the module:

variable "admin_project" {
  /*this is in the root module*/
  type        = string
  description = "Deployment service accounts reside in this GCP project"
  default     = "project3"
}

module "admin-data-eng" {
  source       = "./modules/project3"
  region       = var.region
  all_projects = [for project in google_project.project : project.project_id] /*<-- Note this also*/
  data_folder  = local.data_folder_id
  location     = var.location
  labels       = local.default_labels
  project_id   = var.admin_project /* <-- here is where a value is passed to project_id */
  email_domain = var.email_domain
  environments = local.environments
  assets       = var.assets
  depends_on = [
    module.service_api_enablement
  ]
}

We don’t override var.admin_project in the root module, so it uses the default; hence (I think) it is known at plan time.

I think var.all_projects is significant here because its value is derived from google_project.project, which is the resource that creates projects, and (as I said above) this problem only occurs when we are creating new projects.
This admin-data-eng module contains this configuration:

resource "google_project_iam_member" "deployer_role_admin" {
  for_each = var.all_projects
  role     = "roles/iam.roleAdmin"
  member   = "serviceAccount:${google_service_account.deployer.email}"
  project  = each.value
}

As you can see, it refers to var.all_projects, so when we create new projects, new instances of google_project_iam_member.deployer_role_admin get created. However, I don’t understand why the creation of new projects causes the read of data.google_service_account.deployer_test to be deferred.

Thanks for your help so far, Mart. It has helped me get closer to the root of the problem, but I still don’t understand the underlying cause.

Hi @jamiekt,

The source of the problem is probably the use of depends_on in the module call. What that declares is that every object within the “admin-data-eng” module depends on every change to every object within the “service_api_enablement” module. Hence any changes within the “service_api_enablement” module will prevent data.google_service_account from being read during the plan.

If that depends_on was added for a specific reason, you will be better off using an output from the service_api_enablement module as an input to your module to set up the needed dependency.
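For example (the google_project_service address below is an assumption, not necessarily what’s in your module), the enablement module can export an output derived from its resources, and the calling module can accept it as an ordinary input:

# In modules/ServiceAPIEnablement:
output "enabled_services" {
  # Any value derived from the google_project_service resources carries
  # a dependency on them; which attribute you pick doesn't matter.
  value = google_project_service.compute[*].service
}

# In the root module, in place of depends_on on the module call:
module "admin-data-eng" {
  source           = "./modules/project3"
  # ...existing arguments...
  enabled_services = values(module.service_api_enablement)[*].enabled_services
}

The child module would declare a matching enabled_services input variable; anything that refers to it then waits for the services to be enabled.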

Hi @jbardin,
many thanks for the reply.

If that depends_on was added for a specific reason, you will be better off using an output from the service_api_enablement module as an input to your module to set up the needed dependency.

Yes, it’s there for a reason. That module basically enables all the GCP services that we’re going to use in our projects, and we can’t create resources until the services are enabled, hence the depends_on. I hate using depends_on, but in this case I don’t see an alternative because there’s nothing in that service_api_enablement module that we can “depend upon”. It just enables services using many google_project_service resources, and there are no attributes of those resources that we actually need to refer to.
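For context, the resources in that module look roughly like this (a simplified sketch, one per service; the real code differs in the details):

resource "google_project_service" "compute" {
  count              = var.enable_compute ? 1 : 0
  project            = var.project_id
  service            = "compute.googleapis.com"
  disable_on_destroy = false
}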

Here is the code that instantiates that module. As you can see, it refers to local.projects and uses google_project.project for its for_each:

module "service_api_enablement" {
  # A neat trick to see the values used herein is to issue (obviously change the name of the project to what you're interested in):
  #  dp-admin sh._  # Launches you into the dataplatform-admin container
  #  cd global-vars
  #  echo '[for p in local.projects: p.enable_services if p.project_id == "msm-groupdata-cdm-dev"]' | terraform console
  #  cd -

  # What's happening here for the for_each argument is for_each chaining.
  # Read more at https://www.terraform.io/docs/language/meta-arguments/for_each.html#chaining-for_each-between-resources
  for_each                    = google_project.project
  source                      = "./modules/ServiceAPIEnablement"
  project_id                  = each.value.project_id
  region                      = var.region
  enable_cloudfunctions       = [for p in local.projects : p if p.project_id == each.value.project_id][0].enable_services.cloudfunctions
  enable_secretmanager        = [for p in local.projects : p if p.project_id == each.value.project_id][0].enable_services.secretmanager
  enable_cloudbuild           = [for p in local.projects : p if p.project_id == each.value.project_id][0].enable_services.cloudbuild
  enable_composer             = [for p in local.projects : p if p.project_id == each.value.project_id][0].enable_services.composer
  enable_containerregistry    = [for p in local.projects : p if p.project_id == each.value.project_id][0].enable_services.containerregistry
  enable_artifactregistry     = [for p in local.projects : p if p.project_id == each.value.project_id][0].enable_services.artifactregistry
  enable_bigquerydatatransfer = [for p in local.projects : p if p.project_id == each.value.project_id][0].enable_services.bigquerydatatransfer
  enable_cloudscheduler       = [for p in local.projects : p if p.project_id == each.value.project_id][0].enable_services.cloudscheduler
  enable_workflows            = [for p in local.projects : p if p.project_id == each.value.project_id][0].enable_services.workflows
  enable_dataflow             = [for p in local.projects : p if p.project_id == each.value.project_id][0].enable_services.dataflow
  enable_compute              = [for p in local.projects : p if p.project_id == each.value.project_id][0].enable_services.compute
  enable_aiplatform           = [for p in local.projects : p if p.project_id == each.value.project_id][0].enable_services.aiplatform
  enable_cloudasset           = [for p in local.projects : p if p.project_id == each.value.project_id][0].enable_services.cloudasset
  enable_appengine            = [for p in local.projects : p if p.project_id == each.value.project_id][0].enable_services.appengine
  enable_vpcaccess            = [for p in local.projects : p if p.project_id == each.value.project_id][0].enable_services.vpcaccess
  enable_run                  = [for p in local.projects : p if p.project_id == each.value.project_id][0].enable_services.run
  enable_accesscontextmanager = [for p in local.projects : p if p.project_id == each.value.project_id][0].enable_services.accesscontextmanager
  enable_bigquery             = [for p in local.projects : p if p.project_id == each.value.project_id][0].enable_services.bigquery
  enable_logging              = [for p in local.projects : p if p.project_id == each.value.project_id][0].enable_services.logging
}

I like your suggestion “you will be better off using an output from the service_api_enablement module as an input to your module to set up the needed dependency”, but given there are no such values that we could legitimately refer to via an output, I’m a bit stumped as to what to do.

If there are no direct relationships between the modules, except that one must exist before the other, then it’s often a sign that the modules should be managed by separate configurations. We don’t have a good workflow for managing this entirely within Terraform at the moment, but it’s something we’re researching.
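In that arrangement, the second configuration would typically consume the first configuration’s outputs via the terraform_remote_state data source (the backend settings below are placeholders):

data "terraform_remote_state" "projects" {
  backend = "gcs"
  config = {
    bucket = "example-terraform-state" # placeholder bucket name
    prefix = "projects"
  }
}

locals {
  # e.g. reuse the project list exported by the first configuration
  all_projects = data.terraform_remote_state.projects.outputs.all_projects
}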

In the meantime, if breaking up the configuration is not a possibility, you could create some artificial dependencies to pass through. Take an output derived from the final resources in one module and feed it into the second module, only to be stored in something like a null_resource. You can then add depends_on, pointing at that null_resource, only to the resources within the module which require it.
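Sketched out (all of the names here are placeholders), that might look like:

# In modules/project3:
variable "api_dependency" {
  description = "Opaque value used only to carry a dependency on service API enablement"
  type        = any
  default     = null
}

resource "null_resource" "api_dependency" {
  triggers = {
    # jsonencode gives a stable string form; the value is never used directly
    dependency = jsonencode(var.api_dependency)
  }
}

resource "google_project_iam_member" "deployer_role_admin" {
  for_each = var.all_projects
  role     = "roles/iam.roleAdmin"
  member   = "serviceAccount:${google_service_account.deployer.email}"
  project  = each.value

  # Only the resources that genuinely need the APIs wait on them; the
  # data source is untouched and can still be read during planning.
  depends_on = [null_resource.api_dependency]
}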

Breaking the configuration up into multiple configurations would be (for reasons I won’t go into here) a mammoth task that we don’t want to undertake right now. I like the null_resource idea though, so I’ll give that a go.

A similar idea that works in many cases (but probably not all) is to declare an output value with a more “surgical” depends_on itself, even if its value is meaningless, and then pass that output value into the module with a name that indicates that it will be the dependency only for some specific subset of the resources inside.

Then in a resource block you can say something like depends_on = [ var.example ] to say “this depends on everything that this variable depends on”.
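Concretely (placeholder names again; this is a sketch rather than exact code):

# In modules/ServiceAPIEnablement:
output "iam_dependencies" {
  value      = null # the value is meaningless; only the dependency matters
  depends_on = [google_project_service.compute]
}

# In modules/project3:
variable "iam_dependencies" {
  type    = any
  default = null
}

resource "google_service_account_iam_member" "deployer_service_account_token_creator_test" {
  for_each           = local.deployer_test_access_users
  service_account_id = data.google_service_account.deployer_test.name
  role               = "roles/iam.serviceAccountTokenCreator"
  member             = each.value

  # This resource waits for the API enablement, but the data source it
  # refers to does not, so that data source can still be read at plan time.
  depends_on = [var.iam_dependencies]
}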

This is still using depends_on, but doing so in a more precise way so that you can control what the dependency applies to. In particular, you would avoid using depends_on with that data source, to ensure that it can always be read during the planning phase.

As previously discussed, splitting into multiple configurations is a more typical answer here, but for this case in particular it should work to just explain the dependencies to Terraform more precisely. You technically don’t need to split this into two parts unless a for_each or count itself depends on the results of the first module.