Updating resource state programmatically for sub-resources

I’ve run into an interesting dependency ordering issue that I’d love some feedback on. The example is pertinent to GCP CloudSQL but I suspect that this is a more general issue.

I have three resources: a CloudSQL DB instance, a CloudSQL database created inside that instance, and a CloudSQL user created inside the same instance. (A random_id supplies the suffix for the instance name.)

resource "random_id" "airflowdb_name_suffix" {
  byte_length = 4
  keepers = {
    // generate a new ID every time we switch regions
    region = var.region
  }
}

resource "google_sql_database_instance" "airflow" {
  name             = "airflowdb-${random_id.airflowdb_name_suffix.dec}"
  database_version = "POSTGRES_9_6"
  region           = var.region
  project          = var.project[var.env]

  settings {
    tier      = "db-g1-small"
    disk_size = 50
    backup_configuration {
      enabled = true
    }
    ip_configuration {
      ipv4_enabled    = true
      private_network = var.network[var.env]
    }
  }
}

resource "google_sql_database" "airflow" {
  name     = "airflow"
  instance = google_sql_database_instance.airflow.name
  project  = var.project[var.env]
}

resource "google_sql_user" "airflow" {
  name     = "airflow"
  project  = var.project[var.env]
  instance = google_sql_database_instance.airflow.name
  password = data.google_kms_secret.airflow_db_pw.plaintext
}

So far so good: from a cold start, terraform creates all the resources in the correct order. The problem comes when we change var.region. By design, this forces re-creation first of the random_id and then of the google_sql_database_instance. Which is great, but terraform sees the dependency between the instance and the other two resources and attempts to delete those two first. Depending on the current state of the database, those deletes can fail: CloudSQL won’t let you delete a user or a database if (at the DB layer) there are active connections in flight from that user or to that database. When that happens, the deletion of google_sql_database.airflow fails and the apply run ends with an error.

What I’m looking for is some way to express that google_sql_database is, properly, a sub-resource of google_sql_database_instance: if we destroy/recreate the instance resource, GCP will automatically delete the database and user resources for us, so there’s no need for terraform to care about deleting them if we’re deleting the instance. (And in fact, manually deleting them just adds delay before we get around to deleting the instance.) If an “apex resource” is marked for deletion in the provider, the sub-resources should simply be removed from terraform state if the deletion of the apex resource succeeds and then created again when the replacement resource is ready.
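For the CloudSQL case specifically, there may be a partial workaround at the resource level. As far as I know, recent versions of the google provider accept a deletion_policy argument on google_sql_database and google_sql_user. Treat the argument and its availability in your provider version as an assumption to check against the provider docs. With "ABANDON", a destroy drops the resource from terraform state without calling the API, which is the behaviour described above for these two resources. It does nothing for the general problem. A minimal sketch, reusing the resource names from the config above:

```hcl
resource "google_sql_database" "airflow" {
  name     = "airflow"
  instance = google_sql_database_instance.airflow.name
  project  = var.project[var.env]

  # Assumption: supported by the installed google provider version.
  # On destroy, forget the database instead of issuing a delete call.
  deletion_policy = "ABANDON"
}

resource "google_sql_user" "airflow" {
  name     = "airflow"
  project  = var.project[var.env]
  instance = google_sql_database_instance.airflow.name
  password = data.google_kms_secret.airflow_db_pw.plaintext

  # Same assumption as above: abandon the user rather than delete it.
  deletion_policy = "ABANDON"
}
```

The trade-off is that a plain terraform destroy of only the database or the user would also leave the real object behind, so this only makes sense when the instance is always the thing being torn down.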

Another example of where this might be handy is the current implementation of the Helm provider: right now, if you destroy/recreate the Kubernetes cluster (EKS/GKE/AKS) that a helm_release is installed into, you need to manually remove the helm releases from the state file, or terraform will rebuild the cluster while believing that the helm resources in state are still present.

I don’t have a concrete answer for you, and it’s been a few years since I’ve used GCP specifically. However, if it were me, I would start investigating with ignore_changes and depends_on.

If it’s going to work at all, the solution is likely there somewhere.

postscript: see some discussion with the maintainers of the gcp provider here

Bumping because… this is still an issue, and I’d like to keep it visible.

Also, I want to be explicit: while it was the google cloud provider that led me to consider this issue, the issue is not specific to that provider. In fact, it cannot be solved at the provider level, since terraform does not expose enough graph data to the provider to let it manage inter-resource dependencies on its own.

Likewise, this cannot currently be solved with either depends_on or ignore_changes – the current semantics of those directives do not allow specifying the most basic version of what we’d want here, which is to delete a subordinate resource from TF state without taking any API actions if the apex resource is deleted.
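For reference, newer Terraform releases (v1.7 and later, so newer than the versions this thread was written against) have a removed block that can drop a resource from state without destroying it. It is only a manual approximation of what is asked for here: the matching resource blocks have to be deleted from the configuration first, and re-adding them after the instance is rebuilt takes a second change and apply. A sketch under those assumptions, using the resource addresses from the config above:

```hcl
# Requires Terraform v1.7 or later. The corresponding resource blocks
# must already be removed from the configuration.
removed {
  from = google_sql_database.airflow

  lifecycle {
    # Forget the resource: drop it from state, make no API call.
    destroy = false
  }
}

removed {
  from = google_sql_user.airflow

  lifecycle {
    destroy = false
  }
}
```

This still does not tie the state removal to the successful deletion of the apex resource, which is the part that would need core support.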