I’ve run into an interesting dependency ordering issue that I’d love some feedback on. The example is specific to GCP CloudSQL, but I suspect this is a more general issue.
I have three resources, a CloudSQL DB Instance, a CloudSQL Database created inside that instance, and a CloudSQL User created inside the instance:
resource "random_id" "airflowdb_name_suffix" {
  byte_length = 4

  keepers = {
    // generate a new ID every time we switch regions
    region = var.region
  }
}
resource "google_sql_database_instance" "airflow" {
  name             = "airflowdb-${random_id.airflowdb_name_suffix.dec}"
  database_version = "POSTGRES_9_6"
  region           = var.region
  project          = var.project[var.env]

  settings {
    tier      = "db-g1-small"
    disk_size = 50

    backup_configuration {
      enabled = true
    }

    ip_configuration {
      ipv4_enabled    = true
      private_network = var.network[var.env]
    }
  }
}
resource "google_sql_database" "airflow" {
  name     = "airflow"
  instance = google_sql_database_instance.airflow.name
  project  = var.project[var.env]
}
resource "google_sql_user" "airflow" {
  name     = "airflow"
  project  = var.project[var.env]
  instance = google_sql_database_instance.airflow.name
  password = data.google_kms_secret.airflow_db_pw.plaintext
}
So far so good: from a cold start, Terraform creates all three resources in the correct order. The problem comes when we change var.region. By design, this forces re-creation first of the random_id and then of the google_sql_database_instance resource. Which is great, but… Terraform sees the dependency between the instance and the other two resources and attempts to delete them first. And those delete actions might, depending on the current state of the database, fail: CloudSQL won’t let you delete a user or a database if (at the database layer) there are active connections in flight from that user or to that database! And so the deletion of google_sql_database.airflow fails, and the apply run fails with an error.
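For now, one manual workaround (a sketch, using the resource addresses from the config above) is to drop the sub-resources from state before the apply, since replacing the instance deletes them server-side anyway:

```shell
# Workaround sketch: remove the sub-resources from Terraform state so the
# plan no longer tries to delete them via the provider; destroying the
# instance removes them on the GCP side regardless.
terraform state rm google_sql_database.airflow
terraform state rm google_sql_user.airflow

# The apply now only replaces the instance, then recreates the db and user.
terraform apply
```

This works, but it has to be remembered and repeated on every region change, which is exactly the kind of manual step Terraform is supposed to eliminate.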
What I’m looking for is some way to express that google_sql_database is, properly, a sub-resource of google_sql_database_instance: if we destroy/recreate the instance resource, GCP will automatically delete the database and user resources for us, so there’s no need for Terraform to delete them itself when it is deleting the instance. (In fact, manually deleting them just adds delay before we get around to deleting the instance.) If an “apex resource” is marked for deletion in the provider, the sub-resources should simply be removed from Terraform state once the deletion of the apex resource succeeds, and then created again when the replacement resource is ready.
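Purely as a strawman, the relationship might be expressed with a lifecycle flag; destroyed_with is a hypothetical name and this syntax does not exist in Terraform today:

```hcl
resource "google_sql_database" "airflow" {
  name     = "airflow"
  instance = google_sql_database_instance.airflow.name
  project  = var.project[var.env]

  lifecycle {
    # Hypothetical argument, not real Terraform syntax. The idea: when the
    # referenced apex resource is destroyed, drop this resource from state
    # instead of issuing a provider-level delete for it.
    destroyed_with = google_sql_database_instance.airflow
  }
}
```

The provider already knows the parent/child relationship (the instance attribute encodes it); the missing piece is a way to tell Terraform’s graph walker to act on it during destroys.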
Another example of where this might be handy is the current implementation of the Helm provider: right now, if you destroy/recreate the Kubernetes cluster (EKS/GKE/AKS) that a helm_release is installed into, you need to manually remove the Helm releases from the state file, or Terraform will rebuild the cluster while believing that the Helm resources in state are still present.
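Concretely, “manually remove” today means something like the following (the release address is illustrative):

```shell
# Illustrative: drop a stale helm_release from state before rebuilding the
# cluster, so Terraform knows it must be reinstalled. The address is an
# example, not from the config above.
terraform state rm helm_release.my_release
```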