Why did terraform not recognise that affected resources have been affected?

We experienced a problem today in which a resource was recreated, but the resources that grant permissions on it were not. I have a repro that demonstrates this. Unfortunately it uses the Snowflake Provider for Terraform, so you will need a Snowflake account to run it. Even if you don’t have one, hopefully there’s enough here to understand the problem.

Below is my terraform code. It creates 3 resources:

  • A Snowflake warehouse
  • A Snowflake role
  • A Snowflake warehouse grant that grants the role a privilege upon the warehouse

There is a variable, var.initially_suspended, that sets the value of the initially_suspended attribute of the warehouse.

terraform {
  required_version = ">= 0.13"
  required_providers {
    snowflake = {
      source  = "snowflake-labs/snowflake"
      version = "0.82.0"
    }
  }
}

resource "snowflake_role" "role" {
  name = "demo-role"
}

variable "initially_suspended" {
  type = bool
}

resource "snowflake_warehouse" "warehouse" {
  name                                = "demo-warehouse"
  initially_suspended                 = var.initially_suspended
  statement_queued_timeout_in_seconds = 600
}

resource "snowflake_warehouse_grant" "warehouse_usage_grant" {
  warehouse_name = snowflake_warehouse.warehouse.name
  privilege      = "USAGE"
  roles          = [snowflake_role.role.name]
}

I applied like so:

export SNOWFLAKE_REGION=eu-west-1
export SNOWFLAKE_ROLE=ACCOUNTADMIN
export SNOWFLAKE_USE_BROWSER_AUTH=true
export SNOWFLAKE_USER=name@myorg.com
export SNOWFLAKE_ACCOUNT=redacted

terraform apply --auto-approve --var initially_suspended=false

This successfully created the three resources:

Plan: 3 to add, 0 to change, 0 to destroy.
snowflake_role.role: Creating...
snowflake_warehouse.warehouse: Creating...
snowflake_role.role: Creation complete after 2s [id=demo-role]
snowflake_warehouse.warehouse: Creation complete after 2s [id=demo-warehouse]
snowflake_warehouse_grant.warehouse_usage_grant: Creating...
snowflake_warehouse_grant.warehouse_usage_grant: Creation complete after 0s [id=demo-warehouse|||USAGE|demo-role|false]

I now issue the same command except this time I change the value of the variable from false to true (which forces a replacement):

terraform apply --auto-approve --var initially_suspended=true

The warehouse was recreated. Note, however, that the warehouse grant is not mentioned in the plan:

Terraform will perform the following actions:

  # snowflake_warehouse.warehouse must be replaced
-/+ resource "snowflake_warehouse" "warehouse" {
      ~ auto_resume         = true -> (known after apply)
      ~ auto_suspend        = 600 -> (known after apply)
      ~ id                  = "demo-warehouse" -> (known after apply)
      ~ initially_suspended = false -> true # forces replacement
      ~ max_cluster_count   = 1 -> (known after apply)
      ~ min_cluster_count   = 1 -> (known after apply)
        name                = "demo-warehouse"
      ~ resource_monitor    = "null" -> (known after apply)
      ~ scaling_policy      = "STANDARD" -> (known after apply)
      ~ warehouse_size      = "X-Small" -> (known after apply)
        # (4 unchanged attributes hidden)
    }

Plan: 1 to add, 0 to change, 1 to destroy.
snowflake_warehouse.warehouse: Destroying... [id=demo-warehouse]
snowflake_warehouse.warehouse: Destruction complete after 2s
snowflake_warehouse.warehouse: Creating...
snowflake_warehouse.warehouse: Creation complete after 0s [id=demo-warehouse]

Apply complete! Resources: 1 added, 0 changed, 1 destroyed.

If I execute exactly the same command:

terraform apply --auto-approve --var initially_suspended=true

terraform recognises that the warehouse grant does not exist, and recreates it:

Terraform will perform the following actions:

  # snowflake_warehouse_grant.warehouse_usage_grant will be updated in-place
  ~ resource "snowflake_warehouse_grant" "warehouse_usage_grant" {
        id    = "demo-warehouse|||USAGE|demo-role|false"
      ~ roles = [
          + "demo-role",
        ]
        # (4 unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.
snowflake_warehouse_grant.warehouse_usage_grant: Modifying... [id=demo-warehouse|||USAGE|demo-role|false]
snowflake_warehouse_grant.warehouse_usage_grant: Modifications complete after 2s [id=demo-warehouse|||USAGE|demo-role|false]

Apply complete! Resources: 0 added, 1 changed, 0 destroyed.

My question is: why didn’t terraform recognise on the second apply that the warehouse grant needed to be recreated? Why did I need to run exactly the same command again for terraform to recreate the warehouse grant? It is my understanding that the implicit dependency of the warehouse grant on the recreated warehouse should cause terraform to reason that:

Oh, the warehouse has been recreated. The warehouse grant depends upon it so that should be recreated too.

The only explanation I can think of is that it is the responsibility of the provider to recognise that the warehouse grant needs to be recreated.

So, which is at fault here? Terraform, or the Snowflake Provider for terraform?

N.B. When this occurred for real today the missing warehouse grant went unnoticed for a few hours and hence caused a major outage.


Don’t forget to destroy everything after running the repro:

terraform destroy --auto-approve --var initially_suspended=true

Hi @jamiekt,

I think what happened here is that the provider planned to replace snowflake_warehouse with the same name before and after. As a result, snowflake_warehouse.warehouse.name (assigned to warehouse_name in snowflake_warehouse_grant) didn’t change, and therefore the provider considered the new desired state identical to the old desired state.

Of course, you have some additional information that Terraform did not: that apparently this API silently deletes all of the “warehouse grant” objects when a warehouse is deleted. Because Terraform isn’t aware of that hidden interaction, it understands the removal of the role from the warehouse grant as a change made outside of Terraform – by the remote API itself, in this case – and so tries to repair it on the next run as would be the case if you’d changed something manually in the system’s management UI.

In today’s Terraform you can inform Terraform about that hidden interaction like this:

resource "snowflake_warehouse_grant" "warehouse_usage_grant" {
  warehouse_name = snowflake_warehouse.warehouse.name
  privilege      = "USAGE"
  roles          = [snowflake_role.role.name]

  lifecycle {
    replace_triggered_by = [
      snowflake_warehouse.warehouse.name,
    ]
  }
}
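
One caveat, as far as I understand the documented behaviour of replace_triggered_by: a reference to a single attribute only triggers replacement when that attribute's value changes in the plan. In the repro above the warehouse keeps the name demo-warehouse across the replacement, so the .name reference may not fire. Below is a minimal sketch of a variant that references the whole resource instead, which should trigger whenever the warehouse is planned for update or replacement. It assumes Terraform v1.2 or later (where replace_triggered_by was introduced) and has not been run against a Snowflake account.

```hcl
resource "snowflake_warehouse_grant" "warehouse_usage_grant" {
  warehouse_name = snowflake_warehouse.warehouse.name
  privilege      = "USAGE"
  roles          = [snowflake_role.role.name]

  lifecycle {
    # Reference the whole resource rather than one attribute, so that any
    # planned update or replacement of the warehouse also replaces the grant.
    # Assumption: snowflake_warehouse.warehouse.id would work as well here,
    # because the plan above shows id changing to (known after apply).
    replace_triggered_by = [
      snowflake_warehouse.warehouse,
    ]
  }
}
```

The trade-off is that an in-place update of the warehouse would then also replace the grant, which may be more churn than you want.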

In an ideal world, the provider would’ve automatically informed Terraform about that relationship between these two objects so that you would not need to write anything. However, the Terraform provider protocol currently lacks any way to describe such relationships. That gap is what this design issue is about:

That issue discusses a way to allow providers to “talk about” related objects when planning changes for a specific object, which would then in principle allow the provider to express something like “deleting object A implicitly deletes object B”, which would then in turn allow Terraform to infer the extra actions required to deal with the implicit change, so that you’d no longer need to explicitly configure replace_triggered_by.

However, that issue is really just a problem statement with only very early ideas on how to solve it. I have some further work on this in a non-public place where I collected some more specific examples from discussions with provider developers, and it does still seem like a promising direction, so hopefully eventually we’ll have time to do some more concrete design work for it, and then implement something in this area.

Hello @apparentlymart,
As ever, a fantastic reply from you, thank you Martin. It’s great to know that a workaround exists (i.e. replace_triggered_by). I never knew about that meta-argument and will be adding it straight away to the module in which this problem occurred.

I will follow the issue with interest.