With the following configuration, I expected terraform would only replace a specific instance of a resource, but it seems to want to force replacement of all instances. Is there a method to only replace an instance with a matching key?
resource "terraform_data" "replace_on_node_count_change" {
input = toset([ for key, value in local.worker_nodepool_configs : value.node_count ])
}
resource "azurerm_kubernetes_cluster_node_pool" "shared_worker_vmss" {
for_each = toset(var.worker_nodepool_names)
lifecycle {
replace_triggered_by = [terraform_data.replace_on_node_count_change.input[each.key]]
}
}
I’ve also tried other combinations, including using triggers_replace and for_each in the terraform_data resource. I believe I’ve tried every combination conceivable to make a single instance of the node pool be replaced, but everything I tried results in either all of the node pools being planned for replacement, or a single node pool getting updated in-place instead of being replaced.
It appears that when the terraform_data resource is updated, if I use for_each there, it updates all outputs and IDs even if only a single input value has changed. When I don’t use for_each and instead provide a map to input or triggers_replace, then reference the specific map entry in replace_triggered_by, the node pools still all get replaced.
Here’s another example, along with output.
Code:
resource "terraform_data" "replace_on_node_count_change" {
for_each = tomap({ for key, value in local.worker_nodepool_configs : key => value.node_count })
triggers_replace = each.value
}
Output:
# terraform_data.replace_on_node_count_change["bluez1"] must be replaced
-/+ resource "terraform_data" "replace_on_node_count_change" {
~ id = "a482cac0-f8ca-7ba1-bc2a-b43218c026f6" -> (known after apply)
+ triggers_replace = 2
}
# terraform_data.replace_on_node_count_change["bluez2"] will be updated in-place
~ resource "terraform_data" "replace_on_node_count_change" {
~ id = "a58e1b99-55f3-e12d-b5ef-f1d2295612ba" -> (known after apply)
}
The node count input for key bluez1 was changed from 1 to 2, so why does the bluez2 terraform_data resource instance get updated in-place here? Its triggers_replace value didn’t change, and it doesn’t have an input value at all that could have caused its ID to change.
The goal I’m trying to accomplish is a little complex to explain so I’ll do my best. I’m operating in Azure working on an AKS cluster in an environment where auto-scaling is currently disabled and we need to enable it. We have 4 node pools currently with one node in each (2 pools named green and blue for each of 2 zones; so greenzone1, greenzone2, bluezone1, and bluezone2) and we may add additional node pools in the future to accommodate larger nodes or other requirements, which may or may not have auto-scaling enabled.
I also made some additional changes to the disk configuration of the above node pools which caused all 4 node pools to be replaced at the same time and took down all of the workloads in the process. Fortunately this was in a sandbox environment and over a weekend, so no harm no foul, but this is something I want to avoid doing in production. So I reworked the configuration to allow for configuring all of the node pools the same, and then having a variable which can provide override configs for each node pool. This would allow me to perform the disk changes on one node pool at a time. This part works beautifully.
Where I run into an issue now is with auto-scaling itself. Azure recommends ignoring the node_count field when auto-scaling is enabled. Since my node pools are configured by a for_each, the node_count field gets ignored on all pools. No big deal unless we want to scale a node pool manually, or change auto-scaling from enabled to disabled.
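Roughly, the relevant part of my node pool configuration looks like the sketch below (other node pool arguments omitted; the attribute name matches the azurerm provider):

resource "azurerm_kubernetes_cluster_node_pool" "shared_worker_vmss" {
  for_each = toset(var.worker_nodepool_names)

  # ... other node pool arguments omitted ...

  lifecycle {
    # Ignore drift in node_count (driven by the cluster autoscaler or by
    # manual scaling) on every pool created by this for_each.
    ignore_changes = [node_count]
  }
}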
When disabling auto-scaling, the node_count field becomes required. But when the resource already exists, the provider does an in-place update to disable auto-scaling, and since node_count is ignored, Terraform throws an error that the node_count field is required. Because of this, I had thought to use replace_triggered_by along with terraform_data, in the manner shown in my OP and previous comments, to force replacement of a single node pool at a time when either changing the node_count field in the configuration or disabling auto-scaling.
Unfortunately, it seems like Terraform wants to update all of the instances of the terraform_data resource whenever a change happens to only one instance, or to one value within a map input on a single instance of the resource, which in turn causes all 4 node pools to be replaced at the same time again, regardless of how I configure terraform_data or the replace_triggered_by field. This is definitely unexpected behavior on the part of the terraform_data resource, though the behavior on the node pool resource makes sense given the updates I’m seeing with terraform_data. I’m wondering if this is a bug that I should report or if this is working as designed, and how I can accomplish what I’m trying to do.
I managed to come up with a workaround by abusing the terraform_data resource and having a duplicate definition of the node pool resource. It’s not pretty, but it works for the moment.
resource "terraform_data" "auto_scaling_vmss" {
input = azurerm_kubernetes_cluster_node_pool.auto_scaling_shared_worker_vmss
}
## The worker node pool configuration for node pools with auto-scaling
resource "azurerm_kubernetes_cluster_node_pool" "auto_scaling_shared_worker_vmss" {
for_each = toset([ for pool_name, pool_config in local.worker_nodepool_configs : pool_name if pool_config.enable_auto_scaling ])
...
}
## The worker node pool configuration for node pools without auto-scaling
resource "azurerm_kubernetes_cluster_node_pool" "no_auto_scaling_shared_worker_vmss" {
for_each = toset([
for pool_name, pool_config in local.worker_nodepool_configs : pool_name if (
!pool_config.enable_auto_scaling &&
try(terraform_data.auto_scaling_vmss[pool_name], null) == null
)
])
...
}
This could be done without the terraform_data resource; however, the provider would then try to do the destroy and the create simultaneously, which doesn’t work because the creation errors out (the resource already exists) until the destroy completes. The terraform_data resource references the auto-scaling VMSS, which creates an implicit dependency, so it does not get updated until the destruction is complete. The for_each in the non-auto-scaling VMSS references the terraform_data resource, and thus the non-auto-scaling VMSS gets created only after the destruction of the auto-scaling node pool is complete.
I’ve also tested switching auto-scaling off with the above configuration, and found a side effect which Terraform reports as a bug in the terraform_data resource, though I’m not certain it is. The side effect is that the terraform_data resource produces an inconsistent final plan after destroying a node pool with auto-scaling disabled, so running terraform apply causes an error about the inconsistency. The destroy happens, then the terraform_data resource tries to update but gets a mismatched output from the attributes of the node pool instance being destroyed and recreated in this way. Simply running terraform apply again completes the job.
Hi @thomas.spear,
It’s hard to follow what’s going on here since you bounced around a bit, but it would be easier to see the problem if we go back and complete a minimal example like you started with. There’s no reason you can’t use replace_triggered_by with an individual instance, but IIUC you probably want to set up a corresponding instance of terraform_data to trigger the change. The way replace_triggered_by works is that it needs to detect a change, but if all instances point to the same resource, they may all be seeing the same change.
Something like this (using another terraform_data as a proxy for azurerm_kubernetes_cluster_node_pool):
locals {
  test = {
    one   = "first"
    two   = "second"
    three = "third"
  }
}

resource "terraform_data" "replace_on_node_count_change" {
  for_each = local.test
  input    = each.value
}

resource "terraform_data" "shared_worker_vmss" {
  for_each = local.test

  lifecycle {
    replace_triggered_by = [terraform_data.replace_on_node_count_change[each.key]]
  }
}
Here, changing any of the local.test values will replace only the corresponding shared_worker_vmss.
This is what I’ve described above, and it is not functioning properly for me.
Here’s a more recent attempt, including keys and values for locals (simplified from the actual logic for demonstration purposes):
locals {
  worker_nodepool_names = ["bluez1", "greenz1", "bluez2", "greenz2"]

  # Complex logic in worker_nodepool_configs has been simplified
  worker_nodepool_configs = {
    bluez1 = {
      enable_auto_scaling = true
      max_count           = 15
      min_count           = 1
      node_count          = null
    }
    bluez2 = {
      enable_auto_scaling = true
      max_count           = 15
      min_count           = 1
      node_count          = null
    }
    greenz1 = {
      enable_auto_scaling = true
      max_count           = 15
      min_count           = 1
      node_count          = null
    }
    greenz2 = {
      enable_auto_scaling = true
      max_count           = 15
      min_count           = 1
      node_count          = null
    }
  }
}

resource "terraform_data" "replace_on_node_count_change" {
  for_each = tomap({ for pool_name, pool_config in local.worker_nodepool_configs : pool_name => pool_config.node_count })
  input    = each.value
}

resource "azurerm_kubernetes_cluster_node_pool" "shared_worker_vmss" {
  for_each = toset(var.worker_nodepool_names)

  lifecycle {
    replace_triggered_by = [terraform_data.replace_on_node_count_change.input[each.key]]
  }
}
I’ve included enable_auto_scaling as it is central to my requirement.
Primary points:
- If I have created the node pool with auto scaling enabled, node_count is set to null in state.
- Azure recommends ignoring changes to node_count when auto scaling is enabled, but requires it to be set if auto scaling is disabled.
- Therefore, if I disable auto scaling by setting enable_auto_scaling to false, I must specify node_count.
- If I’ve ignored changes to node_count and disable auto scaling for one node pool, node_count remains null, which causes an error from the provider for that node pool because the provider is doing an in-place update.
- I’ve added the terraform_data resource as a sort of proxy as you mentioned, to work around this.
- After adding terraform_data, before making changes to the configuration, I ran terraform apply to get the resource into the state.
- Then I set node_count for a single node pool to 1, and set enable_auto_scaling to false.
- When I change the two values (enable_auto_scaling to false and node_count to 1) in the above code, let’s say for “bluez1” (sketched below), all 4 node pools get planned for destroy and recreate when I run terraform plan, when I expected only “bluez1” to be destroyed and recreated.
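For concreteness, this is roughly the change to the simplified locals above (only the bluez1 entry is shown; the other three pools are untouched):

    bluez1 = {
      enable_auto_scaling = false
      max_count           = null
      min_count           = null
      node_count          = 1
    }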
I see the following output in terraform plan, which doesn’t make sense and required the workaround I detailed in my most recent comment before this one.
# azurerm_kubernetes_cluster_node_pool.shared_worker_vmss["bluez1"] will be replaced due to changes in replace_triggered_by
-/+ resource "azurerm_kubernetes_cluster_node_pool" "shared_worker_vmss" {
...
~ enable_auto_scaling = true -> false
- max_count = 15 -> null
- min_count = 1 -> null
~ node_count = null -> 1
...
}
# azurerm_kubernetes_cluster_node_pool.shared_worker_vmss["bluez2"] will be replaced due to changes in replace_triggered_by
-/+ resource "azurerm_kubernetes_cluster_node_pool" "shared_worker_vmss" {
...
~ node_count = null -> (known after apply)
...
}
# azurerm_kubernetes_cluster_node_pool.shared_worker_vmss["greenz1"] will be replaced due to changes in replace_triggered_by
-/+ resource "azurerm_kubernetes_cluster_node_pool" "shared_worker_vmss" {
...
~ node_count = null -> (known after apply)
...
}
# azurerm_kubernetes_cluster_node_pool.shared_worker_vmss["greenz2"] will be replaced due to changes in replace_triggered_by
-/+ resource "azurerm_kubernetes_cluster_node_pool" "shared_worker_vmss" {
...
~ node_count = null -> (known after apply)
...
}
# terraform_data.replace_on_node_count_change["bluez1"] will be updated in-place
~ resource "terraform_data" "replace_on_node_count_change" {
id = "a482cac0-f8ca-7ba1-bc2a-b43218c026f6"
+ input = 2
+ output = (known after apply)
}
# terraform_data.replace_on_node_count_change["bluez2"] will be updated in-place
~ resource "terraform_data" "replace_on_node_count_change" {
id = "a58e1b99-55f3-e12d-b5ef-f1d2295612ba"
+ output = (known after apply)
}
# terraform_data.replace_on_node_count_change["greenz1"] will be updated in-place
~ resource "terraform_data" "replace_on_node_count_change" {
id = "587354d8-e5a6-674f-57f6-4e6e60eb40bb"
+ output = (known after apply)
}
# terraform_data.replace_on_node_count_change["greenz2"] will be updated in-place
~ resource "terraform_data" "replace_on_node_count_change" {
id = "5169b5a3-024c-860a-042d-c64c47f6b593"
+ output = (known after apply)
}
Plan: 4 to add, 4 to change, 4 to destroy.
As you can see from the terraform_data output, only one instance has its input actually changed, but all 4 instances are updated, and so replace_triggered_by in the node pool resource is improperly forcing replacement of all 4 node pools. You’ll also note that the node_count value for “bluez1” is going from null to 1, whereas for the other 3 node pools it is going to (known after apply). Additionally, only “bluez1” mentions switching enable_auto_scaling to false and removes the max_count and min_count values by setting them to null, whereas the others don’t. This matches my actual output; I’ve only trimmed unnecessary/irrelevant lines. Those 3 fields are not getting updated in the other 3 node pools, which is correct behavior as far as I’m concerned, since I haven’t changed the values for those 3 node pools in the local variable.
If you’re prototyping out the syntax here, I would break out just the terraform_data alone to figure out why you’re not able to change only a single instance at a time.
The reference here is also invalid:
terraform_data.replace_on_node_count_change.input[each.key]
But I assume that is a typo and should be:
terraform_data.replace_on_node_count_change[each.key]
As for the terraform_data on its own, the change is quite subtle, and primarily due to your use of tomap in the for_each expression. Because that map for-expression initially has all null values with no type information, Terraform cannot determine the map type, so it ends up with map(any). However, when you change a single node_count to a number, the entire map type must be changed to accommodate that, which means the next plan will use map(number).
I would not have used tomap here since it adds no value and can cause hard-to-diagnose problems, but if the map type is of use to you in a more complex use case, it could be more fully defined with:
for_each = tomap({ for pool_name, pool_config in local.worker_nodepool_configs : pool_name => tonumber(pool_config.node_count) })
Either the above map, or more simply the raw object, will allow only a single instance to be changed at a time:
resource "terraform_data" "replace_on_node_count_change" {
for_each = local.worker_nodepool_configs
input = each.value.node_count
}
Hi, thanks.
If you’re prototyping out the syntax here, I would break out just the terraform_data alone to figure out why you’re not able to change only a single instance at a time.
So, that’s where my second comment came in. I switched to triggers_replace instead of input in the code below, but I did also try it with input.
Code:
resource "terraform_data" "replace_on_node_count_change" {
for_each = tomap({ for key, value in local.worker_nodepool_configs : key => value.node_count })
triggers_replace = each.value
}
Output:
# terraform_data.replace_on_node_count_change["bluez1"] must be replaced
-/+ resource "terraform_data" "replace_on_node_count_change" {
~ id = "a482cac0-f8ca-7ba1-bc2a-b43218c026f6" -> (known after apply)
+ triggers_replace = 2
}
# terraform_data.replace_on_node_count_change["bluez2"] will be updated in-place
~ resource "terraform_data" "replace_on_node_count_change" {
~ id = "a58e1b99-55f3-e12d-b5ef-f1d2295612ba" -> (known after apply)
}
The reference here is also invalid:
terraform_data.replace_on_node_count_change.input[each.key]
Really? It seems to “work” (it doesn’t throw any errors when I don’t have for_each in the terraform_data resource and instead use a { for key, value in ... } on the input field), as does referencing triggers_replace, and so does referencing the id field on this resource. I’ll try without referencing any of them. Though all of it may be moot, because I think you’re onto something with the next statement.
As for the terraform_data on its own, the change is quite subtle, and primarily due to your use of tomap in the for_each expression. Because that map for-expression initially has all null values with no type information, Terraform cannot determine the map type, so it ends up with map(any). However, when you change a single node_count to a number, the entire map type must be changed to accommodate that, which means the next plan will use map(number).
I would not have used tomap here since it adds no value and can cause hard to diagnose problems, but if the map type is of use to you in a more complex use case, it could be more fully defined with
Question: for_each only works with sets and maps. If I don’t use tomap, what would the for_each line look like? The for_each, pool_name, and pool_config.node_count are necessary, but if it doesn’t need tomap, how would this (your line pasted below) look?
for_each = tomap({ for pool_name, pool_config in local.worker_nodepool_configs : pool_name => tonumber(pool_config.node_count) })
Thanks for your insights! I do think adding tonumber() around pool_config.node_count will ultimately solve the issue, because it’ll wrap null with tonumber() and keep the type fed to for_each consistent regardless of the value held by pool_config.node_count.
That would be a valid reference when the resource does not have for_each, but your last example did use for_each. You also wouldn’t want to be indexing the input attribute; you’re only looking for a change in the instance, so referencing only the instance will simplify things. The intent of replace_triggered_by is to couple the lifecycles of two resources, so while it’s valid to use more specific attributes of a resource, it’s often just overcomplicating things.
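In other words, something along these lines (a sketch based on your earlier node pool example, with the other arguments omitted):

resource "azurerm_kubernetes_cluster_node_pool" "shared_worker_vmss" {
  for_each = toset(var.worker_nodepool_names)

  # ... node pool arguments omitted ...

  lifecycle {
    # Reference the whole corresponding terraform_data instance rather than
    # indexing into its input attribute.
    replace_triggered_by = [terraform_data.replace_on_node_count_change[each.key]]
  }
}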
There does seem to be some documentation that specifies a map type as an argument to for_each, but an object is valid there as well, and can often be more predictable, as seen here.
Interesting! I’ve always assumed objects needed to be converted to maps for for_each based on that documentation you mentioned. I’ll give both ways a try.
After testing both scenarios detailed below, I can confirm both work. The second way requires updating the state to convert the terraform_data inputs to objects before it properly functions, but that makes sense, and I can comment out replace_triggered_by in the node pools temporarily to get that update made. Thank you for the help!
tomap with tonumber:
for_each = tomap({ for pool_name, pool_config in local.worker_nodepool_configs : pool_name => tonumber(pool_config.node_count) })
No tomap and no tonumber:
for_each = { for pool_name, pool_config in local.worker_nodepool_configs : pool_name => pool_config.node_count }
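For completeness, a sketch of the resulting configuration that worked for me (node pool arguments omitted, using the names from earlier in the thread):

resource "terraform_data" "replace_on_node_count_change" {
  for_each = { for pool_name, pool_config in local.worker_nodepool_configs : pool_name => pool_config.node_count }
  input    = each.value
}

resource "azurerm_kubernetes_cluster_node_pool" "shared_worker_vmss" {
  for_each = toset(var.worker_nodepool_names)

  # ... node pool arguments omitted ...

  lifecycle {
    # Changing one pool's node_count now replaces only that pool's
    # terraform_data instance, and therefore only that node pool.
    replace_triggered_by = [terraform_data.replace_on_node_count_change[each.key]]
  }
}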