Dependent resource recreated but the resource that uses it is not updated

grimm26 · April 9, 2024, 3:13pm

I have a module that creates an AWS Aurora cluster (aurora_module) and a module that creates a security group (sg_module). The security group id is an output of sg_module and that is passed as a parameter into the aurora_module. The aws_security_group resource in sg_module has the create_before_destroy lifecycle enabled.

I made a change to the sg_module that is causing the security group to be recreated. It creates the new aws_security_group before trying to destroy the old one. However, the aurora_module is not picking up that there is a change in the parameter that specifies the security group ID. Because of this, the apply fails with a timeout trying to delete the old security group that is still associated with the aurora cluster.

Does this need to be a multi-step process?

1. Apply to create the new security group.
2. Apply to update the security group ID passed into the aurora module and destroy the old security group.

Or is there some method I am missing (besides triggering a recreation of the aurora cluster)?

jbardin · April 9, 2024, 4:59pm

Hi @grimm26,

The overall structure of what you’re describing should work within a single plan and apply, but without any details I can’t say where things might be going wrong.

When the aws_security_group is planned for replacement, does the aurora cluster show a corresponding change at all? Were the resources all applied with the create_before_destroy option before making this change?

apparentlymart · April 9, 2024, 5:45pm

EC2 won’t allow deleting a security group that’s associated with at least one network interface, and it sounds like the Aurora cluster’s network interfaces are associated with the security group.

Therefore the correct order of operations would need to be something like this:

1. Create the new security group.
2. Update the Aurora cluster’s network interfaces to refer to the new security group.
3. Wait for EC2 to become consistent with the Aurora change.
4. Destroy the old security group.

A common gotcha is what I’ve labelled as step 3 above: it takes some time after destroying or reconfiguring a network interface before the associated security group becomes unblocked for deletion, and (as far as I know) there is no way to know it’s ready except to keep trying until it succeeds. I think the AWS provider tries to deal with this by effectively incorporating step 3 into step 4, polling the “delete security group” API until it eventually succeeds. But if step 2 didn’t happen then it can never succeed, and so will poll until the operation eventually hits a timeout and returns an error.
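For illustration, here is a minimal sketch of a security group set up to take part in that sequence. The resource, variable, and output names are hypothetical and not taken from the actual sg_module; the point is only that create_before_destroy makes Terraform perform step 1 before step 4, and that a name_prefix (rather than a fixed name) lets the old and new groups coexist in between.

```hcl
# Hypothetical security group module; all names here are illustrative.
variable "vpc_id" {
  type = string
}

resource "aws_security_group" "db_ingress" {
  # A prefix instead of a fixed name, so the replacement group can be
  # created while the old one still exists in the same VPC.
  name_prefix = "db-ingress-"
  vpc_id      = var.vpc_id

  lifecycle {
    # Create the replacement first, destroy the old group last.
    create_before_destroy = true
  }
}

output "db_ingress_sg_id" {
  value = aws_security_group.db_ingress.id
}
```

Steps 2 and 3 still depend on whatever consumes the output: the old group can only be deleted once nothing references it any more.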

With that said, I think what @jbardin asked is the crucial point: did Terraform mention the need to update the Aurora cluster as part of the plan? That’s a different way of asking whether your apply phase is performing what I labelled as step 2 in the above list, since Terraform will not attempt to update the Aurora cluster during the apply phase unless it said it would during the planning phase.

If you’re not sure, then it might help to share the entire output of terraform plan showing the proposal to replace the security group, along with anything else that was proposed at the same time.

grimm26 · April 9, 2024, 6:46pm

No, the aurora cluster does not show a corresponding change at all in the plan. The aws_security_group resource has a create_before_destroy lifecycle on it. It did create a new SG before trying to delete the old one.

jbardin · April 9, 2024, 7:00pm

If there’s no change in the aurora cluster, then there is either a mistake in the configuration which isn’t directly linking the security group output to the cluster, or there is a bug in the resource which is ignoring the change in configuration. The fact that the security group id is changing because the security group is being replaced entirely should definitely show up as a change elsewhere in the configuration where that id is referenced (even if the order of operations was somehow incorrect).
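As a rough sketch of what "directly linking" means here, assuming a root module that wires two child modules together (the module paths and the input and output names are hypothetical):

```hcl
# Root module: hypothetical module paths and input/output names.
variable "vpc_id" {
  type = string
}

module "sg" {
  source = "./modules/sg"
  vpc_id = var.vpc_id
}

module "aurora" {
  source = "./modules/aurora"

  # Direct reference to the other module's output. When the security group
  # is planned for replacement, this value is unknown ("known after apply")
  # at plan time, and the plan should show a change wherever it is consumed.
  security_group_id = module.sg.db_ingress_sg_id
}
```

If the ID reached the cluster some other way, for example as a hard-coded string, Terraform would have no reference to follow and would not plan an update.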

grimm26 · April 9, 2024, 7:03pm

Example plan: aurora+sg plan · GitHub

Config inside module.aurora_cluster referencing the security group:

resource "aws_rds_cluster" "main_aurora_cluster" {
    provider = aws.this

    # basic cluster data
    cluster_identifier = local.resource_name

    port           = var.port
    engine         = var.engine
    engine_version = var.engine_version
    vpc_security_group_ids = compact(concat(
      [module.aurora_sec_groups.enova_db_ingress_sg_id],
      var.additional_security_group_list,
    ))
....

module.aurora_sec_groups creates the security group for the aurora cluster.

jbardin · April 9, 2024, 7:52pm

Thanks @grimm26,

You are hitting a bug in the aws_rds_cluster resource, which is not detecting a change in the vpc_security_group_ids, partly because the attribute is optionally computed, and partly because the legacy SDK cannot differentiate at that point between unknown and unset.

Normally it wouldn’t matter, but the value is entirely unknown here because compact cannot tell which elements it will need to drop (it removes empty-string and null elements) until all unknown values are resolved, so the entire result becomes unknown. In this particular case you could leave out the compact call, so that the value sent to the provider is a set containing an unknown element rather than an unknown set. Because the attribute is a set, any duplicate values are removed implicitly anyway.
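A minimal sketch of that change, reusing the names from the configuration posted above and eliding the other arguments (so this is a fragment, not a complete resource):

```hcl
resource "aws_rds_cluster" "main_aurora_cluster" {
  # ... other arguments as in the original configuration ...

  # No compact() around the whole expression: the result is a list with one
  # unknown element during planning, instead of a wholly unknown value.
  # Terraform converts the list to a set for this attribute.
  vpc_security_group_ids = concat(
    [module.aurora_sec_groups.enova_db_ingress_sg_id],
    var.additional_security_group_list,
  )
}
```

If the extra list might contain empty strings, one option (an assumption, not something tried in this thread) is to apply compact only to the input that is already known at plan time, i.e. concat([module.aurora_sec_groups.enova_db_ingress_sg_id], compact(var.additional_security_group_list)).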

grimm26 · April 9, 2024, 7:53pm

ah lemme try removing the compact().

grimm26 · April 9, 2024, 8:09pm

the compact() call was the culprit. Thanks!

