Dependency violation in destroying subnets when scaling down AZs in AWS

Hi,

I have the code below, which creates new AWS Network Firewall endpoints in new AZ subnets whenever I want to expand to a multi-AZ setup.

The hashicorp/aws provider version is 5.15.0.

terraform.tfvars:

fw_subnets = ["a.b.c.d/24"]

main.tf:

locals {
  azs = slice(data.aws_availability_zones.available.names, 0, 3)
}

resource "aws_subnet" "sub-fw" {
  count             = length(var.fw_subnets) > 0 ? length(var.fw_subnets) : 0
  vpc_id            = "vpc-xxxxxxxx"
  cidr_block        = var.fw_subnets[count.index]
  availability_zone = length(regexall("^[a-z]{2}-", element(local.azs, count.index))) > 0 ? element(local.azs, count.index) : null
}

resource "aws_networkfirewall_firewall" "firewall-01" {
  count                    = length(var.fw_subnets) > 0 ? 1 : 0
  name                     = "firewall-01"
  firewall_policy_arn      = "<some policy arn>"
  vpc_id                   = "vpc-xxxxxxxx"
  delete_protection        = false
  subnet_change_protection = false
  dynamic "subnet_mapping" {
    for_each = aws_subnet.sub-fw
    content {
      subnet_id = subnet_mapping.value.id
    }
  }
}

This works fine: when I add a new firewall subnet to the fw_subnets variable list, Terraform creates the new subnet and updates the firewall with a new vpce in that subnet.
E.g. fw_subnets = ["a.b.c.d/24", "w.x.y.z/24"]

However, when I try to revert to 1 subnet in 1 AZ by changing back to fw_subnets = ["a.b.c.d/24"], the apply hangs, unable to destroy the removed subnet.

E.g.

aws_subnet.sub-fw[1]: Still destroying... [id=subnet-0cc397f93785ff7af, 1m30s elapsed]
aws_subnet.sub-fw[1]: Still destroying... [id=subnet-0cc397f93785ff7af, 1m40s elapsed]

This happens because the new firewall vpce is still in the subnet and has not yet been removed, so the subnet cannot be deleted. I would expect Terraform to remove the firewall vpce first, before removing the subnet, but that doesn’t seem to be happening.

My “terraform plan” clearly shows that the firewall is planned to be updated to remove the vpce, but when I run “terraform apply”, it tries to remove the subnet first.

E.g.

aws_networkfirewall_firewall.firewall-01[0] will be updated in-place
  ~ resource "aws_networkfirewall_firewall" "firewall-01" {
      .
      .
      .
      - subnet_mapping {
          - ip_address_type = "IPV4" -> null
          - subnet_id       = "subnet-0cc397f93785ff7af" -> null
        }

Any clue why this is happening? Am I referencing the resources incorrectly, such that Terraform is unable to resolve and execute the dependencies properly? Thanks in advance.

Hi,

Does anyone have any clue on this?

Hi @halphyr,

If the aws_subnet is registered with the aws_networkfirewall_firewall, and can’t be destroyed until the aws_networkfirewall_firewall is updated with the new set of subnets, you must use create_before_destroy on the subnets. Even though you are not creating a new subnet in this example, the same situation would happen during replacement as well: the updates need to be planned and applied before the destroy operations.
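
As a minimal sketch of that suggestion, assuming the aws_subnet resource from the original post is otherwise left unchanged, the lifecycle block would be added like this (the arguments are copied from the question; only the lifecycle block is new):

```hcl
resource "aws_subnet" "sub-fw" {
  count             = length(var.fw_subnets) > 0 ? length(var.fw_subnets) : 0
  vpc_id            = "vpc-xxxxxxxx"
  cidr_block        = var.fw_subnets[count.index]
  availability_zone = length(regexall("^[a-z]{2}-", element(local.azs, count.index))) > 0 ? element(local.azs, count.index) : null

  # Assumption: this is the only change needed. With create_before_destroy,
  # the destroy of a removed subnet is ordered after the in-place update of
  # the firewall that references it, so the vpce is detached first.
  lifecycle {
    create_before_destroy = true
  }
}
```

With this in place, the plan for shrinking fw_subnets should look the same as before; the difference is expected to show up in the apply order, where the firewall update runs before the subnet destroy.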

Hi @jbardin,

Thanks for the assistance and explanation! Appreciate it!