Terraform (aws) applies in wrong order

I’ve provisioned a load balancer in AWS with Terraform. It worked fine.

Now I want to change the resource name of the certificate used by the load balancer. “terraform apply” tries to do it, but in the wrong order, so “apply” fails after around 20 minutes. Terraform executes the certificate replacement in this order:

  1. aws_acm_certificate - it generates a new certificate (it succeeds)
  2. aws_acm_certificate - it wants to remove the old certificate (removal fails, as it’s still used by the load balancer listener we’re trying to modify)
  3. aws_lb_listener - it wants to change the certificate used by the listener (not executed, because certificate removal above failed)

The right order should be:

  1. aws_acm_certificate - generate a new certificate
  2. aws_lb_listener - modify the certificate used by the listener (use the newly generated certificate)
  3. aws_acm_certificate - finally, remove the old/unused certificate

I know that it’s not possible to enforce the order with Terraform, so how can I ensure the certificate is replaced correctly?

$ terraform -version
Terraform v1.4.0
on linux_amd64
+ provider registry.terraform.io/hashicorp/aws v4.54.0

Also tried with hashicorp/aws v4.57.1, but it behaves the same.

Resource definition for the certificate:

resource "aws_acm_certificate" "certificate" {
    domain_name               = var.certificate
    key_algorithm             = "RSA_2048"
    validation_method         = "DNS"

    options {
        certificate_transparency_logging_preference = "ENABLED"
    }

    lifecycle {
        create_before_destroy = true
    }

    tags = {
        ManagedBy = "Terraform"
    }
}

Hi @tchwpkgorg,

The order of operations in Terraform is strictly defined, and using create_before_destroy = true should give you the order you are looking for. Did you apply the config originally with create_before_destroy or did you try to add that afterwards?

I added create_before_destroy afterwards, but it didn’t help.
Even without create_before_destroy, Terraform created the new certificate on the first run before trying to remove the old one. The problem is that it tries to remove the old certificate before the new one is attached to the load balancer listener, and the listener can’t be left without a certificate.

Adding it afterwards and immediately trying to replace the resource is not going to work, because the instance in state does not know that it needs to be destroyed using the order imposed by create_before_destroy = true. You must either apply the resource from the start with create_before_destroy = true, or refresh the state of the existing resources before replacing them.
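For reference, here is a minimal sketch of how a listener would typically depend on the certificate from the question. The listener name "https" and the two ARN variables are hypothetical; only aws_acm_certificate.certificate comes from the original post. Assuming create_before_destroy is already recorded in state for the certificate, this reference is what should let Terraform update the listener before it destroys the old certificate.

```hcl
# Hypothetical inputs, declared only so the sketch stands on its own.
variable "load_balancer_arn" {
  type = string
}

variable "target_group_arn" {
  type = string
}

# Hypothetical listener; the original post does not show its definition.
resource "aws_lb_listener" "https" {
  load_balancer_arn = var.load_balancer_arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-2016-08"

  # Referencing the certificate resource (defined in the original post)
  # creates the dependency Terraform uses to order the operations.
  certificate_arn = aws_acm_certificate.certificate.arn

  default_action {
    type             = "forward"
    target_group_arn = var.target_group_arn
  }
}
```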

In the end, I worked around it with “terraform state mv”.

@tchwpkgorg can you please provide an example of your terraform state mv command that worked to fix this or elaborate a bit more on your solution? Thank you.

@jbardin I have had create_before_destroy = true on all of my certificates from the beginning, and I’m still seeing operations applied in the wrong order. Exactly as @tchwpkgorg describes above, Terraform tries to delete the old certificate before the load balancer listeners have switched to the new one (which was created just fine). Any assistance or ideas would be greatly appreciated.

@jmeridth

Have you tried running terraform apply with -target set to only the aws_lb_listener resource? I’m not saying it’s a good solution, but it should work, since the new cert has already been issued.

It seems to me your case wasn’t about changing the certificate attributes like

domain_name = www.aaa.com
to
domain_name = www.bbb.com

but rather renaming the terraform resource from

resource aws_acm_certificate aaa { ... }
to
resource aws_acm_certificate bbb { ... }

then a moved { ... } block should have worked (which is essentially what terraform state mv does).
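A minimal sketch of such a rename, using the placeholder names aaa and bbb from the example above (they are illustrative, not the real resource names from the question). The moved block tells Terraform that the existing object in state now lives at the new address, so the plan should show no certificate replacement at all. It assumes Terraform v1.1 or later.

```hcl
# Declared here only so the sketch is self-contained; the original
# configuration already uses var.certificate.
variable "certificate" {
  type = string
}

# Records the rename: the object tracked as "aaa" is now "bbb".
# Roughly equivalent CLI command:
#   terraform state mv aws_acm_certificate.aaa aws_acm_certificate.bbb
moved {
  from = aws_acm_certificate.aaa
  to   = aws_acm_certificate.bbb
}

# The resource block under its new name; arguments stay unchanged.
resource "aws_acm_certificate" "bbb" {
  domain_name       = var.certificate
  validation_method = "DNS"

  lifecycle {
    create_before_destroy = true
  }
}
```

Any references elsewhere in the configuration, such as the listener’s certificate_arn, would need to point at the new address as well.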

That is exactly what I ended up doing. That allowed the switch to happen, and then a full apply worked and the old certificate could be deleted, since it was no longer in use.

I’d have to see an example with the logs to know for sure what’s happening, but if Terraform is configured correctly then the usual culprit is eventual consistency in the remote APIs. An update made through one API call is sometimes not immediately visible to a second call, which can cause the deletion to fail. Providers try to account for this in common cases by polling for the change, but sometimes the delay is unexpected, or polling is not possible at all.