"WaitForState exceeded refresh grace period" error in GitLab CI pipeline when creating aws_route53_record

I am trying to set up GitLab CI for my Terraform projects. I am running into an issue that occurs specifically when terraform plan runs in a pipeline.

The pipeline gets as far as refreshing an aws_route53_record, stalls for an hour, and then times out. The info logs (full trace below) show a “WaitForState exceeded refresh grace period” error.

Some other information about the project:

  • I am trying to provision a bastion node with a Route 53 domain attached to it, in the same public subnet as an EKS cluster.
  • I use temporary credentials generated through sts:AssumeRole in the pipeline.
  • Running locally works perfectly fine (even with the temporary credentials).
  • I have tried multiple versions of the AWS provider and the issue is the same.
  • The IAM policy of the temporary credentials allows “*” on all resources, so it is not a permissions issue.

The pipelines run in a custom Docker container based on hashicorp/terraform:1.1.5 that also includes the Vault CLI (for generating the temporary IAM credentials).

When I run: terraform plan -out "planfile"

The last resource that the pipeline attempts to refresh produces this in the info logs:

aws_route53_record.bastion_dns[0]: Refreshing state... [id=REDACTED]
2022-03-30T14:42:47.510Z [INFO]  provider.terraform-provider-aws_v3.62.0_x5: 2022/03/30 14:42:47 [DEBUG] aws_ami - adding block device mapping: map[device_name:/dev/xvda ebs:map[delete_on_termination:true encrypted:false iops:0 snapshot_id:snap-0d6140f3df5d53053 throughput:0 volume_size:20 volume_type:gp2] virtual_name:]: timestamp=2022-03-30T14:42:47.510Z
2022-03-30T14:42:47.512Z [WARN]  Provider "provider[\"registry.terraform.io/hashicorp/aws\"]" produced an unexpected new value for module.eks.data.aws_ami.eks_worker[0].
      - .id: was cty.StringVal("ami-REDACTED"), but now cty.StringVal("ami-REDACTED-2")
      - .arn: was cty.StringVal("arn:aws:ec2:ca-central-1::image/ami-REDACTED"), but now cty.StringVal("arn:aws:ec2:ca-central-1::image/ami-REDACTED-2")
      - .creation_date: was cty.StringVal("2022-03-09T18:11:12.000Z"), but now cty.StringVal("2022-03-17T21:41:34.000Z")
      - .block_device_mappings: planned set element cty.ObjectVal(map[string]cty.Value{"device_name":cty.StringVal("/dev/xvda"), "ebs":cty.MapVal(map[string]cty.Value{"delete_on_termination":cty.StringVal("true"), "encrypted":cty.StringVal("false"), "iops":cty.StringVal("0"), "snapshot_id":cty.StringVal("snap-01ef559588c37a43c"), "throughput":cty.StringVal("0"), "volume_size":cty.StringVal("20"), "volume_type":cty.StringVal("gp2")}), "no_device":cty.StringVal(""), "virtual_name":cty.StringVal("")}) does not correlate with any element in actual
      - .root_snapshot_id: was cty.StringVal("snap-01ef559588c37a43c"), but now cty.StringVal("snap-0d6140f3df5d53053")
      - .name: was cty.StringVal("amazon-eks-node-1.21-v20220309"), but now cty.StringVal("amazon-eks-node-1.21-v20220317")
      - .image_location: was cty.StringVal("amazon/amazon-eks-node-1.21-v20220309"), but now cty.StringVal("amazon/amazon-eks-node-1.21-v20220317")
      - .image_id: was cty.StringVal("ami-REDACTED"), but now cty.StringVal("ami-REDACTED-2")
2022-03-30T14:42:47.738Z [INFO]  provider.terraform-provider-aws_v3.62.0_x5: 2022/03/30 14:42:47 [DEBUG] Expanded record name: bastion.redacted.com: timestamp=2022-03-30T14:42:47.738Z
2022-03-30T14:42:47.738Z [INFO]  provider.terraform-provider-aws_v3.62.0_x5: 2022/03/30 14:42:47 [DEBUG] List resource records sets for zone: HOSTEDZONE, opts: {
  HostedZoneId: "HOSTEDZONE",
  MaxItems: "1",
  StartRecordName: "bastion.redacted.com.",
  StartRecordType: "A"
}: timestamp=2022-03-30T14:42:47.738Z
2022-03-30T14:42:47.983Z [INFO]  provider.terraform-provider-aws_v3.62.0_x5: 2022/03/30 14:42:47 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2022-03-30T14:42:47.983Z
2022-03-30T14:42:47.986Z [WARN]  Provider "registry.terraform.io/hashicorp/aws" produced an invalid plan for aws_route53_record.bastion_dns[0], but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:
      - .health_check_id: planned value cty.StringVal("") for a non-computed attribute
      - .set_identifier: planned value cty.StringVal("") for a non-computed attribute
2022-03-30T14:44:45.401Z [INFO]  provider.terraform-provider-aws_v3.62.0_x5: 2022/03/30 14:44:45 [WARN] WaitForState timeout after 2m0s: timestamp=2022-03-30T14:44:45.401Z
2022-03-30T14:44:45.401Z [INFO]  provider.terraform-provider-aws_v3.62.0_x5: 2022/03/30 14:44:45 [WARN] WaitForState starting 30s refresh grace period: timestamp=2022-03-30T14:44:45.401Z
2022-03-30T14:45:15.402Z [INFO]  provider.terraform-provider-aws_v3.62.0_x5: 2022/03/30 14:45:15 [ERROR] WaitForState exceeded refresh grace period: timestamp=2022-03-30T14:45:15.402Z
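
To pin down where the stall begins, the gaps between consecutive log timestamps can be compared. Below is a minimal sketch in Python, assuming every top-level log line starts with a timestamp in the format shown above; the function names parse_timestamps and largest_gap are hypothetical helpers, not part of Terraform or the AWS provider.

```python
import re
from datetime import datetime

# Matches the leading timestamp of a Terraform log line,
# e.g. "2022-03-30T14:42:47.986Z [WARN] ...".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z")


def parse_timestamps(lines):
    """Return (datetime, line) pairs for lines that start with a timestamp.

    Continuation lines (indented detail lines without a timestamp) are skipped.
    """
    stamped = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:
            when = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
            stamped.append((when, line))
    return stamped


def largest_gap(lines):
    """Return (gap_in_seconds, line_before, line_after) for the longest pause.

    Returns None when fewer than two timestamped lines are present.
    """
    stamped = parse_timestamps(lines)
    best = None
    for (t0, before), (t1, after) in zip(stamped, stamped[1:]):
        gap = (t1 - t0).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, before, after)
    return best
```

Run against the excerpt above, the longest pause is just under two minutes: it sits between the “produced an invalid plan” warning at 14:42:47.986 and the “WaitForState timeout after 2m0s” warning at 14:44:45.401, so the provider is silent for the whole wait.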

Any help/insights would be appreciated. I’ve been spinning my wheels on this for over a week.