Terraform plan errors for known after apply values

Hi all,

I have a Route 53 module that includes a resource for creating records:

resource "aws_route53_record" "record" {
  for_each = var.module_enabled ? local.records : {}

  zone_id         = each.value.zone_id
  type            = each.value.type
  name            = each.value.name
  allow_overwrite = each.value.allow_overwrite
  health_check_id = each.value.health_check_id
  set_identifier  = each.value.set_identifier

  # set default TTL when ttl missing and is not ALIAS record
  ttl = each.value.ttl == null && each.value.alias.name == null ? var.default_ttl : each.value.ttl

  [...]
}

where the local.records in the for_each loop looks like this:

  + local_records                     = {
      + "a-subdomain.domain.tld"             = {
          + alias           = {
              + evaluate_target_health = true
              + name                   = (known after apply)
              + zone_id                = (known after apply)
            }
        }
      + "cname-subdomain-api.domain.tld"     = {
          + alias           = {
              + evaluate_target_health = null
              + name                   = null
              + zone_id                = null
            }
        }
      
      [ more CNAMEs here ]

  + }

and the module call example:

module "dns_records" {
  source = "./modules/route53"
  count  = length(module.gla) == 1 ? 1 : 0

  zone_id = try(data.aws_route53_zone.selected.id, null)

  allow_overwrite = true

  records = flatten(concat([
    {
      name    = "${lower(var.vpc["tag"])}.${var.my_domain["name"]}"
      type    = "A"
      alias = {
        name    = module.gla[count.index].dns_name  #<== another module dependency
        zone_id = module.gla[count.index].zone_id   #<== another module dependency
        evaluate_target_health = true
      }
    },
    {
      name    = "${lower(var.vpc["tag"])}-api.${var.my_domain["name"]}"
      type    = "CNAME"
      ttl     = 60
      records = ["${lower(var.vpc["tag"])}.${var.my_domain["name"]}"]
    },
  ],
  [ for i, v in var.my_subdomain : [{
      name    = "${split("-", lower(var.vpc["tag"]))[0]}-${lower(v)}.${var.my_domain["name"]}"
      type    = "CNAME"
      ttl     = 60
      records = ["${lower(var.vpc["tag"])}.${var.my_domain["name"]}"]
    },
    {
      name    = "${split("-", lower(var.vpc["tag"]))[0]}-${lower(v)}-api.${var.my_domain["name"]}"
      type    = "CNAME"
      ttl     = 60
      records = ["${lower(var.vpc["tag"])}.${var.my_domain["name"]}"]
    }
  ]]
  ))
}

This worked fine in 1.5.7, but in the latest 1.6 and 1.7 releases I get the following error:

╷
│ Error: Missing required argument
│ 
│   with module.dns_records[0].aws_route53_record.record["a-subdomain.domain.tld"],
│   on ./modules/route53/main.tf line 130, in resource "aws_route53_record" "record":
│  130:   ttl = each.value.ttl == null && each.value.alias.name == null ? var.default_ttl : each.value.ttl
│ 
│ "ttl": all of `records,ttl` must be specified
╵

during the plan run. I tried AWS provider versions 5.38 and 5.39.1 and saw no difference. It also fails with 4.67, so I doubt it is provider related.

Does anyone have any idea why? @apparentlymart maybe you can best explain what has changed to cause this? I looked through the changelogs for 1.6 and 1.7 and nothing obvious stood out to me. I did see some improvements in how unknown values are handled in Terraform core, but I guess that should not cause this?

Thanks in advance for any help/guidance.

Hi @igoratencompass,

Given your description, the changes in Terraform were probably from some improvements in HCL which could produce more accurate values when expressions contained unknown values. I’ve seen a few isolated cases where the more correct values unexpectedly changed behavior, and some from the configuration leveraging the bug to get some desired behavior.

I’m not able to figure out how the value may have changed from the given information however. Can you show a more complete version of the local_records value from the plan?

While I can’t explain why this was working on older versions of Terraform, it seems like this module’s logic requires that one of the following be true:

  • All records specify a ttl
  • The default_ttl variable is set.

The module block you shared doesn’t set default_ttl and the first record in the records argument doesn’t set ttl, so this could be valid only if the default_ttl variable had a non-null default value. Does the declaration of that variable include a non-null default?

Thanks for your reply. Yes of course the var is set in the module’s variables.tf:

variable "default_ttl" {
  description = "(Optional) The default TTL (Time to Live) in seconds that will be used for all records that support the ttl parameter. Will be overwritten by the records ttl parameter if set."
  type        = number
  default     = 3600
}

Hi @jbardin ,

Thanks for your and @apparentlymart's replies. I really appreciate you both looking into this.

I haven’t provided the rest of the local variable because the logic chokes on the very first record in the list, the ALIAS one. If I remove it from the records in the module call, everything works just fine. Here is some more detailed output for the local_records var:

  + local_records                     = {
      + "a-subdomain.domain.tld"             = {
          + alias           = {
              + evaluate_target_health = true
              + name                   = (known after apply)
              + zone_id                = (known after apply)
            }
          + allow_overwrite = true
          + failover        = null
          + health_check_id = null
          + idx             = 0
          + name            = "<redacted>"
          + set_identifier  = null
          + ttl             = null
          + type            = "A"
          + weight          = null
          + zone_id         = "<redacted>"
        }
      + "cname-subdomain-api.domain.tld"     = {
          + alias           = {
              + evaluate_target_health = null
              + name                   = null
              + zone_id                = null
            }
          + allow_overwrite = true
          + failover        = null
          + health_check_id = null
          + idx             = 3
          + name            = "<redacted>"
          + set_identifier  = null
          + ttl             = 60
          + type            = "CNAME"
          + weight          = null
          + zone_id         = "<redacted>"
        }
      
      [ more CNAMEs here ]

  + }

which shows more details, for example ttl = null for the ALIAS record.

I was thinking that something might be going wrong with the null comparison in the ttl expression, so next I’ll try changing it to use try, lookup, or even can, and see if that makes any difference.

No difference, the outcome is the same :confused:

My impression is that the expression each.value.alias.name == null used to evaluate to null when name is known after apply, but no longer does.

I think I see what could be the problem, which is a combination of a provider bug and HCL expression evaluation. However, I cannot reproduce any different behavior on v1.5, so there must be something else going on as well. A more complete example, or at least the plan output, would be helpful. Here are what may be some clues…

The expression
each.value.ttl == null && each.value.alias.name == null ? var.default_ttl : each.value.ttl
might appear to always return a value for ttl at first glance, however the addition of unknowns (currently) prevents it entirely. The first part of the expression using a logical && does not short circuit when each.value.ttl == null, and the fact that .name is unknown will make the entire expression unknown during the plan.
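The point about && can be illustrated with a minimal, standalone sketch. Everything in it is hypothetical and not taken from the module in question: terraform_data.placeholder only stands in for "some value that is unknown during plan", and the locals mimic the ALIAS record's ttl and alias name.

```hcl
# Hypothetical stand-in for a value that is not known until apply.
# terraform_data is built in to Terraform 1.4 and later.
resource "terraform_data" "placeholder" {}

locals {
  ttl         = null                          # like the ALIAS record: no ttl given
  alias_name  = terraform_data.placeholder.id # unknown during the first plan
  default_ttl = 3600

  # The left operand is true, but && still evaluates the right operand.
  # true && unknown is unknown, so the condition, and with it the whole
  # conditional expression, stays unknown during plan.
  effective_ttl = local.ttl == null && local.alias_name == null ? local.default_ttl : local.ttl
}

output "effective_ttl" {
  value = local.effective_ttl
}
```

On a first plan the output would be expected to show as (known after apply) rather than 3600. If a given Terraform version is able to prove that the unknown value cannot be null, the comparison may instead resolve to false at plan time, which would make the result null rather than unknown.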

I’ve seen bugs in legacy providers where unknown was not handled correctly and was mistaken for unset during validation, causing erroneous errors like this. I thought they were mostly sorted out by now, though I don’t have another explanation for what the provider is returning.

You can test that by removing the each.value.alias.name == null check from the expression, which will result in a known value during plan. Maybe that will help figure out what part of your configuration was making this work in v1.5 (again, I still get the same result in v1.5 from what I can piece together from your example).
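For that test, the attribute inside the resource block would be reduced to something like the fragment below. This is only a diagnostic sketch: it also assigns the default TTL to ALIAS records, so it is not meant as a fix.

```hcl
# Inside resource "aws_route53_record" "record"; all other arguments unchanged.
# Diagnostic only: the alias check is removed, so nothing here depends on
# the (known after apply) alias name, and ttl is known during plan.
ttl = each.value.ttl == null ? var.default_ttl : each.value.ttl
```

If the error persists with this form, as a known ttl is then being passed, the unknown alias name is probably not the only factor.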

Interesting, if I do that I get exactly the same error :confused: I’m not sure what more we can do. Is it possible to upload the module as a zip archive here (is there a size limit?), or maybe DM it to you @jbardin?

I’m not sure what the upload limits are here, but most modules should be OK since they are small text files. GitHub gists are often used too. If you have a standalone reproduction it may be helpful, but it’s much harder when it relies on existing infrastructure or credentials which we do not have access to.

The validation error is coming from a combination of the ttl attribute and the records attribute. Can you show how you are assigning records? It may also help to see how local.records is transformed from var.records. I think the newer release might be able to determine when one of those things is definitely null when it could not previously (or maybe it even fixed a case where it was erroneously reporting something as null when it should have been unknown).

I’m still betting that the provider is mis-validating the config, however, because it sees one of ttl or records as unknown and the other as null. This should not fail validation, because that unknown could still turn out to be null once it is known, but that was a common mistake made by older providers.
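If the provider really is tripping over an unknown ttl paired with a null records, one possible way to sidestep it is to base the alias check on something that is already known during plan. The fragment below is a sketch under a loud assumption: that alias.evaluate_target_health is written as a literal by the caller and is non-null only for ALIAS records, as it appears to be in the local_records output shown earlier. Whether this avoids the validation error has not been verified.

```hcl
# Inside resource "aws_route53_record" "record"; all other arguments unchanged.
# Assumption: evaluate_target_health is known at plan time and is null for
# every non-ALIAS record, so this condition never depends on an unknown value.
ttl = (
  each.value.ttl == null && each.value.alias.evaluate_target_health == null
  ? var.default_ttl
  : each.value.ttl
)
```

With this form an ALIAS record without a ttl would get a known null ttl during plan, and the CNAME records would keep their explicit ttl or fall back to var.default_ttl.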

My knowledge of the old SDK behaviors is rusty, but FWIW the provider seems to be using the RequiredWith schema argument to enforce this rule, and the tests for that do include a case that seems similar to the situation we’re discussing here:

However, I notice it’s only testing the negative case – when there’s another argument also required but not set – and not the positive case where all of the arguments have been set but one has been set to the unknown value placeholder.

All of this is built on the shaky foundations of the little snapshot of Terraform v0.11 that’s built in to the SDKv2 repository though, so this all hinges on how the shims are populating the legacy terraform.ResourceConfig and on how that type reacts to being asked to Get an unknown value. I wasn’t able to spot an obvious explanation by reviewing the code, but there’s a lot of old stuff here that we don’t understand very well anymore. I agree with @jbardin that it seems likely that something’s going wrong in the provider’s handling of this validation rule, but I don’t have a direct explanation to point at.

Thanks, both of you, for your help and the time spent looking into this. I guess the best option is to log a bug with the AWS provider regarding the aws_route53_record resource.