kubectl_manifest when sourcing multiple URLs

Currently I have:

terraform {
  required_providers {
    kubectl = {
      source = "gavinbunney/kubectl"
    }
  }
}
locals {
  karpenter_manifests = toset([
    "https://raw.githubusercontent.com/aws/karpenter/main/pkg/apis/crds/karpenter.k8s.aws_awsnodetemplates.yaml",
    "https://raw.githubusercontent.com/aws/karpenter/main/pkg/apis/crds/karpenter.k8s.aws_ec2nodeclasses.yaml",
    "https://raw.githubusercontent.com/aws/karpenter/main/pkg/apis/crds/karpenter.sh_machines.yaml",
    "https://raw.githubusercontent.com/aws/karpenter/main/pkg/apis/crds/karpenter.sh_nodeclaims.yaml",
    "https://raw.githubusercontent.com/aws/karpenter/main/pkg/apis/crds/karpenter.sh_nodepools.yaml",
    "https://raw.githubusercontent.com/aws/karpenter/main/pkg/apis/crds/karpenter.sh_provisioners.yaml",
  ])
}
data "http" "karpenter_crd_raw" {
  for_each = local.karpenter_manifests
  url      = each.key
}

data "kubectl_file_documents" "karpenter_crd_doc" {
  for_each = data.http.karpenter_crd_raw
  content  = each.value.response_body
}

resource "kubectl_manifest" "karpenter_crd" {
  for_each  = data.kubectl_file_documents.karpenter_crd_doc.manifests

  yaml_body = each.value
}

but that errors:

│ Because data.kubectl_file_documents.karpenter_crd_doc has "for_each" set, its attributes must be accessed on specific instances.
│ 
│ For example, to correlate with indices of a referring resource, use:
│     data.kubectl_file_documents.karpenter_crd_doc[each.key]

Hi @cdenneen,

I think the missing piece here is that you need an extra expression to move from what you currently have – a map of maps of YAML manifests, grouped by the URL they came from – to a flat map with one element per manifest, regardless of which URL they came from.

The expression data.kubectl_file_documents.karpenter_crd_doc produces a map because that resource uses for_each. Each element of that map is an object representing one instance of that resource, and each of those objects has a manifests attribute that seems to be a map of strings containing YAML manifests.
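
For illustration only (the keys and contents here are invented, based on your URLs), that overall value has roughly this shape:

{
  "https://raw.githubusercontent.com/aws/karpenter/main/pkg/apis/crds/karpenter.sh_nodepools.yaml" = {
    manifests = {
      "/apis/apiextensions.k8s.io/v1/customresourcedefinitions/nodepools.karpenter.sh" = "<one YAML document>"
    }
    # ...other data source attributes...
  }
  # ...one element per URL...
}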

I know from some out-of-band discussion with you in a different place that the innermost map of manifests has keys based on the unique identifier of the object being described in the manifest. In that case they are presumably unique across the entire Kubernetes cluster, and therefore safe to use alone as instance keys for the manifests, ignoring which of the source URLs each one originated from.

If that’s true then it should be possible to just discard the outer level of map and merge the inner maps together, like this:

locals {
  all_manifests = merge([
    for src in data.kubectl_file_documents.karpenter_crd_doc :
    src.manifests
  ]...)
}

The for expression inside this produces a list of maps of YAML documents, still grouped by the URL they originated from, but discarding the information about which URL each map came from.

The merge call then combines all of those maps together into a single map. If any of the keys were to collide then you’d lose some elements in this process, but as long as the keys are unique across all of the maps (that is, as long as there aren’t any conflicting descriptions of the same object at your source URLs) you should end up with a single map containing all of the manifests from all of the URLs.
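
As a minimal illustration of that combination (with invented values), in terraform console:

> merge([{ a = "1" }, { b = "2" }]...)
{
  "a" = "1"
  "b" = "2"
}

The ... here is a literal part of the syntax: it expands the single list argument into one argument per element, as if you had written merge({ a = "1" }, { b = "2" }).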

That data structure should then be suitable for use in the for_each of kubectl_manifest.karpenter_crd, because it has one element per manifest you want to declare:

resource "kubectl_manifest" "karpenter_crd" {
  for_each  = local.all_manifests

  yaml_body = each.value
}

I wrote the expression out as a separate local value because that way it’s easier to use terraform console to inspect it and understand what it’s producing, but you can just write that expression inline in the for_each argument of the resource if you prefer, unless you expect to be using that same data structure in other parts of your module.
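
For example, something like this (run from the directory containing the configuration) should print the merged map so you can inspect its keys:

$ terraform console
> local.all_manifests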

You should find also that these resource instances get instance keys that reflect which object in the Kubernetes cluster is being described by each document, which means that if you edit any of the information at the given URLs Terraform should be able to understand the difference between you editing an object that already exists vs. adding or removing objects from the set.

I hope that helps!

Didn’t work:

╷
│ Error: Error in function call
│ 
│   on main.tf line 11, in locals:
│   11:   all_manifests = merge([
│   12:     for src in data.kubectl_file_documents.karpenter_crd_doc : src.manifests
│   13:   ])
│     ├────────────────
│     │ while calling merge(maps...)
│     │ data.kubectl_file_documents.karpenter_crd_doc is object with 6 attributes
│ 
│ Call to function "merge" failed: arguments must be maps or objects, got "tuple".

OK, it does work if I do exactly as you said and use ..., which I didn’t realize was an actual part of the function call syntax:

Thanks for the help @apparentlymart

OK, so my only caveat here is the case where we supply 2 different URLs which contain the same CRD:

  # kubectl_manifest.karpenter_crd["/apis/apiextensions.k8s.io/v1/customresourcedefinitions/nodepools.karpenter.sh"] will be created

so I’d want to know if more than one of these is supplied to the merge, and rather than the later one in the list silently replacing the earlier one, we’d need an error enforcing uniqueness.

Basically, if we have a long list of CRDs, the apiVersion can be the same for two of them while the actual spec changes, which could cause problems, and we need to make sure the proper one is applied. So /apis/apiextensions.k8s.io/v1/customresourcedefinitions/nodepools.karpenter.sh needs to be unique, and not made unique by arbitrarily dropping the extras with merge, because that could keep the wrong one. (I don’t need the URL, so that doesn’t need to be part of the uniqueness; it’s really just to make sure there isn’t more than one manifest for the same CRD in our list of URLs, which could cause the wrong one to be applied.)

The design of merge includes silently overwriting conflicting keys, since that function was originally intended for use-cases like taking a map of “default tags” and merging some more-specific tags over the top of them, and so indeed in what I suggested it would silently drop all but one of the manifests that describe the same object as far as these API paths are concerned.
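
For example, with invented tag values, the later argument wins on any conflicting key:

> merge({ Env = "dev", Team = "a" }, { Env = "prod" })
{
  "Env" = "prod"
  "Team" = "a"
}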

A different way to get a similar effect would be to first construct a map of lists of manifests, grouping by these keys. You could achieve that using a for expression in “grouping mode” (another meaning of the ... token) instead of merge, like this:

locals {
  all_manifests = flatten([
    for src in data.kubectl_file_documents.karpenter_crd_doc : [
      for k, manifest in src.manifests : {
        key      = k
        manifest = manifest
      }
    ]
  ])

  manifests_by_key = {
    for obj in local.all_manifests :
    obj.key => object.manifest...
  }

  duplicate_manifests = toset([
    for k, manifests in local.manifests_by_key : k
    if len(manifests) > 1
  ])
}

local.manifests_by_key would therefore be a map of lists of strings, and if there are two manifests describing the same object then they’ll be grouped together in the same list. local.duplicate_manifests then collects the keys whose lists have more than one element into a set.
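
A minimal illustration of grouping mode with invented values, in terraform console:

> { for x in [{ k = "a", v = 1 }, { k = "a", v = 2 }, { k = "b", v = 3 }] : x.k => x.v... }
{
  "a" = [
    1,
    2,
  ]
  "b" = [
    3,
  ]
}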

You can then use a precondition to prevent any evaluation of the kubectl_manifest resource if there are any duplicates:

resource "kubectl_manifest" "karpenter_crd" {
  for_each  = local.manifests_by_key

  yaml_body = one(each.value)

  precondition {
    condition     = len(local.duplicate_manifests) == 0
    error_message = "Duplicate definitions:${formatlist("\n  - %q", local.duplicate_manifests)}"
  }
}

Notice that yaml_body is now one(each.value) rather than just each.value, which is a concise way to take the single element from a one-element collection. each.value is now a list of all of the manifests with the key each.key, but the precondition guarantees that each of those lists has exactly one element.
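
For illustration, one returns the element of a single-element collection, returns null for an empty one, and raises an error for anything longer:

> one(["hello"])
"hello"
> one([])
null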

@apparentlymart so I get this error now:

╷
│ Error: Reference to undeclared resource
│
│   on ../../modules/k8s-cluster/karpenter.tf line 42, in locals:
│   42:     obj.key => object.manifest...
│
│ A managed resource "object" "manifest" has not been declared in module.k8s-cluster.
╵

Here is the updated locals and resource blocks:

locals {
  karpenter_manifests = toset([
    "https://raw.githubusercontent.com/aws/karpenter/main/pkg/apis/crds/karpenter.k8s.aws_awsnodetemplates.yaml",
    "https://raw.githubusercontent.com/aws/karpenter/main/pkg/apis/crds/karpenter.k8s.aws_ec2nodeclasses.yaml",
    "https://raw.githubusercontent.com/aws/karpenter/main/pkg/apis/crds/karpenter.sh_machines.yaml",
    "https://raw.githubusercontent.com/aws/karpenter/main/pkg/apis/crds/karpenter.sh_nodeclaims.yaml",
    "https://raw.githubusercontent.com/aws/karpenter/main/pkg/apis/crds/karpenter.sh_nodepools.yaml",
    "https://raw.githubusercontent.com/aws/karpenter/main/pkg/apis/crds/karpenter.sh_provisioners.yaml",
  ])
  # all_manifests = merge([
  #   for src in data.kubectl_file_documents.karpenter_crd_doc : src.manifests
  # ]...)
  all_manifests = flatten([
    for src in data.kubectl_file_documents.karpenter_crd_doc : [
      for k, manifest in src.manifests : {
        key      = k
        manifest = manifest
      }
    ]
  ])

  manifests_by_key = {
    for obj in local.all_manifests :
    obj.key => object.manifest...
  }

  duplicate_manifests = toset([
    for k, manifests in local.manifests_by_key : k
    if len(manifests) > 1
  ])
}
data "http" "karpenter_crd_raw" {
  for_each = var.create_eks ? local.karpenter_manifests : toset([])
  url      = each.key
}

data "kubectl_file_documents" "karpenter_crd_doc" {
  for_each = var.create_eks ? data.http.karpenter_crd_raw : {}
  content  = each.value.response_body
}

resource "kubectl_manifest" "karpenter_crd" {
  for_each  = var.create_eks ? local.manifests_by_key : {}
  yaml_body = one(each.value)

  precondition {
    condition     = len(local.duplicate_manifests) == 0
    error_message = "Duplicate definitions:${formatlist("\n  - %q", local.duplicate_manifests)}"
  }

  depends_on = [
    module.eks[0].aws_eks_cluster,
    module.eks[0].kubernetes_config_map_v1_data
  ]
}

I changed object.manifest to obj.manifest, as I think that was what it was supposed to be, but it looks like while that “works”, kubectl_manifest doesn’t allow for precondition:

╷
│ Error: Unsupported block type
│
│   on ../../modules/k8s-cluster/karpenter.tf line 64, in resource "kubectl_manifest" "karpenter_crd":
│   64:   precondition {
│
│ Blocks of type "precondition" are not expected here.
╵

Hi @cdenneen,

Indeed, sorry I seem to have made at least two errors in what I shared.

You’ve already found the first one: I wrote object instead of obj.

The second mistake is that precondition for resources belongs inside a lifecycle block, like this:

  lifecycle {
    precondition {
      # ...
    }
  }

Unfortunately I think when I wrote this I’d been recently thinking about preconditions in a different context and didn’t check properly what I wrote before I sent it.

Hopefully with both of those corrected this will do something useful!

There was also len(), which needed to be length().

Fixed the lifecycle (thanks for that), but even though there are no duplicates, evaluating the precondition’s error_message fails:

│ Error: Invalid template interpolation value
│
│   on ../../modules/k8s-cluster/karpenter.tf line 67, in resource "kubectl_manifest" "karpenter_crd":
│   67:       error_message = "Duplicate definitions:${formatlist("\n  - %q", local.duplicate_manifests)}"
│     ├────────────────
│     │ local.duplicate_manifests is empty set of dynamic
│
│ Cannot include the given value in a string template: string required.
╵

Ugh, clearly I was half asleep when writing these examples! Sorry about that.

This error message is correct because formatlist returns a list, as the name implies. Therefore this ought to have been join("", formatlist(/*...*/)) instead of just formatlist, to gather all of the elements together into a single string for interpolation.
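
A rough illustration with invented strings, in terraform console:

> formatlist("%q", ["a", "b"])
[
  "\"a\"",
  "\"b\"",
]
> join(", ", formatlist("%q", ["a", "b"]))
"\"a\", \"b\""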

❯ terraform plan -var cluster_name=christest-1234
╷
│ Error: Not enough function arguments
│
│   on modules/k8s-cluster/karpenter.tf line 53, in resource "kubectl_manifest" "karpenter_crd":
│   53:       error_message = "Duplicate definitions:${join("", formatlist(/*...*/))}"
│     ├────────────────
│     │ while calling formatlist(format, args...)
│
│ Function "formatlist" expects at least 1 argument(s). Missing value for "format".
╵
❯ vi modules/k8s-cluster/karpenter.tf
❯ terraform plan -var cluster_name=christest-1234
╷
│ Error: Invalid expression
│
│   on modules/k8s-cluster/karpenter.tf line 53, in resource "kubectl_manifest" "karpenter_crd":
│   53:       error_message = "Duplicate definitions:${join("", formatlist(/*...*/, local.duplicate_manifests))}"
│
│ Expected the start of an expression, but found an invalid expression token.
╵

The 2 examples I tried were:

      error_message = "Duplicate definitions:${join("", formatlist(/*...*/))}"
      error_message = "Duplicate definitions:${join("", formatlist(/*...*/, local.duplicate_manifests))}"

After our earlier confusion about whether ... was a literal token or a placeholder I suppose I should’ve known better than to use /*...*/ as a placeholder here!

My intention for that placeholder was “everything you already had in the parentheses for the formatlist call”.
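
In other words, substituting in the format string and arguments from the earlier example, the corrected line would presumably be:

      error_message = "Duplicate definitions:${join("", formatlist("\n  - %q", local.duplicate_manifests))}"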

@apparentlymart So I needed to make some changes to make sure the CRDs pulled down match the installed version of Karpenter. However, this caused an issue: the original karpenter_manifests was just a set of URLs, and since each URL now includes that version, the set was only known after apply.

So I converted the URLs to a map, karpenter_manifests_map, and I was able to get past the data sources giving me an issue. But now I’m running into the same issue on the kubectl_manifest resource, since manifests_by_key is setting key => k… so I tried setting it as key => crd, but since crd is coming from data.kubectl_file_documents.karpenter_crd_doc as well, it’s obviously too late and causes the same error.

Is there a way to use the map keys from karpenter_manifests_map, which would match the crd keys from data.kubectl_file_documents.karpenter_crd_doc, so that the key is known and this error is avoided:

│ Error: Invalid for_each argument
│ 
│   on modules/k8s-cluster/karpenter.tf line 48, in resource "kubectl_manifest" "karpenter_crd":
│   48:   for_each  = var.create_eks && var.enable_karpenter ? local.manifests_by_key : {}
│     ├────────────────
│     │ local.manifests_by_key will be known only after apply
│     │ var.create_eks is true
│     │ var.enable_karpenter is true
│ 
│ The "for_each" map includes keys derived from resource attributes that
│ cannot be determined until apply, and so Terraform cannot determine the
│ full set of keys that will identify the instances of this resource.
│ 
│ When working with unknown values in for_each, it's better to define the map
│ keys statically in your configuration and place apply-time results only in
│ the map values.
│ 
│ Alternatively, you could use the -target planning option to first apply
│ only the resources that the for_each value depends on, and then apply a
│ second time to fully converge.

Here is the latest HCL:

locals {
  karpenter_version = module.eks_blueprints_addons[0].karpenter.app_version
  karpenter_manifests_map = {
    "crd" = {
      "awsnodetemplate" = "https://raw.githubusercontent.com/aws/karpenter/v${local.karpenter_version}/pkg/apis/crds/karpenter.k8s.aws_awsnodetemplates.yaml"
      "ec2nodeclasses" = "https://raw.githubusercontent.com/aws/karpenter/v${local.karpenter_version}/pkg/apis/crds/karpenter.k8s.aws_ec2nodeclasses.yaml"
      "machines" = "https://raw.githubusercontent.com/aws/karpenter/v${local.karpenter_version}/pkg/apis/crds/karpenter.sh_machines.yaml"
      "nodeclaims" = "https://raw.githubusercontent.com/aws/karpenter/v${local.karpenter_version}/pkg/apis/crds/karpenter.sh_nodeclaims.yaml"
      "nodepools" = "https://raw.githubusercontent.com/aws/karpenter/v${local.karpenter_version}/pkg/apis/crds/karpenter.sh_nodepools.yaml"
      "provisioners" = "https://raw.githubusercontent.com/aws/karpenter/v${local.karpenter_version}/pkg/apis/crds/karpenter.sh_provisioners.yaml"
    }
  }

  all_manifests = flatten([
    for crd, src in data.kubectl_file_documents.karpenter_crd_doc : [
      for k, manifest in src.manifests : {
        key      = crd
        manifest = manifest
      }
    ]
  ])

  manifests_by_key = {
    for obj in local.all_manifests :
    obj.key => obj.manifest...
  }

  duplicate_manifests = toset([
    for k, manifests in local.manifests_by_key : k
    if length(manifests) > 1
  ])
}
data "http" "karpenter_crd_raw" {
  for_each = var.create_eks && var.enable_karpenter ? local.karpenter_manifests_map.crd : {}
  url      = each.value
}

data "kubectl_file_documents" "karpenter_crd_doc" {
  for_each = var.create_eks && var.enable_karpenter ? data.http.karpenter_crd_raw : {}
  content  = each.value.response_body
}

resource "kubectl_manifest" "karpenter_crd" {
  for_each  = var.create_eks && var.enable_karpenter ? local.manifests_by_key : {}
  yaml_body = one(each.value)

  lifecycle {
    precondition {
      condition     = length(local.duplicate_manifests) == 0
      error_message = "Duplicate definitions:${join("", formatlist("\n  - %q", local.duplicate_manifests))}"
      #error_message = "Duplicate definitions:${formatlist("\n  - %q", local.duplicate_manifests)}"
    }
  }

  depends_on = [
    module.eks[0],
    module.eks_blueprints_addons[0].time_sleep,
    module.eks_addons[0].time_sleep,
  ]
}

Was hoping something like this would work:

  manifests_by_key = {
    for crd in karpenter_manifests_map :
      crd => for obj in local.all_manifests :
      obj.manifest...
    if crd == obj.key
  }
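
For what it’s worth, here is one way that intent might be expressed (a sketch, not something tested against this configuration): since each of these URLs appears to contain exactly one CRD document, the static names in karpenter_manifests_map.crd, which are also the instance keys of the two data sources, could serve directly as the map keys, keeping the unknown document bodies only in the values. That would mean replacing the manifests_by_key definition with something like:

locals {
  manifests_by_key = {
    # The instance keys of this data source come from the static keys of
    # karpenter_manifests_map.crd, so the keys of this map are known at
    # plan time even though the document bodies are not.
    for crd, doc in data.kubectl_file_documents.karpenter_crd_doc :
    crd => values(doc.manifests)
  }
}

One trade-off: the duplicate check changes meaning. Since the keys are now the static names rather than the Kubernetes API paths, it would flag a URL whose file contains more than one document, not the same CRD arriving from two different URLs, and because the documents are unknown until apply, the precondition would only be checked at apply time.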