Hi,
Over the past few months, I have been working on an enterprise project to migrate an interactive Node app into AWS. The specifics of the app aren't important; what matters is that I built the infrastructure for the whole stack with Terraform (0.12). The backend architecture is essentially an EKS cluster running Traefik 2.3 as the ingress controller, fronted by an NLB.
During development, I came across some interesting behaviour in Terraform that I had to work around by restructuring the project into "staged" Terraform runs. Here is a summary of what I noticed:
If you have a TF project that creates a new EKS cluster via the AWS provider and also deploys Kubernetes resources onto that newly created cluster, you cannot achieve this in a single "terraform apply". The reason is that the Kubernetes provider is initialised with its auth data at plan time, before the cluster exists. So if all of your resources are defined in a single configuration, the EKS cluster itself is created without issue, but creation of the Kubernetes objects fails with authentication errors. If you then re-run "terraform apply", the Kubernetes provider picks up the auth data for the now-existing EKS cluster and the creation of the k8s resources succeeds.
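For illustration, here is a rough sketch of the kind of single-configuration setup I mean (the names, IAM role and subnet variable are placeholders rather than my actual config). On the first apply, the cluster attributes fed into the kubernetes provider are still unknown at plan time, so the kubernetes resource below fails with auth errors; a second apply then succeeds:

```hcl
provider "aws" {
  region = "eu-west-1" # placeholder region
}

resource "aws_eks_cluster" "main" {
  name     = "example-cluster"
  role_arn = aws_iam_role.eks.arn # IAM role defined elsewhere

  vpc_config {
    subnet_ids = var.subnet_ids # assumed to be defined elsewhere
  }
}

data "aws_eks_cluster_auth" "main" {
  name = aws_eks_cluster.main.name
}

# On the first apply these attributes are unknown at plan time, so the
# provider is configured with empty auth data.
provider "kubernetes" {
  load_config_file       = false
  host                   = aws_eks_cluster.main.endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.main.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.main.token
}

# Fails with auth errors on the first apply, succeeds on the second.
resource "kubernetes_namespace" "flux" {
  metadata {
    name = "flux-system"
  }
}
```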
To get around this issue, I essentially had to break the project up into "layers" or "phases", if you will. A directory called "layer-1" creates the AWS-provider-provisioned resources: the EKS cluster, the auto-scaling launch template and the ASG. This layer also writes the kubeconfig for the EKS cluster out to a local file, which is then consumed by "layer-2". "layer-2" contains all of the Kubernetes-provider-provisioned resources (the Flux GitOps controller in this instance). This structure results in an isolated state file for each "layer" (which has actually proved helpful in some cases, although I am pretty sure this is not the way TF was intended to be used!).
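As a rough sketch of what the split looks like (the file paths, kubeconfig template and resource names here are illustrative, not my exact config):

```hcl
# --- layer-1 (its own root module and state) ---
# After creating the cluster, render its kubeconfig to a local file
# that layer-2 can read.
resource "local_file" "kubeconfig" {
  filename = "${path.module}/../layer-2/kubeconfig_example-cluster"
  content = templatefile("${path.module}/templates/kubeconfig.tpl", {
    cluster_name = aws_eks_cluster.main.name
    endpoint     = aws_eks_cluster.main.endpoint
    ca_data      = aws_eks_cluster.main.certificate_authority[0].data
  })
}

# --- layer-2 (its own root module and state) ---
# The kubernetes provider reads real auth data from the kubeconfig
# written by layer-1, so these resources plan and apply cleanly.
provider "kubernetes" {
  config_path = "${path.module}/kubeconfig_example-cluster"
}

resource "kubernetes_namespace" "flux" {
  metadata {
    name = "flux-system"
  }
}
```

Because each layer is a separate root module, layer-2 is only planned and applied once the cluster and kubeconfig from layer-1 already exist, so the kubernetes provider has valid auth data from the start.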
This has been noted elsewhere (https://blog.logrocket.com/dirty-terraform-hacks/, under "Break up dependent providers into staged Terraform runs").
I have been trying to read up on whether this scenario has been reported elsewhere, but I haven't had much luck. I was also wondering whether this may have been resolved in TF 0.13, but the changelogs don't appear to reference this use case.
Just wondering if anyone else has encountered this?
Edit: I've commented on https://github.com/hashicorp/terraform/issues/2430, which appears to cover the same topic.