GKE Cluster sample dead end - auth problem after adding app install steps

Hello,

I am following Provision a GKE Cluster (Google Cloud) | Terraform - HashiCorp Learn and trying to build upon it, but I am running into trouble.

I am able to apply code based on this set of instructions and successfully get an empty Kubernetes cluster, as I expect. However, the instructions then switch to running a bunch of manual kubectl commands, which frankly defeats the purpose of config as code.

Once I attempt to add a Terraform block that does something with the cluster, such as installing Jenkins from Helm, I run into auth problems.

The tutorial code includes this commented-out provider block:

...
# provider "kubernetes" {
#   load_config_file = "false"

#   host     = google_container_cluster.primary.endpoint
#   username = var.gke_username
#   password = var.gke_password
# }

But that demonstrates a deprecated authentication approach (basic auth) and is a dead end. Instead, I came up with this:

data "google_client_config" "default" {}

provider "kubernetes" {
  host                   = google_container_cluster.primary.endpoint

  token                  = data.google_client_config.default.access_token
  cluster_ca_certificate = base64decode(google_container_cluster.primary.master_auth.0.cluster_ca_certificate)
}

However, terraform apply results in this error:

Error: Kubernetes cluster unreachable: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable

Then, on the next page of the Learn site, Manage Kubernetes Resources via Terraform | Terraform - HashiCorp Learn:

It says a “cloud-specific auth plugin” is the top recommendation, and that “The cloud provider [instructions] will configure the Kubernetes provider using cloud-specific auth tokens,” referring back to the page above, the one recommending the deprecated basic auth.

How exactly should I declare the Kubernetes provider to use cloud-specific auth plugin?
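
(For what it’s worth, my best guess at the declaration that page is hinting at is the provider’s exec block, which defers authentication to Google’s gke-gcloud-auth-plugin binary. This is a sketch of my understanding, assuming the plugin is installed locally; I have not confirmed this is what the tutorial intends:)

provider "kubernetes" {
  host                   = "https://${google_container_cluster.primary.endpoint}"
  cluster_ca_certificate = base64decode(google_container_cluster.primary.master_auth.0.cluster_ca_certificate)

  # Defer auth to gcloud's GKE plugin instead of embedding credentials.
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "gke-gcloud-auth-plugin"
  }
}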

Update: it works if I manually reconfigure kubectl after the error, pointing it at the new cluster (e.g., I run gcloud container clusters get-credentials <new cluster name> --region <region>).

I also ran export KUBE_CONFIG_PATH=/home/me/.kube/config, though that might not have been needed.

My question has shifted slightly: how do I write a Terraform configuration that authenticates against the cluster it just created, so it can start installing things into that cluster?
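
(For reference, one blunt way to do that is to have Terraform shell out to gcloud itself. A minimal sketch using a local-exec provisioner; the null provider usage and the resource name configure_kubectl are my own additions, and note this still writes credentials into kubeconfig:)

resource "null_resource" "configure_kubectl" {
  # Re-run whenever the cluster endpoint changes.
  triggers = {
    endpoint = google_container_cluster.primary.endpoint
  }

  provisioner "local-exec" {
    command = "gcloud container clusters get-credentials ${google_container_cluster.primary.name} --region ${var.region} --project ${var.project_id}"
  }
}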

Update: after much frustration I solved it.

For the possible benefit of others, here’s how I understand the way to authenticate without a manual step and without writing credentials to a file.

The sample makes use of Google’s google_container_cluster resource. This resource creates the new empty GKE cluster and exports the certs and keys needed to install things into the cluster with something like Helm. Those exported values (Terraform calls them “attributes” on a resource; “outputs” are the module-level equivalent) are described here: https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_cluster#attributes-reference

(Side note: in the HashiCorp docs I found a confusing note, which I understood to be telling me to separate creation of the GKE cluster from the configuration that installs into the cluster, because of timing concerns. I did this while things weren’t working and it made no difference; it might not be necessary, but I left it this way.)

My ‘create cluster’ module uses the basics of the HashiCorp sample. I renamed primary to my_cluster, which is less confusing to me, but most importantly I exposed the creds/keys as outputs:

output "config" {
  value = {
    name = google_container_cluster.my_cluster.name
    region = var.region
    kubernetes_cluster = google_container_cluster.my_cluster
    ca_certificate = base64decode(google_container_cluster.my_cluster.master_auth.0.cluster_ca_certificate)
    client_certificate = base64decode(google_container_cluster.my_cluster.master_auth.0.client_certificate)
    client_key = base64decode(google_container_cluster.my_cluster.master_auth.0.client_key)
    host = google_container_cluster.my_cluster.endpoint
  }
}

And my ‘install apps’ module uses those outputs:

variable "cluster_config" {}

provider "kubernetes" {
  host                   = var.cluster_config.host
  cluster_ca_certificate = var.cluster_config.ca_certificate
  client_certificate     = var.cluster_config.client_certificate
  client_key             = var.cluster_config.client_key
}

provider "helm" {
  kubernetes {
    host                   = var.cluster_config.host
    cluster_ca_certificate = var.cluster_config.ca_certificate
    client_certificate     = var.cluster_config.client_certificate
    client_key             = var.cluster_config.client_key
  }
}
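
With the providers wired up, the actual app install is just a resource. Here is a minimal sketch of the Jenkins install I was originally after; the chart repository URL and the namespace are my assumptions, not something from the tutorial:

resource "helm_release" "jenkins" {
  name             = "jenkins"
  repository       = "https://charts.jenkins.io" # assumed: the public Jenkins chart repo
  chart            = "jenkins"
  namespace        = "jenkins"
  create_namespace = true
}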

Then my parent module, which ties ‘create cluster’ to ‘install apps’, is:

module "my_cluster" {
  source = "./create-cluster"
  project_id = var.project_id
  region = var.region
}

module "my_cluster_apps" {
  source = "./install-apps"
  cluster_config = module.my_cluster.config
}
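
For completeness: each module also needs to declare which providers it uses so Terraform knows what to install. Here is a sketch of the block I would put in the install-apps module (version constraints omitted; pin them in real use):

terraform {
  required_providers {
    kubernetes = {
      source = "hashicorp/kubernetes"
    }
    helm = {
      source = "hashicorp/helm"
    }
  }
}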

Update: yesterday the above worked fine. Today I got several errors on terraform apply:

Error: <some operation> forbidden: User "system:anonymous" cannot get resource <blah>

Maybe someone could explain why this happens. I speculate that when the cluster was created yesterday, somehow a default token was set, and that today the token is invalid, producing these confusing errors. (One possible factor: newer GKE versions do not issue client certificates by default, so the client_certificate and client_key attributes can come back empty; a request with no usable credentials is exactly what Kubernetes labels "system:anonymous".)

The workaround was to add this:

data "google_client_config" "provider" {}

provider "kubernetes" {
  ...
  token = data.google_client_config.provider.access_token
}

provider "helm" {
  kubernetes {
    ...
    token = data.google_client_config.provider.access_token
  }
}
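
Putting it all together, the final provider blocks in my install-apps module look like this. It is the same set of fields shown above, just consolidated; I dropped the client cert/key lines since the token is what actually authenticates, though I have not verified whether they conflict:

data "google_client_config" "provider" {}

provider "kubernetes" {
  host                   = var.cluster_config.host
  cluster_ca_certificate = var.cluster_config.ca_certificate
  token                  = data.google_client_config.provider.access_token
}

provider "helm" {
  kubernetes {
    host                   = var.cluster_config.host
    cluster_ca_certificate = var.cluster_config.ca_certificate
    token                  = data.google_client_config.provider.access_token
  }
}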