Lost Kubernetes auth config

Hello,

I am running Vault 1.9.4 with three pods on OpenShift. Whenever the pod hosting the leader node is destroyed, Vault agents can no longer authenticate against the remaining two nodes.

I usually resolve this by rewriting my Kubernetes auth config from one of the remaining two nodes, as follows:

vault write auth/kubernetes/config \
  token_reviewer_jwt="$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
  kubernetes_host="https://$KUBERNETES_PORT_443_TCP_ADDR:443" \
  kubernetes_ca_cert=@/var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  disable_iss_validation=true
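
A possible contributing factor, not confirmed anywhere in this thread: the command above stores the service account token of the pod it was run in as token_reviewer_jwt. On clusters that issue pod-bound service account tokens, that token stops being valid once the pod is deleted, which would match the symptom. A minimal sketch of a config that leaves out token_reviewer_jwt and kubernetes_ca_cert, so that each Vault pod falls back to its own mounted token and CA certificate, could look like this. It assumes disable_local_ca_jwt is left at its default, that the Vault service account is allowed to perform token reviews, and that K8S_HOST (a variable name introduced here) resolves to the in-cluster API address.

```shell
# Assumption: run inside a Vault pod with an already authenticated CLI session.
# K8S_HOST is a name made up for this sketch; the fallback values are the
# usual in-cluster defaults and may differ on your cluster.
K8S_HOST="https://${KUBERNETES_SERVICE_HOST:-kubernetes.default.svc}:${KUBERNETES_SERVICE_PORT:-443}"

# token_reviewer_jwt and kubernetes_ca_cert are left out on purpose, so Vault
# uses the token and CA mounted into whichever pod handles the request.
if command -v vault >/dev/null 2>&1; then
  vault write auth/kubernetes/config \
    kubernetes_host="$K8S_HOST" \
    disable_iss_validation=true

  # Read the config back to confirm what was stored.
  vault read auth/kubernetes/config
fi
```

If authentication still breaks after the leader pod is replaced with this config in place, the reviewer token is probably not the cause.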

Can someone explain this strange behavior to me?

Best,
Sebastian

Sounds like you're pointing your auth at the “host” IP address of that Vault node rather than at an ingress IP / VIP that would distribute the connections across the various pods.

We configure the raft storage as follows:

          storage "raft" {
            path = "/vault/data"
            
            retry_join {
              leader_api_addr         = "https://vault-0.vault-internal:8200"
              leader_client_cert_file = "/vault/server/config/tls/tls.crt"
              leader_client_key_file  = "/vault/server/config/tls/tls.key"
              leader_ca_cert_file     = "/vault/server/config/tls/ca.crt"
            }

            retry_join {
              leader_api_addr         = "https://vault-1.vault-internal:8200"
              leader_client_cert_file = "/vault/server/config/tls/tls.crt"
              leader_client_key_file  = "/vault/server/config/tls/tls.key"
              leader_ca_cert_file     = "/vault/server/config/tls/ca.crt"
            }

            retry_join {
              leader_api_addr         = "https://vault-2.vault-internal:8200"
              leader_client_cert_file = "/vault/server/config/tls/tls.crt"
              leader_client_key_file  = "/vault/server/config/tls/tls.key"
              leader_ca_cert_file     = "/vault/server/config/tls/ca.crt"
            }
          }

Not sure if this is correct, or whether we should instead point to the service “https://vault-active:8200”.
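
For what it's worth, retry_join only controls how a node finds the raft cluster when it joins; it does not affect where clients send their requests. To see which node is currently active after a pod has been replaced, a rough sketch like the one below may help. The pod DNS names and the CA path are copied from the retry_join stanzas above; the ADDRS variable is made up for this example, and list-peers assumes the shell already has a valid token and VAULT_ADDR.

```shell
# CA path and pod DNS names are taken from the retry_join stanzas above.
export VAULT_CACERT="/vault/server/config/tls/ca.crt"

# Build the list of per-pod API addresses (ADDRS is a name made up here).
ADDRS=""
for i in 0 1 2; do
  ADDRS="$ADDRS https://vault-$i.vault-internal:8200"
done

if command -v vault >/dev/null 2>&1; then
  for addr in $ADDRS; do
    echo "== $addr"
    # "HA Mode" in the output shows whether this node is active or standby.
    VAULT_ADDR="$addr" vault status || true
  done

  # Lists the raft members and marks the current leader.
  # Uses whatever VAULT_ADDR and token the shell already has.
  vault operator raft list-peers || true
fi
```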

Best,
Sebastian

I was referring to the load balancer in front of your Vault cluster. That’s what VAULT_ADDR should be on your clients.
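
A quick way to test this from a client pod might be the sketch below. It assumes the “vault-active” service mentioned earlier is reachable from the client's namespace and that a Kubernetes auth role exists; “example-role” and the VAULT_K8S_ROLE variable are placeholders introduced here, not names from this thread. If the server certificate is signed by a private CA, VAULT_CACERT would also need to point at a CA file available in the client pod.

```shell
# Point the client at a stable service address rather than a single pod.
# The default below reuses the service name mentioned earlier in the thread.
export VAULT_ADDR="${VAULT_ADDR:-https://vault-active:8200}"

# Placeholder role name; replace it with a role that exists in your setup.
ROLE="${VAULT_K8S_ROLE:-example-role}"
JWT_FILE="/var/run/secrets/kubernetes.io/serviceaccount/token"

# Attempt a Kubernetes auth login with the client pod's own token.
if command -v vault >/dev/null 2>&1 && [ -r "$JWT_FILE" ]; then
  vault write auth/kubernetes/login role="$ROLE" jwt="$(cat "$JWT_FILE")"
fi
```

Running this once before and once after deleting the leader pod should show whether the login failure follows the client address or the auth config.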