Lost Kubernetes auth config

Hello,

I am running vault version 1.9.4 with three Pods on OpenShift. Whenever the Pod hosting the leader node gets destroyed, vault agents cannot authenticate against the remaining two leader nodes.

This is usually resolved by re-writing my Kubernetes auth config on one of the two remaining nodes as follows:

vault write auth/kubernetes/config \
  token_reviewer_jwt="$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
  kubernetes_host="https://$KUBERNETES_PORT_443_TCP_ADDR:443" \
  kubernetes_ca_cert=@/var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  disable_iss_validation=true

Can someone explain this strange behavior to me?

Best,
Sebastian

Sounds like you’re pointing your auth at the “host” IP address of that Vault node rather than at an ingress IP / VIP that would distribute connections across the various pods.
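
For a quick sanity check from a client pod, something along these lines should always reach whichever node is currently the leader (a sketch only; “vault-active.vault.svc” is a placeholder for whatever ingress / VIP / service actually fronts the whole cluster):

# Placeholder address for the cluster-wide entry point, not an individual pod
export VAULT_ADDR="https://vault-active.vault.svc:8200"
# Assumed CA path; adjust to wherever the cluster CA is mounted on the client
export VAULT_CACERT="/vault/tls/ca.crt"

# Should report the current active node regardless of which pod was destroyed
vault status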

We configure the raft storage as follows:

          storage "raft" {
            path = "/vault/data"
            
            retry_join {
              leader_api_addr         = "https://vault-0.vault-internal:8200"
              leader_client_cert_file = "/vault/server/config/tls/tls.crt"
              leader_client_key_file  = "/vault/server/config/tls/tls.key"
              leader_ca_cert_file     = "/vault/server/config/tls/ca.crt"
            }

            retry_join {
              leader_api_addr         = "https://vault-1.vault-internal:8200"
              leader_client_cert_file = "/vault/server/config/tls/tls.crt"
              leader_client_key_file  = "/vault/server/config/tls/tls.key"
              leader_ca_cert_file     = "/vault/server/config/tls/ca.crt"
            }

            retry_join {
              leader_api_addr         = "https://vault-2.vault-internal:8200"
              leader_client_cert_file = "/vault/server/config/tls/tls.crt"
              leader_client_key_file  = "/vault/server/config/tls/tls.key"
              leader_ca_cert_file     = "/vault/server/config/tls/ca.crt"
            }
          }

Not sure if this is correct, or whether we should rather point to the service “https://vault-active:8200” instead.

Best,
Sebastian

I was referring to the load balancer in front of your Vault cluster. That’s what vault_addr should be set to on your clients.
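
For reference, a minimal sketch of the corresponding vault stanza in the agent configuration (“vault.example.internal” is just a placeholder for your load balancer / route hostname):

vault {
  # Cluster-wide entry point, not an individual pod address
  address = "https://vault.example.internal:8200"
  # Assumed path to the CA that signed the load balancer's certificate
  ca_cert = "/vault/tls/ca.crt"
}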

Let me take over from Sebastian here.

We checked the vault_addr setting of our Vault agents; it points to the load balancer. Could you elaborate on how this setting plays into the problem of losing the kube-auth configuration? If vault_addr were wrong, how could the problem be solved by changing “auth/kubernetes/config”?

I’m assuming the second “leader” is just a typo here, since by definition you can’t have multiple simultaneous leaders.

I am not familiar with OpenShift. However, if it is capable of tracking the service account token assigned to each pod and revoking it when the pod is destroyed, that would explain why the remaining Vault pods, which are still re-using the destroyed pod’s token as the token_reviewer_jwt, can no longer do so.
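
One way to verify that theory (a sketch, assuming you still have the JWT that was written as token_reviewer_jwt and a kubeconfig allowed to create TokenReviews) is to ask the API server directly whether it still accepts that token:

# Hypothetical check: submit a TokenReview for the stored reviewer JWT.
# "status.authenticated: false" in the response means the token was
# invalidated along with the destroyed pod.
kubectl create -o yaml -f - <<EOF
apiVersion: authentication.k8s.io/v1
kind: TokenReview
spec:
  token: "<JWT previously written as token_reviewer_jwt>"
EOF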

This configuration is inadvisable for a Vault running within the same cluster it is processing authentication requests for. In that case you should omit kubernetes_ca_cert and token_reviewer_jwt, allowing them to be picked up automatically from /var/run/secrets/kubernetes.io/serviceaccount/, including future changes to those files.
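
In other words, a slimmed-down configuration along these lines should be sufficient (a sketch of what the paragraph above describes, not a command taken from the thread):

# Only the API server address is set explicitly; Vault then reads the CA
# certificate and reviewer token from its own pod's service account mount
# and picks up future rotations of those files automatically.
vault write auth/kubernetes/config \
  kubernetes_host="https://$KUBERNETES_PORT_443_TCP_ADDR:443"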

It doesn’t; @aram just misinterpreted the original problem description.

Thanks, that solved our problem :slight_smile: