Vault K8s HA Raft Certificate Error

Hello guys,

I'm trying to set up Vault HA on Kubernetes with Raft, on an AKS cluster, following this guide:

Vault on Kubernetes Deployment Guide | Vault - HashiCorp Learn

I also use Azure Key Vault. Ever since I started configuring TLS, I've been running into issues.

In particular, retry_join is not working for me. Log excerpt from one of the nodes trying to join:

2021-01-18T18:27:58.317Z [INFO] core: attempting to join possible raft leader node: leader_addr=https://vault-0.vault-internal:8200

2021-01-18T18:27:58.539Z [WARN] core: join attempt failed: error="error during raft bootstrap init call: Put \"https://10.244.0.99:8200/v1/sys/storage/raft/bootstrap/challenge\": x509: certificate is valid for MYPUBLICIP, not 10.244.0.99"
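For what it's worth, the x509 failure is ordinary SAN matching: the challenge request was sent to the raw pod IP, and a certificate that only carries DNS SANs can never satisfy an IP check. A throwaway local demonstration with openssl (names taken from the log; requires OpenSSL 1.1.1+ for `-addext`):

```shell
# Create a self-signed cert whose SANs contain only a DNS name,
# mimicking a cert issued for the pod hostname but not the pod IP.
openssl req -x509 -newkey rsa:2048 -nodes -keyout k.pem -out c.pem \
  -days 1 -subj '/CN=vault-0.vault-internal' \
  -addext 'subjectAltName=DNS:vault-0.vault-internal'

# The DNS name matches the certificate ...
openssl x509 -in c.pem -noout -checkhost vault-0.vault-internal

# ... but a check against the pod IP fails, because IP checks only
# consider IP SANs, and this cert has none.
openssl x509 -in c.pem -noout -checkip 10.244.0.99
```

So as long as the join handshake ends up dialing `10.244.0.99` directly, the cert would need that IP as an IP SAN, regardless of which DNS names it covers.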

The relevant section of my listener and Raft storage configuration:

listener "tcp" {
      tls_disable = 0
      address = "[::]:8200"
      cluster_address = "[::]:8201"
      tls_ca_cert_file = "/vault/userconfig/vault-server-tls/root-ca.pem"
      tls_cert_file   = "/vault/userconfig/vault-server-tls/vault.crt"
      tls_key_file    = "/vault/userconfig/vault-server-tls/vault.key"
    }

    storage "raft" {
      path = "/vault/data"
      retry_join {
        leader_api_addr = "https://vault-0.vault-internal:8200"
        leader_ca_cert_file = "/vault/userconfig/vault-server-tls/root-ca.pem"
        leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
        leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
      }

      retry_join {
        leader_api_addr = "https://vault-1.vault-internal:8200"
        leader_ca_cert_file = "/vault/userconfig/vault-server-tls/root-ca.pem"
        leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
        leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
      }

      retry_join {
        leader_api_addr = "https://vault-2.vault-internal:8200"
        leader_ca_cert_file = "/vault/userconfig/vault-server-tls/root-ca.pem"
        leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
        leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
      }

      retry_join {
        leader_api_addr = "https://vault-3.vault-internal:8200"
        leader_ca_cert_file = "/vault/userconfig/vault-server-tls/root-ca.pem"
        leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
        leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
      }

      retry_join {
        leader_api_addr = "https://vault-4.vault-internal:8200"
        leader_ca_cert_file = "/vault/userconfig/vault-server-tls/root-ca.pem"
        leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
        leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
      }
    }

I'm really wondering why the error mentions that the certificate doesn't contain the pod IP. The Raft storage configuration specifies DNS names like vault-0.vault-internal. I certainly cannot add the pod IPs to the certificate, since I don't know them before Vault is deployed.
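One way to cover all the pod hostnames without knowing pod IPs in advance is to issue the server certificate with a wildcard DNS SAN over the headless service, so every vault-N.vault-internal name is valid. A minimal CSR sketch (the service name vault-internal and namespace default are assumptions; adjust to your release):

```shell
# Hypothetical CSR config: wildcard SANs over the headless service
# cover vault-0.vault-internal, vault-1.vault-internal, etc.
cat > vault-csr.conf <<'EOF'
[req]
prompt = no
distinguished_name = dn
req_extensions = san
[dn]
CN = vault.default.svc.cluster.local
[san]
subjectAltName = DNS:*.vault-internal, DNS:*.vault-internal.default.svc.cluster.local, DNS:vault.default.svc.cluster.local, IP:127.0.0.1
EOF

# Generate a key and a CSR carrying those SANs.
openssl req -new -newkey rsa:2048 -nodes -keyout vault.key \
  -out vault.csr -config vault-csr.conf

# Confirm the SANs made it into the request before submitting it to the CA.
openssl req -in vault.csr -noout -text | grep -A1 'Subject Alternative Name'
```

Note this still only helps if the join handshake dials a DNS name the wildcard covers, not a raw pod IP.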

If anyone has any idea what might be causing this, I'd appreciate it. I will update this topic if I figure it out myself.

I made some progress on this. I changed the way I use the Helm chart: instead of modifying only values.yaml, I added an override YAML, just the way the tutorial advises. The error I received may be connected to the probe settings I added to my override file:

  # For HA configuration and because we need to manually init the vault,
  # we need to define custom readiness/liveness Probe settings
  readinessProbe:
    enabled: true
    path: "/v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204"
  livenessProbe:
    enabled: true
    path: "/v1/sys/health?standbyok=true"
    initialDelaySeconds: 60

Not sure if this really was the issue, but it is working for me now. If you encounter similar problems, feel free to reply to this thread; maybe we can pinpoint the error together.

Last update on this matter: I don't think the readiness/liveness probes had anything to do with it. The behavior is somewhat inconsistent. Sometimes the cluster initializes fine and retry_join works as expected; sometimes I get the error I mentioned when opening this thread. The only workaround for me is re-deploying until it works.

Still cannot determine the root cause in my setup for this behavior.
