HA Vault on Kubernetes using Raft storage - no unseal keys for nodes

I’m deploying a new three-node Vault cluster on Kubernetes. I’m using the official HashiCorp Helm chart, hashicorp/vault-helm (Helm chart to install Vault and other associated components).

The pertinent parts:

    raft:
      config: |
        ui = true

        listener "tcp" {
          tls_disable = 0
          address = "[::]:8200"
          cluster_address = "[::]:8201"
          tls_cert_file = "/vault/userconfig/vault-ha-tls/vault.crt"
          tls_key_file  = "/vault/userconfig/vault-ha-tls/vault.key"
          tls_client_ca_file = "/vault/userconfig/vault-ha-tls/vault.ca"

          # Enable unauthenticated metrics access (necessary for Prometheus Operator)
          #telemetry {
          #  unauthenticated_metrics_access = "true"
          #}
        }

        storage "raft" {
          path = "/vault/data"

          retry_join {
            leader_api_addr = "https://vault-0.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-ha-tls/vault.ca"
            leader_client_cert_file = "/vault/userconfig/vault-ha-tls/vault.crt"
            leader_client_key_file = "/vault/userconfig/vault-ha-tls/vault.key"
          }
          retry_join {
            leader_api_addr = "https://vault-1.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-ha-tls/vault.ca"
            leader_client_cert_file = "/vault/userconfig/vault-ha-tls/vault.crt"
            leader_client_key_file = "/vault/userconfig/vault-ha-tls/vault.key"
          }
          retry_join {
            leader_api_addr = "https://vault-2.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-ha-tls/vault.ca"
            leader_client_cert_file = "/vault/userconfig/vault-ha-tls/vault.crt"
            leader_client_key_file = "/vault/userconfig/vault-ha-tls/vault.key"
          }
        }

        disable_mlock = true
        service_registration "kubernetes" {}
      enabled: true
      setNodeId: true
    replicas: 3

My understanding from the documentation is that the nodes should automatically join the node that has been initialized and unsealed, Raft should come to consensus, and I should then be able to use the root token and unseal keys from the leader on the other nodes. In my cluster, I initialized and unsealed vault-2. I can see from list-peers that vault-0 has joined, but it is not a voter (which suggests a problem here).
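For reference, the init and unseal I ran on vault-2 looked roughly like the sketch below. The `RUN` guard is my addition so the script prints the commands instead of executing them by default, and the `<share-N>` values are placeholders for the key shares that `vault operator init` returns:

```shell
#!/bin/sh
# Sketch of the init/unseal sequence run on vault-2. By default the
# commands are only printed; set RUN=1 to execute against a live cluster.
run() { if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi; }

# Initialize exactly once, on a single node; this returns the 5 key
# shares (threshold 3) and the initial root token.
run kubectl exec vault-2 -- vault operator init -key-shares=5 -key-threshold=3

# Unseal that node with any 3 of the 5 shares (placeholders here).
run kubectl exec vault-2 -- vault operator unseal '<share-1>'
run kubectl exec vault-2 -- vault operator unseal '<share-2>'
run kubectl exec vault-2 -- vault operator unseal '<share-3>'
```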

kubectl exec -it vault-2 -- /bin/sh

/ $ vault status
Key                     Value
---                     -----
Seal Type               shamir
Initialized             true
Sealed                  false
Total Shares            5
Threshold               3
Version                 1.19.0
Build Date              2025-03-04T12:36:40Z
Storage Type            raft
Cluster Name            vault-cluster-d06fa999
Cluster ID              4d7ee770-5ad5-806d-4bf8-d593603e6748
Removed From Cluster    false
HA Enabled              true
HA Cluster              https://vault-2.vault-internal:8201
HA Mode                 active
Active Since            2025-05-23T15:43:48.072301656Z
Raft Committed Index    50
Raft Applied Index      50

/ $ vault operator raft list-peers
Node       Address                        State       Voter
----       -------                        -----       -----
vault-2    vault-2.vault-internal:8201    leader      true
vault-0    vault-0.vault-internal:8201    follower    false
/ $

I’ve issued no other commands in the cluster; I’ve only interacted with vault-2 at this point. However, I can see on the vault-0 node that it has been initialized. I did not run this init, and I have no unseal keys for the vault-0 node.

kubectl exec -it vault-0 -- /bin/sh

/ $ vault status
Key                     Value
---                     -----
Seal Type               shamir
Initialized             true
Sealed                  true
Total Shares            5
Threshold               3
Unseal Progress         0/3
Unseal Nonce            n/a
Version                 1.19.0
Build Date              2025-03-04T12:36:40Z
Storage Type            raft
Removed From Cluster    false
HA Enabled              true

As reported by vault status, the node is sealed. I am not able to unseal it using the keys from vault-2. I don’t believe this is expected behavior.

What am I missing?

I found the solution to my issue in a few steps. First, I believe there was a small timing issue between deploying the servers and running vault operator unseal on the two other nodes. I waited about 60 seconds after deployment to allow everything to settle.
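Rather than a fixed sleep, you can poll for the moment a node is reachable but still sealed: `vault status` exits 0 when unsealed, 2 when sealed, and non-zero otherwise. A minimal sketch (`wait_sealed` is my own name, not part of the chart):

```shell
#!/bin/sh
# Poll a Vault pod until the server responds as "sealed" (exit code 2),
# which means it is up and ready to be joined/unsealed.
wait_sealed() {
  pod="$1"
  tries=0
  while [ "$tries" -lt 30 ]; do
    kubectl exec "$pod" -- vault status > /dev/null 2>&1
    if [ $? -eq 2 ]; then
      return 0   # up but sealed
    fi
    tries=$((tries + 1))
    sleep 2
  done
  return 1
}
```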

Second, I changed address and cluster_address to IPv4 notation. Additionally, in each retry_join block I commented out leader_client_cert_file and leader_client_key_file, leaving only the path to leader_ca_cert_file.

        ui = true

        listener "tcp" {
          tls_disable = 0
          
          #Changed the IP notation to IPv4 instead of IPv6. I'm not running IPv6 in my cluster.
          address = "0.0.0.0:8200"
          cluster_address = "0.0.0.0:8201"

          tls_cert_file = "/vault/userconfig/vault-ha-tls/vault.crt"
          tls_key_file  = "/vault/userconfig/vault-ha-tls/vault.key"
          tls_client_ca_file = "/vault/userconfig/vault-ha-tls/vault.ca"

          # Enable unauthenticated metrics access (necessary for Prometheus Operator)
          #telemetry {
          #  unauthenticated_metrics_access = "true"
          #}
        }


        storage "raft" {
          path = "/vault/data"

          retry_join {
            leader_api_addr = "https://vault-0.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-ha-tls/vault.ca"

          # Commented out these options for each node.
          #   leader_client_cert_file = "/vault/userconfig/vault-ha-tls/vault.crt"
          #   leader_client_key_file = "/vault/userconfig/vault-ha-tls/vault.key"
          }
          retry_join {
            leader_api_addr = "https://vault-1.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-ha-tls/vault.ca"

          #   leader_client_cert_file = "/vault/userconfig/vault-ha-tls/vault.crt"
          #   leader_client_key_file = "/vault/userconfig/vault-ha-tls/vault.key"
          }
          retry_join {
            leader_api_addr = "https://vault-2.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-ha-tls/vault.ca"

          #   leader_client_cert_file = "/vault/userconfig/vault-ha-tls/vault.crt"
          #   leader_client_key_file = "/vault/userconfig/vault-ha-tls/vault.key"
          }
        }

Lastly, I was able to unseal the other two Vault nodes using the unseal keys from the first node I initialized in the cluster. Raft then began replicating as expected.
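That unseal step can be sketched as the loop below. The share values are placeholders for the shares from the first node’s init, and the `RUN` guard is my addition so the commands print rather than execute by default:

```shell
#!/bin/sh
# Unseal vault-1 and vault-2 with 3 of the 5 shares from the initial
# vault operator init. Prints the commands unless RUN=1.
run() { if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi; }

for node in vault-1 vault-2; do
  for share in '<share-1>' '<share-2>' '<share-3>'; do
    run kubectl exec "$node" -- vault operator unseal "$share"
  done
done
```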

kail -n vault

vault/vault-0[vault]: 2025-05-24T23:22:38.531Z [TRACE] core: forwarding RPC: echo received: node_id=vault-2 applied_index=42 term=3 desired_suffrage=voter sdk_version=1.19.0 upgrade_version=1.19.0 redundancy_zone=""
vault/vault-0[vault]: 2025-05-24T23:22:39.226Z [TRACE] core: forwarding RPC: echo received: node_id=vault-1 applied_index=42 term=3 desired_suffrage=voter sdk_version=1.19.0 upgrade_version=1.19.0 redundancy_zone=""

vault/vault-0[vault]: 2025-05-24T23:22:42.410Z [INFO]  storage.raft.autopilot: Promoting server: id=vault-2 address=vault-2.vault-internal:8201 name=vault-2
vault/vault-0[vault]: 2025-05-24T23:22:42.410Z [INFO]  storage.raft: updating configuration: command=AddVoter server-id=vault-2 server-addr=vault-2.vault-internal:8201 servers="[{Suffrage:Voter ID:vault-0 Address:vault-0.vault-internal:8201} {Suffrage:Voter ID:vault-1 Address:vault-1.vault-internal:8201} {Suffrage:Voter ID:vault-2 Address:vault-2.vault-internal:8201}]"
vault/vault-0[vault]: 2025-05-24T23:22:43.531Z [TRACE] core: forwarding RPC: echo received: node_id=vault-2 applied_index=43 term=3 desired_suffrage=voter sdk_version=1.19.0 upgrade_version=1.19.0 redundancy_zone=""
vault/vault-0[vault]: 2025-05-24T23:22:44.226Z [TRACE] core: forwarding RPC: echo received: node_id=vault-1 applied_index=43 term=3 desired_suffrage=voter sdk_version=1.19.0 upgrade_version=1.19.0 redundancy_zone=""
vault/vault-2[vault]: 2025-05-24T23:22:46.190Z [TRACE] storage.raft: triggering raft config reload due to initial timeout
vault/vault-2[vault]: 2025-05-24T23:22:46.190Z [TRACE] storage.raft: reloaded raft config to set lower timeouts: config="raft.ReloadableConfig{TrailingLogs:0x2800, SnapshotInterval:120000000000, SnapshotThreshold:0x2000, HeartbeatTimeout:5000000000, ElectionTimeout:5000000000}"
vault/vault-0[vault]: 2025-05-24T23:22:48.531Z [TRACE] core: forwarding RPC: echo received: node_id=vault-2 applied_index=43 term=3 desired_suffrage=voter sdk_version=1.19.0 upgrade_version=1.19.0 redundancy_zone=""
/ $ vault operator raft list-peers
Node       Address                        State       Voter
----       -------                        -----       -----
vault-0    vault-0.vault-internal:8201    leader      true
vault-1    vault-1.vault-internal:8201    follower    true
vault-2    vault-2.vault-internal:8201    follower    true

The documentation for the operator raft join command provided a key note (operator raft - Command | Vault | HashiCorp Developer):

If raft is used for storage, the node must be joined before unsealing
and the leader-api-addr argument must be provided.

If raft is used for ha_storage, the node must be first unsealed before
joining and the leader-api-addr must not be provided.

This documentation is somewhat confusing. In my scenario, raft is the storage backend, and that same raft storage also provides the HA coordination (Vault configuration parameters | Vault | HashiCorp Developer).
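In other words, with raft as storage a sealed node joins first and is unsealed afterwards. With retry_join configured as above the join happens automatically, but the manual equivalent would look roughly like this (pod and address names are from this deployment, the `RUN` guard is mine, and with TLS the join may also need the CA flags):

```shell
#!/bin/sh
# Manual join for raft *storage*: join while still sealed, then unseal.
# Prints the commands unless RUN=1.
run() { if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi; }

run kubectl exec vault-1 -- vault operator raft join https://vault-0.vault-internal:8200
run kubectl exec vault-1 -- vault operator unseal '<share-1>'
```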

I hope this helps someone in the future. Raft is a bit of a beast to configure correctly.