Vault HA cluster nodes all in standby when using raft for HA storage

When I use the Vault Helm chart with HA mode enabled, using raft for HA storage and S3 for storage, all HA nodes come up in standby and the Active Node Address is not set.

Below are relevant values I’ve set…

        enabled: false
        mountPath: "/vault/file"

        - envName: AWS_ACCESS_KEY_ID
          secretName: s3bucketauth
          secretKey: aws_access_key_id
        - envName: AWS_SECRET_ACCESS_KEY
          secretName: s3bucketauth
          secretKey: aws_secret_access_key   
        enabled: false

        enabled: true
        replicas: 2
        setNodeId: false
        config: |
          ui = true
          listener "tcp" {
            tls_disable = 1
            address = "[::]:8200"
            cluster_address = "[::]:8201"
          }

          ha_storage "raft" {
            path = "/vault/file"
          }

          storage "s3" {
            bucket = "<bucket-name>"
            endpoint = "<hostname>"
            s3_force_path_style = "true"
            disable_ssl = "true"
          }
          service_registration "kubernetes" {}

Logs from one of the nodes…

==> Vault server configuration:

              HA Storage: raft
             Api Address: http://<ip-address>:8200
                     Cgo: disabled
         Cluster Address: https://vault-tenant-hashicorp-vault-0.vault-tenant-hashicorp-vault-internal:8201
              Go Version: go1.19.2
              Listener 1: tcp (addr: "[::]:8200", cluster address: "[::]:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "disabled")
               Log Level: info
                   Mlock: supported: true, enabled: false
           Recovery Mode: false
                 Storage: s3
                 Version: Vault v1.12.1, built 2022-10-27T12:32:05Z
             Version Sha: e34f8a14fb7a88af4640b09f3ddbb5646b946d9c

==> Vault server started! Log data will stream in below:

2023-05-08T19:55:16.044Z [INFO]  proxy environment: http_proxy="" https_proxy="" no_proxy=""
2023-05-08T19:55:16.090Z [INFO]  core: Initializing version history cache for core
2023-05-08T20:05:58.834Z [INFO]  core.cluster-listener.tcp: starting listener: listener_address=[::]:8201
2023-05-08T20:05:58.834Z [INFO]  core.cluster-listener: serving cluster requests: cluster_listen_address=[::]:8201
2023-05-08T20:05:58.839Z [INFO]  core: vault is unsealed
2023-05-08T20:05:58.840Z [INFO]  core: entering standby mode

Setting dataStorage enabled: false together with a mountPath doesn’t make sense. If you are turning data storage off, it is pointless to specify a mountPath. Worse, you’re turning it off whilst also using that same path (/vault/file) as the location where Raft keeps its data files.

This means that as soon as any of your pods are recreated, the Raft data files get erased.

At a guess, this loss of Raft data files is probably why your cluster is failing to elect an active node.

It’s not sensible to run with 2 replicas when using Raft. Raft works by consensus amongst a quorum of nodes, meaning that strictly more than half of the nodes need to be up at any time. In a cluster of 2, strictly more than half is 2 - meaning the entire cluster must be up for it to work. You need 3 nodes for meaningful HA using a consensus/quorum system.
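To make the quorum arithmetic concrete, here is a small sketch. The function names quorum and tolerated are made up for illustration; the only rule assumed is the one stated above, that strictly more than half of the nodes must be up.

```shell
#!/bin/sh
# Quorum for an n-node Raft cluster: strictly more than half,
# which with integer division is floor(n / 2) + 1.
quorum() { echo $(( $1 / 2 + 1 )); }

# Number of nodes that can be down while a quorum still exists.
tolerated() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 4 5; do
  echo "nodes=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated "$n")"
done
```

With 2 nodes the quorum is 2 and no failure is tolerated, which is no better than a single node. 3 is the smallest size that survives losing one node.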

Delete setNodeId: false, because nested at this point in the structure it isn’t a value that the Helm chart uses. (It would have to be under raft to be used, and even there it would just restate the default.)

After setting up a new cluster using Raft ha_storage, it is necessary to bring all the nodes online and have them form a cluster.

This requires a sequence of CLI or API commands.

Step 1: Use vault operator init on one node

Step 2: Use vault operator unseal on all nodes

Step 3: Use vault operator raft join on each of the other nodes. During this step, the other nodes will fetch the api_addr of the initialized and active node from the shared storage (S3 here), connect to it, and use their shared knowledge of the unseal key (provided to them in step 2) to authenticate and form the Raft cluster.

Once the procedure is complete you can use:

  • vault operator raft list-peers to display the Raft cluster membership
  • vault operator members to display the cluster members at the Vault HA layer
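The steps and checks above could be sketched roughly as follows. This is a dry run that only prints the commands so the order can be reviewed before running anything. Several things here are assumptions and not taken from your setup: that you run the commands through kubectl exec, that the pods are named after the Cluster Address in your log (with a third pod added, as recommended above), and that no namespace flag is needed. The helper name plan is made up. The join is shown without an address, relying on the node finding the active node via the shared storage as described in step 3; check that against your Vault version.

```shell
#!/bin/sh
# Dry run: prints the bootstrap commands in order, does not execute them.
# Pod names are assumptions derived from the Cluster Address in the log.
PODS="vault-tenant-hashicorp-vault-0 vault-tenant-hashicorp-vault-1 vault-tenant-hashicorp-vault-2"

plan() {
  first=$1
  shift
  # Step 1: initialize exactly one node.
  echo "kubectl exec -ti $first -- vault operator init"
  # Step 2: unseal every node (repeat per key share until the threshold is met).
  for pod in "$first" "$@"; do
    echo "kubectl exec -ti $pod -- vault operator unseal"
  done
  # Step 3: join each of the other nodes to the Raft cluster.
  for pod in "$@"; do
    echo "kubectl exec -ti $pod -- vault operator raft join"
  done
  # Verification (both need a valid token, e.g. after vault login).
  echo "kubectl exec -ti $first -- vault operator raft list-peers"
  echo "kubectl exec -ti $first -- vault operator members"
}

# Left unquoted on purpose so the list splits into one argument per pod.
plan $PODS
```

Printing the plan first keeps the sketch safe to run anywhere. Once the pod names look right, the printed lines can be executed one at a time.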

Thanks maxb,

You were correct that having the dataStorage path set seemed to be causing my problems despite having dataStorage enabled set to false. Pretty basic mistake on my part but a good lesson to keep things clean. Thanks.