Vault HA cluster nodes all in standby when using raft for HA storage

tyson.tozier · May 9, 2023, 7:39pm

Hi,
When using the Vault Helm chart with HA mode enabled, using raft for HA storage and S3 for storage, all HA nodes come up in standby and Active Node Address is not set.

Below are relevant values I’ve set…

      dataStorage:
        enabled: false
        mountPath: "/vault/file"
    
      extraEnvironmentVars:
        VAULT_ENABLE_FILE_PERMISSIONS_CHECK: "true"

      extraSecretEnvironmentVars: 
        - envName: AWS_ACCESS_KEY_ID
          secretName: s3bucketauth
          secretKey: aws_access_key_id
        - envName: AWS_SECRET_ACCESS_KEY
          secretName: s3bucketauth
          secretKey: aws_secret_access_key   
        
      standalone:
        enabled: false

      ha:
        enabled: true
        replicas: 2
        setNodeId: false
        
        config: |
          ui = true

          listener "tcp" {
            tls_disable = 1
            address = "[::]:8200"
            cluster_address = "[::]:8201"
          }

          ha_storage "raft" {
            path = "/vault/file"
          }

          storage "s3" {
            bucket = "<bucket-name>"
            endpoint = "<hostname>"
            s3_force_path_style = "true"
            disable_ssl = "true"
          }

          service_registration "kubernetes" {}

Logs from one of the nodes…

==> Vault server configuration:

              HA Storage: raft
             Api Address: http://<ip-address>:8200
                     Cgo: disabled
         Cluster Address: https://vault-tenant-hashicorp-vault-0.vault-tenant-hashicorp-vault-internal:8201
              Go Version: go1.19.2
              Listener 1: tcp (addr: "[::]:8200", cluster address: "[::]:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "disabled")
               Log Level: info
                   Mlock: supported: true, enabled: false
           Recovery Mode: false
                 Storage: s3
                 Version: Vault v1.12.1, built 2022-10-27T12:32:05Z
             Version Sha: e34f8a14fb7a88af4640b09f3ddbb5646b946d9c

==> Vault server started! Log data will stream in below:

2023-05-08T19:55:16.044Z [INFO]  proxy environment: http_proxy="" https_proxy="" no_proxy=""
2023-05-08T19:55:16.090Z [INFO]  core: Initializing version history cache for core
2023-05-08T20:05:58.834Z [INFO]  core.cluster-listener.tcp: starting listener: listener_address=[::]:8201
2023-05-08T20:05:58.834Z [INFO]  core.cluster-listener: serving cluster requests: cluster_listen_address=[::]:8201
2023-05-08T20:05:58.839Z [INFO]  core: vault is unsealed
2023-05-08T20:05:58.840Z [INFO]  core: entering standby mode

maxb · May 10, 2023, 1:34pm

The dataStorage settings don’t make sense… if you were turning dataStorage off (enabled: false), it would be pointless to specify a mountPath, but worse, you’re turning it off whilst also using that same path (/vault/file) as the location where Raft keeps its data files.

This means that as soon as any of your pods are recreated, the Raft data files get erased.

At a guess, this loss of Raft data files is probably why your cluster is failing to elect an active node.

It’s not sensible to run with 2 replicas when using Raft. Raft works by consensus amongst a quorum of nodes, meaning that strictly more than half of the nodes need to be up at any time. In a cluster of 2, strictly more than half is 2 - meaning the entire cluster must be up for it to work. You need 3 nodes for meaningful HA using a consensus/quorum system.
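
The quorum arithmetic above can be checked with a few lines of shell. This is only an illustrative sketch: the function names quorum and tolerated_failures are made up for this example and are not part of Vault or the Helm chart.

```shell
#!/bin/sh
# Majority quorum for a cluster of n voting nodes: floor(n/2) + 1.
# (Hypothetical helper name, used only for illustration.)
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of nodes that can be lost while a quorum still remains.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 5; do
  echo "nodes=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

For 2 nodes the quorum is 2 and no failure is tolerated, which is no better than a single node. For 3 nodes the quorum is still 2, so one node can be down.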

Delete setNodeId: false from under ha, because nested at this point in the structure it isn’t a value that the Helm chart uses. (It would have to be under raft to be used, but it’s just re-iterating the default anyway.)

After setting up a new cluster using Raft ha_storage, it is necessary to bring all the nodes online and have them form a cluster.

This requires a sequence of CLI or API commands.

Step 1: Use vault operator init on one node

Step 2: Use vault operator unseal on all nodes

Step 3: Use vault operator raft join on each of the other nodes. During this step, the other nodes will fetch the api_addr of the initialized and active node from the shared storage (S3 here), connect to it, and use their shared knowledge of the unseal key (provided to them in step 2) to authenticate and form the Raft cluster.
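
As a rough sketch, the three steps could be run against the pods like this. The pod name prefix is taken from the Cluster Address in the log above; the replica count of 3, the use of kubectl exec in the current namespace, and the PREFIX, REPLICAS and DRY_RUN variables are assumptions made for this example. By default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only. PREFIX comes from the log output earlier in the thread;
# REPLICAS=3 and the DRY_RUN switch are assumptions for illustration.
PREFIX="${PREFIX:-vault-tenant-hashicorp-vault}"
REPLICAS="${REPLICAS:-3}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

bootstrap() {
  # Step 1: initialise Vault on the first pod only.
  run kubectl exec -ti "${PREFIX}-0" -- vault operator init

  # Step 2: unseal every pod. Each invocation prompts for one unseal key,
  # so it has to be repeated per pod until the key threshold is met.
  i=0
  while [ "$i" -lt "$REPLICAS" ]; do
    run kubectl exec -ti "${PREFIX}-${i}" -- vault operator unseal
    i=$((i + 1))
  done

  # Step 3: join each of the other pods to the Raft cluster.
  i=1
  while [ "$i" -lt "$REPLICAS" ]; do
    run kubectl exec -ti "${PREFIX}-${i}" -- vault operator raft join
    i=$((i + 1))
  done
}

bootstrap
```

Reviewing the printed commands first and then running them by hand is probably the safer route, since init prints the unseal keys and root token, and unseal is interactive.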

Once the procedure is complete you can use:

vault operator raft list-peers to display the Raft cluster membership
vault operator members to display the cluster members at the Vault HA layer

tyson.tozier · May 10, 2023, 2:21pm

Thanks maxb,

You were correct that having the dataStorage path set seemed to be causing my problems despite having dataStorage enabled set to false. Pretty basic mistake on my part but a good lesson to keep things clean. Thanks.

maxb · June 14, 2023, 6:34am

@andrewroffey Please start your own topic to ask your own questions, and provide more details about your reasons for asking.

Neither of these questions appears relevant to the original poster’s context.
