Raft HA storage nodes not joining the cluster automatically

tyson.tozier · May 24, 2023, 8:47pm

We are setting up a new cluster via helm install, using Raft for HA storage, S3 for storage, and Azure Key Vault for auto-unseal. Nodes are not joining the existing Raft cluster via retry_join.

server:

  logLevel: "trace"

  dataStorage:
    enabled: false                

  standalone:
    enabled: false
  
  volumes:
  - name: tls-internal
    secret:
      secretName:  vault-internal-tls
      optional: true 

  volumeMounts:
  - name: tls-internal
    mountPath: "/vault/config/userconfig"
    readOnly: true

  ha:
    enabled: true
    replicas: 4            

    config: |
      ui = true 
      
      listener "tcp" {
        tls_disable = true
        address = "[::]:8200"
        cluster_address = "[::]:8201"
             
      }

      ha_storage "raft" {
        path = "/vault/file"

        retry_join {
          leader_api_addr = "https://vault-tenant-hashicorp-vault-0.vault-tenant-hashicorp-vault-internal:8200"
        }
        retry_join {
          leader_api_addr = "https://vault-tenant-hashicorp-vault-1.vault-tenant-hashicorp-vault-internal:8200"
        }
        retry_join {
          leader_api_addr = "https://vault-tenant-hashicorp-vault-2.vault-tenant-hashicorp-vault-internal:8200"
        }
      }

      storage "s3" {
        bucket = ""
        endpoint = "" 
        s3_force_path_style = "true"                 
        disable_ssl = "true"                                           
      }

      seal "azurekeyvault" {
        tenant_id      = ""                        
        vault_name     = ""
        key_name       = ""            
      }

      service_registration "kubernetes" {}

Is it necessary to join a new node to the cluster manually the first time? And should retry_join then re-add the node after a restart or after the pod is deleted and re-created?

The pod logs keep repeating:

2023-05-24T20:59:17.806Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=["raft_storage_v1"]
2023-05-24T20:59:17.806Z [DEBUG] core.cluster-listener: error handshaking cluster connection: error="unsupported protocol"

I get the same result whether the leader_api_addr is using https or not.

maxb · May 24, 2023, 10:42pm

Vault doesn’t implement retry_join for Raft as ha_storage. IMO it ought to; it’s a missing feature. If you look in the source code, there’s a TODO comment showing that no one got around to writing the code for it.

So, you might as well remove all the retry_join blocks for now. And maybe even open a GitHub issue asking for the missing feature to be implemented.

Additionally… you are storing your Raft data files in /vault/file, a non-persistent Docker volume which is going to get wiped every time your pod restarts.

That’s going to break your cluster.

Turn dataStorage back on, and reconfigure Raft to store data in /vault/data.
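
For illustration, here is a rough sketch of how those two changes might look in the Helm values from the original post. Only the affected keys are shown; everything else (listener, storage "s3", seal, and so on) would stay as posted. It assumes the chart's default data mount path of /vault/data, and it assumes that your chart version actually creates and mounts the data volume when the configuration is supplied through server.ha.config. That second assumption is worth checking by rendering the chart with helm template and looking for a data volumeClaimTemplate in the StatefulSet.

```yaml
server:
  dataStorage:
    enabled: true             # request a persistent volume per pod
    mountPath: "/vault/data"  # assumed: the chart's default mount path

  ha:
    enabled: true

    config: |
      ha_storage "raft" {
        # Keep Raft data on the persistent volume instead of /vault/file.
        path = "/vault/data"

        # retry_join blocks removed: per the reply above, they are not
        # implemented for Raft used as ha_storage.
      }
```

If the rendered StatefulSet does not mount a volume at /vault/data, the Raft files would still be lost on pod restart, so confirm the mount before relying on this.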
