Raft HA storage nodes not joining the cluster automatically

We are setting up a new cluster via helm install, using Raft for HA storage, S3 for storage, and Azure Key Vault for auto-unseal. Nodes are not joining the existing Raft cluster via retry_join.

server:

  logLevel: "trace"

  dataStorage:
    enabled: false                

  standalone:
    enabled: false
  
  volumes:
  - name: tls-internal
    secret:
      secretName:  vault-internal-tls
      optional: true 

  volumeMounts:
  - name: tls-internal
    mountPath: "/vault/config/userconfig"
    readOnly: true

  ha:
    enabled: true
    replicas: 4            

    config: |
      ui = true 
      
      listener "tcp" {
        tls_disable = true
        address = "[::]:8200"
        cluster_address = "[::]:8201"
             
      }

      ha_storage "raft" {
        path = "/vault/file"

        retry_join {
          leader_api_addr = "https://vault-tenant-hashicorp-vault-0.vault-tenant-hashicorp-vault-internal:8200"
        }
        retry_join {
          leader_api_addr = "https://vault-tenant-hashicorp-vault-1.vault-tenant-hashicorp-vault-internal:8200"
        }
        retry_join {
          leader_api_addr = "https://vault-tenant-hashicorp-vault-2.vault-tenant-hashicorp-vault-internal:8200"
        }
      }

      storage "s3" {
        bucket = ""
        endpoint = "" 
        s3_force_path_style = "true"                 
        disable_ssl = "true"                                           
      }

      seal "azurekeyvault" {
        tenant_id      = ""                        
        vault_name     = ""
        key_name       = ""            
      }

      service_registration "kubernetes" {}

It seems a new node has to be joined to the cluster manually the first time? After that, should retry_join re-add the node on a restart, or when the pod is deleted and re-created?

The pod logs are repeating

2023-05-24T20:59:17.806Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=["raft_storage_v1"]
2023-05-24T20:59:17.806Z [DEBUG] core.cluster-listener: error handshaking cluster connection: error="unsupported protocol"

I get the same result whether the leader_api_addr is using https or not.

Vault doesn’t implement retry_join when Raft is used as ha_storage. IMO it ought to; it’s a missing feature. If you look in the source code, there’s a TODO comment showing that no one got around to writing that code.

So you might as well remove all the retry_join blocks for now, and maybe open a GitHub issue asking for the missing feature to be implemented.

Additionally… you are storing your Raft data files in /vault/file, a non-persistent Docker volume which is going to get wiped every time your pod restarts.

That’s going to break your cluster.

Turn dataStorage back on, and reconfigure Raft to store data in /vault/data.
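Putting the two suggestions together, the relevant parts of the values file might look roughly like the sketch below. It assumes the chart's default dataStorage mount path of /vault/data and leaves the listener, storage "s3", seal and service_registration stanzas exactly as in the question. Treat it as an illustration, not a tested configuration.

```yaml
server:
  dataStorage:
    enabled: true            # the chart then provisions a persistent volume per pod
    # mountPath is assumed to stay at the chart default, /vault/data

  ha:
    enabled: true

    config: |
      # listener, storage "s3", seal and service_registration stanzas
      # are unchanged from the original values and omitted here

      ha_storage "raft" {
        # persistent path, so Raft state survives pod restarts
        path = "/vault/data"
        # retry_join blocks removed: they are not acted on for ha_storage
      }
```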