Unable to configure Vault Raft storage HA cluster with TLS: raft challenge problem

Hello,

I am setting up a Vault 3-node HA cluster using Raft storage. However, I am encountering the following errors:

1. error during raft bootstrap init call: Error making API request.
   URL: PUT https://hc-vault-2.local:8200/v1/sys/storage/raft/bootstrap/challenge
   Code: 503. Errors:
2. [ERROR] core: failed to get raft challenge: leader_addr=https://hc-vault-2.deltaops.labs:8200
3. [ERROR] core: failed to get raft challenge: leader_addr=https://hc-vault-3.deltaops.labs:8200
4. [ERROR] core: failed to retry join raft cluster: retry=2s err="failed to get raft challenge"

Here’s what I’ve done so far:

  1. I created a self-signed root CA and distributed the root_ca.crt file to all servers (running Debian 12 Bookworm).
  2. I updated the CA certificates on each server using the update-ca-certificates command.
  3. I generated a unique TLS certificate (hc-vault-*.local.crt) and private key (hc-vault-*.local.key) for each server in the cluster. Each .crt file includes the root CA certificate.

Certificate verification tests (same result on each node):

openssl verify -CAfile /usr/local/share/ca-certificates/root_ca.crt /usr/local/share/ca-certificates/hc-vault-1.local.crt

/usr/local/share/ca-certificates/hc-vault-1.local.crt: OK

curl -v --cacert /usr/local/share/ca-certificates/root_ca.crt https://hc-vault-2.local:8200/v1/sys/health

*   Trying 10.3.2.193:8200...
* Connected to hc-vault-2.local (10.3.2.193) port 8200 (#0)
* ALPN: offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
*  CAfile: /usr/local/share/ca-certificates/root_ca.crt
*  CApath: /etc/ssl/certs
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_CHACHA20_POLY1305_SHA256
* ALPN: server accepted h2
* Server certificate:
*  subject: O=Self; CN=hc-vault-2.local
*  start date: Jan 21 07:43:29 2025 GMT
*  expire date: Jan 22 07:43:29 2026 GMT
*  subjectAltName: host "hc-vault-2.local" matched cert's "hc-vault-2.local"
*  issuer: O=Self; CN=Root CA
*  SSL certificate verify ok.
* using HTTP/2
* h2h3 [:method: GET]
* h2h3 [:path: /v1/sys/health]
* h2h3 [:scheme: https]
* h2h3 [:authority: hc-vault-2.local:8200]
* h2h3 [user-agent: curl/7.88.1]
* h2h3 [accept: */*]
* Using Stream ID: 1 (easy handle 0x55fdecf60ce0)
> GET /v1/sys/health HTTP/2
> Host: hc-vault-2.local:8200
> user-agent: curl/7.88.1
> accept: */*
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
< HTTP/2 501 
< cache-control: no-store
< content-type: application/json
< strict-transport-security: max-age=31536000; includeSubDomains
< content-length: 296
< date: Wed, 22 Jan 2025 07:58:22 GMT
< 
{"initialized":false,"sealed":true,"standby":true,"performance_standby":false,"replication_performance_mode":"unknown","replication_dr_mode":"unknown","server_time_utc":1737532702,"version":"1.18.3","enterprise":false,"echo_duration_ms":0,"clock_skew_ms":0,"replication_primary_canary_age_ms":0}
* Connection #0 to host hc-vault-2.local left intact
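
The 501 status above matches the response body: by default, /v1/sys/health returns 501 for a node that is not initialized and 503 for one that is sealed. A node in either state may be unable to answer a raft challenge, so this is worth ruling out alongside TLS. As a rough sketch (not part of the original test), the same check can be repeated against each node while presenting the client certificate and key that retry_join would use. The helper names vault_health_meaning and vault_node_state are hypothetical, and port 8200 is taken from the listener configuration in this post.

```shell
#!/bin/sh
# Map the HTTP status code returned by /v1/sys/health to a node state.
# These are the endpoint's default codes; they can be overridden with
# query parameters, which this sketch does not use.
vault_health_meaning() {
    case "$1" in
        200) echo "initialized, unsealed, active" ;;
        429) echo "unsealed, standby" ;;
        472) echo "disaster recovery secondary" ;;
        473) echo "performance standby" ;;
        501) echo "not initialized" ;;
        503) echo "sealed" ;;
        *)   echo "unknown" ;;
    esac
}

# Query one node and print its state (hypothetical helper).
# Arguments: hostname, CA file, client certificate, client key.
vault_node_state() {
    code=$(curl -s -o /dev/null -w '%{http_code}' \
        --cacert "$2" --cert "$3" --key "$4" \
        "https://$1:8200/v1/sys/health")
    echo "$1: $code ($(vault_health_meaning "$code"))"
}
```

From Node 1 this could be called as `vault_node_state hc-vault-2.local /usr/local/share/ca-certificates/root_ca.crt /usr/local/share/ca-certificates/hc-vault-1.local.crt /usr/local/share/ca-certificates/hc-vault-1.local.key`. If curl cannot connect or the TLS handshake fails, the code is 000 and the helper reports "unknown".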

DNS resolution works properly.

Despite this setup, I don’t understand the TLS configuration in the retry_join stanza. Specifically, do the certificates for every node need to be present on the potential leader node?

For example, should Node 1 have the certificate files for Node 2 and Node 3? And should the same apply to every other node in the cluster?

I just don’t understand what certificates should be configured in these parameters:

  1. leader_client_cert_file
  2. leader_client_key_file
  3. leader_ca_cert_file

Any kind of help is appreciated.

Configurations for each node in /etc/vault.d/vault.hcl:

Node 1:

cluster_addr  = "https://hc-vault-1.local:8201"
api_addr      = "https://hc-vault-1.local:8200"
disable_mlock = true
ui            = true

listener "tcp" {
    address             = "0.0.0.0:8200"
    tls_disable         = "0"
    tls_cert_file       = "/usr/local/share/ca-certificates/hc-vault-1.local.crt"
    tls_key_file        = "/usr/local/share/ca-certificates/hc-vault-1.local.key"
    tls_client_ca_file  = "/usr/local/share/ca-certificates/root_ca.crt"
}

storage "raft" {
    path    = "/opt/vault/data"
    node_id = "48917b2c-e557-5f23-bc19-ef35d167899c"

    retry_join {
        leader_api_addr         = "https://hc-vault-3.local:8200"
        leader_client_cert_file = "/usr/local/share/ca-certificates/hc-vault-1.local.crt"
        leader_client_key_file  = "/usr/local/share/ca-certificates/hc-vault-1.local.key"
        leader_ca_cert_file     = "/usr/local/share/ca-certificates/root_ca.crt"
    }

    retry_join {
        leader_api_addr         = "https://hc-vault-2.local:8200"
        leader_client_cert_file = "/usr/local/share/ca-certificates/hc-vault-1.local.crt"
        leader_client_key_file  = "/usr/local/share/ca-certificates/hc-vault-1.local.key"
        leader_ca_cert_file     = "/usr/local/share/ca-certificates/root_ca.crt"
    }
}

Node 2:

cluster_addr  = "https://hc-vault-2.local:8201"
api_addr      = "https://hc-vault-2.local:8200"
disable_mlock = true
ui            = true

listener "tcp" {
    address             = "0.0.0.0:8200"
    tls_disable         = "0"
    tls_cert_file       = "/usr/local/share/ca-certificates/hc-vault-2.local.crt"
    tls_key_file        = "/usr/local/share/ca-certificates/hc-vault-2.local.key"
    tls_client_ca_file  = "/usr/local/share/ca-certificates/root_ca.crt"
}

storage "raft" {
    path    = "/opt/vault/data"
    node_id = "63be374c-68d2-566d-94fd-45a67c6d3f25"

    retry_join {
        leader_api_addr         = "https://hc-vault-3.local:8200"
        leader_client_cert_file = "/usr/local/share/ca-certificates/hc-vault-2.local.crt"
        leader_client_key_file  = "/usr/local/share/ca-certificates/hc-vault-2.local.key"
        leader_ca_cert_file     = "/usr/local/share/ca-certificates/root_ca.crt"
    }

    retry_join {
        leader_api_addr         = "https://hc-vault-1.local:8200"
        leader_client_cert_file = "/usr/local/share/ca-certificates/hc-vault-2.local.crt"
        leader_client_key_file  = "/usr/local/share/ca-certificates/hc-vault-2.local.key"
        leader_ca_cert_file     = "/usr/local/share/ca-certificates/root_ca.crt"
    }
}

Node 3:

cluster_addr  = "https://hc-vault-3.local:8201"
api_addr      = "https://hc-vault-3.local:8200"
disable_mlock = true
ui            = true

listener "tcp" {
    address             = "0.0.0.0:8200"
    tls_disable         = "0"
    tls_cert_file       = "/usr/local/share/ca-certificates/hc-vault-3.local.crt"
    tls_key_file        = "/usr/local/share/ca-certificates/hc-vault-3.local.key"
    tls_client_ca_file  = "/usr/local/share/ca-certificates/root_ca.crt"
}

storage "raft" {
    path    = "/opt/vault/data"
    node_id = "847944f0-a10c-574d-812c-c5edcbe64527"

    retry_join {
        leader_api_addr         = "https://hc-vault-2.local:8200"
        leader_client_cert_file = "/usr/local/share/ca-certificates/hc-vault-3.local.crt"
        leader_client_key_file  = "/usr/local/share/ca-certificates/hc-vault-3.local.key"
        leader_ca_cert_file     = "/usr/local/share/ca-certificates/root_ca.crt"
    }

    retry_join {
        leader_api_addr         = "https://hc-vault-1.local:8200"
        leader_client_cert_file = "/usr/local/share/ca-certificates/hc-vault-3.local.crt"
        leader_client_key_file  = "/usr/local/share/ca-certificates/hc-vault-3.local.key"
        leader_ca_cert_file     = "/usr/local/share/ca-certificates/root_ca.crt"
    }
}

This certificate setup is more complicated than it needs to be, to be honest. I’d double-check how the .crt files were generated and what format they’re in.
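
One way to run that check, as a rough sketch: the three helpers below are hypothetical names, not Vault or OpenSSL commands. They assume PEM-encoded files (which is what Vault expects for these settings) and an OpenSSL version that supports the x509 -ext option (1.1.1 or later).

```shell
#!/bin/sh
# Count PEM certificate blocks in a file. A leaf certificate bundled
# with its root CA, as described in the question, should report 2.
count_pem_certs() {
    grep -c -e '-----BEGIN CERTIFICATE-----' "$1"
}

# Print subject, issuer, and subjectAltName of the first certificate
# in a file, to confirm the hostname the other nodes dial is covered.
show_cert() {
    openssl x509 -in "$1" -noout -subject -issuer -ext subjectAltName
}

# Succeed only if the certificate and the private key carry the same
# public key. Arguments: certificate file, key file.
cert_matches_key() {
    cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 1
    key_pub=$(openssl pkey -in "$2" -pubout) || return 1
    [ "$cert_pub" = "$key_pub" ]
}
```

For example, on Node 1: `count_pem_certs /usr/local/share/ca-certificates/hc-vault-1.local.crt`, then `cert_matches_key` with the .crt and .key paths from the listener stanza. A count of 0 suggests the file is not PEM (for instance DER), which would be worth fixing before looking further at retry_join.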