Vault on k8s with TLS, HA and Raft

I’ve been trying to get Vault running in HA mode with Raft storage in OpenShift 3.11, but I always get the following error when attempting to join the 2nd node to the cluster:

* failed to join raft cluster: error during bootstrap init call: Put https://vault3.vault3.svc:8200/v1/sys/storage/raft/bootstrap/challenge: x509: certificate signed by unknown authority

I followed this example to create a cert and key and store them along with the Kubernetes CA as a Kubernetes secret (named vault-server-tls).

The Helm chart values.yaml has a section that references the secret:

  extraVolumes:
  - type: secret
    name: vault-server-tls 

I believe that is working correctly, because when I shell into the pods the vault-server-tls secret is mounted and the 3 files show up as expected:

/ $ ls -l /vault/userconfig/vault-server-tls
total 0
lrwxrwxrwx    1 root     root            15 Apr  7 19:26 vault.ca -> ..data/vault.ca
lrwxrwxrwx    1 root     root            16 Apr  7 19:26 vault.crt -> ..data/vault.crt
lrwxrwxrwx    1 root     root            16 Apr  7 19:26 vault.key -> ..data/vault.key

The Helm chart values.yaml listener configuration is referencing the cert, key and ca files:

    raft:
      
      # Enables Raft integrated storage
      enabled: false
      config: |
        ui = true
        cluster_addr = "https://POD_IP:8201"

        listener "tcp" {
          tls_disable = 0
          address = "[::]:8200"
          cluster_address = "[::]:8201"
          tls_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
          tls_key_file  = "/vault/userconfig/vault-server-tls/vault.key"
          tls_client_ca_file = "/vault/userconfig/vault-server-tls/vault.ca"
        }

        storage "raft" {
          path = "/vault/data"
        }

The first pod unseals fine:

oc exec -ti vault3-0 -- vault operator unseal -tls-skip-verify 
Key                    Value
---                    -----
Seal Type              shamir
Initialized            true
Sealed                 false
Total Shares           1
Threshold              1
Version                1.3.4
Cluster Name           vault-cluster-945d1fc5
Cluster ID             2ac97997-b596-4dcc-4926-2d8acec56ed5
HA Enabled             true
HA Cluster             n/a
HA Mode                standby
Active Node Address    <none>

But when I try to join the second pod to the cluster, I get an "x509: certificate signed by unknown authority" message:

oc exec -ti vault3-1 -- vault operator raft join -tls-skip-verify https://vault3.vault3.svc:8200
Error joining the node to the raft cluster: Error making API request.

URL: POST https://127.0.0.1:8200/v1/sys/storage/raft/join
Code: 500. Errors:

* failed to join raft cluster: error during bootstrap init call: Put https://vault3.vault3.svc:8200/v1/sys/storage/raft/bootstrap/challenge: x509: certificate signed by unknown authority

Shelling into the 2nd pod, I can see that the VAULT_CACERT environment value exists and is set correctly (and as noted above, the 3 files are in place as expected):

/ $ printenv VAULT_CACERT
/vault/userconfig/vault-server-tls/vault.ca
/ $ printenv VAULT_ADDR
https://127.0.0.1:8200
/ $
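
For anyone debugging the same error, one way to rule out a mismatched CA is to check whether the mounted certificate actually chains to the mounted CA file. The following is only a sketch: `check_chain` is a hypothetical helper name, the default paths are the mount points listed above, and it assumes `openssl` is installed wherever you run it (it may not be present in the Vault container, in which case the files would need to be copied elsewhere first).

```shell
#!/bin/sh
# Sketch: check whether a server certificate chains to a given CA file.
# Default paths are the mount points shown in this thread; override as needed.
CA_FILE="${CA_FILE:-/vault/userconfig/vault-server-tls/vault.ca}"
CERT_FILE="${CERT_FILE:-/vault/userconfig/vault-server-tls/vault.crt}"

# check_chain <ca_file> <cert_file>  (hypothetical helper, not part of Vault)
check_chain() {
  ca_file="$1"
  cert_file="$2"
  for f in "$ca_file" "$cert_file"; do
    if [ ! -r "$f" ]; then
      echo "cannot read: $f" >&2
      return 2
    fi
  done
  # Show who issued the certificate and who the CA says it is;
  # the issuer line should match the CA's subject line.
  openssl x509 -in "$cert_file" -noout -issuer
  openssl x509 -in "$ca_file" -noout -subject
  # Reports "OK" when the certificate validates against this CA.
  openssl verify -CAfile "$ca_file" "$cert_file"
}
```

It could be called as `check_chain "$CA_FILE" "$CERT_FILE"`. A passing check only says the two files belong together; it does not show which CA the Vault server process itself trusts when it calls the leader.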

Here is my CA:

-----BEGIN CERTIFICATE-----
MIIC6jCCAdKgAwIBAgIBATANBgkqhkiG9w0BAQsFADAmMSQwIgYDVQQDDBtvcGVu
c2hpZnQtc2lnbmVyQDE1NjcxNzQyODkwHhcNMTkwODMwMTQxMTI4WhcNMjQwODI4
MTQxMTI5WjAmMSQwIgYDVQQDDBtvcGVuc2hpZnQtc2lnbmVyQDE1NjcxNzQyODkw
ggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQDTE3YoGObzoRFLTcLR4Ocv
FPy8XrWvWxBGnb1U9ct2QUaTjxuubcazGZOr8QuovwnFzp3Is8xS3pz9wBDyZqpS
JX1Z3Ja/ngi9472aAPLZnalRnGWkrTl8pAargLTUmnAcS7eEfRs+NH6aZdpefhX5
PQ3Ll1xMuu3U/PVVtfwkLje85LsevlVs5VBjDdc36UOU0BhEJM4pXbvLDG2pLpnM
BhILR2RgOX5oca27K6YDVVTHwxw83EgWvUHwpeoZuyEMPBQqhHJ3fzPvxfx4rCgd
/iV8KO2zP+/3t/kmrkJM7avgWbc7QqWPeKbB7WkKOpaTMwhpe1U1ZNjTnt2tzo8f
AgMBAAGjIzAhMA4GA1UdDwEB/wQEAwICpDAPBgNVHRMBAf8EBTADAQH/MA0GCSqG
SIb3DQEBCwUAA4IBAQC6hJ/74qvFztZ0YQBT5ZhGdwgUQtFYttcgLMRcv6QX8S+M
WK8TnmlMb6xgklzsc63nFzPl1pPHMQ7pEiclUUZCJdivzgVlDGLrjXUxYVLA2TGu
UqJJer9XAkAtrAPZ/ppx3iroinDWA+8mQBs4h9nmmFQS98et7A1Of0Josl/3UE6R
9jQDvadOLi01t3UPqSLvsYfpl5I4cuqpiCSnphTHLIl1AL5PNwuZAgLpOIVVsi9h
7OaIy285vYoGK12L3yUXYj5EixY94CmYMu6LDfiQdeN6P9nE78O4OXPS3uXm2M1/
Trm1rY3ulqyj6jwLwmvPQpwtO81A48GrNTvcoFq+
-----END CERTIFICATE-----

Here is my cert:

-----BEGIN CERTIFICATE-----
MIIDgjCCAmqgAwIBAgIUQ3RqBAahTzugd9PfQ29Y1nAT5EcwDQYJKoZIhvcNAQEL
BQAwJjEkMCIGA1UEAwwbb3BlbnNoaWZ0LXNpZ25lckAxNTc4NDI5MzQwMB4XDTIw
MDQwNzE3NDkwMFoXDTIxMDQwNzE3NDkwMFowHDEaMBgGA1UEAxMRdmF1bHQzLnZh
dWx0My5zdmMwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQC5/cyUmBJM
c3ngjqZmc5LDVp7Ft7UNeb2RF3GedRiefp5/S+0xTSGRItigv/qJ4QrGuujkFtcI
O9zwxPFAjXy+HFW3gjUzqqZKdY1aur9QYtVYeZxFHxSy67jLQIT2NIcN2c4HC8yk
FKDTY/sviv5LZy90w7Spw514sdy5bQZApfE9PWrhDMNHYbiU6aq2uWmvlkVTfXdB
v/5DMFE691WcH1x0kJxHW3TMp8vh9Y7N5tm7PTeFPuHc/OimHSvzuMPOqrCPmcoY
h8fvQTBmjOthIY9ATSELXPYnRsOmwqYEnHh2PLIugVRQ/iAT4zSuN0eoo8ja05Oa
hFpxeOfBZWWrAgMBAAGjgbEwga4wDgYDVR0PAQH/BAQDAgWgMBMGA1UdJQQMMAoG
CCsGAQUFBwMBMAwGA1UdEwEB/wQCMAAwHQYDVR0OBBYEFO2ZvIrxIeG1BFYTMMhQ
T4VJALNjMFoGA1UdEQRTMFGCBnZhdWx0M4INdmF1bHQzLnZhdWx0M4IRdmF1bHQz
LnZhdWx0My5zdmOCH3ZhdWx0My52YXVsdDMuc3ZjLmNsdXN0ZXIubG9jYWyHBH8A
AAEwDQYJKoZIhvcNAQELBQADggEBAFzKuf7vGxowvEqCDLB044DWUFnx5iSJ3t9q
+xJTza3a7aIVZtbHy9wz8z2u/3WZKXKfepQJW8BvFfr1LkZd7bA73/RgysJkLclo
E2gPwRTqIcYdNc0za5GB5WV2HH6TM/Ybfvp+UdJ42YzvhdFr8FL8CZS9IOjJ/yyl
nRp77c4O10OFNwQIxGqsvhcDmKKw9/ISER5opWT9vRKoOlW7vDcsCgOg+td8m0//
WX6IOOQz5gYvBTQFXK4t8cNrA2P8Fb7LICdHD9v8yNshwa0ytYCgRhqoXmSR7fXf
Rw8c2ZjzLNlZbRJprkTafI9lDZBpoltKf0qpfDWfeeEXab2toyw=
-----END CERTIFICATE-----

Here is my key:

-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEAuf3MlJgSTHN54I6mZnOSw1aexbe1DXm9kRdxnnUYnn6ef0vt
MU0hkSLYoL/6ieEKxrro5BbXCDvc8MTxQI18vhxVt4I1M6qmSnWNWrq/UGLVWHmc
RR8Usuu4y0CE9jSHDdnOBwvMpBSg02P7L4r+S2cvdMO0qcOdeLHcuW0GQKXxPT1q
4QzDR2G4lOmqtrlpr5ZFU313Qb/+QzBROvdVnB9cdJCcR1t0zKfL4fWOzebZuz03
hT7h3Pzoph0r87jDzqqwj5nKGIfH70EwZozrYSGPQE0hC1z2J0bDpsKmBJx4djyy
LoFUUP4gE+M0rjdHqKPI2tOTmoRacXjnwWVlqwIDAQABAoIBAGfn/1QAhTCKzssC
Rroz1QkmKjthP1fQ7HPTehlBZ8icCmYpf9CsO5V+tzFPL2O6ArA1mYhbrjQeZXdp
PzKGGOuInuaParN7ob01YQwQCnOZU4FDJ2eCTqkKgcmjOOTnQZAeXziWgfQkxhhy
4dCUwS5U7jE4aITzENVt4FcDLpwMZHNZBnfI9npB5/UhaDEl8X6DB/B3lDwB6lWf
mfA20FB3qaLA4tQYRL3DrhD2e5uKje7ajfd6IUHEnvCIfMrrjNrfTQY6I0YfubvB
scs2pr0pyVgZkt4vWZHpwlw2vjMrBy93HtCoAyPN+aSLD9CM4j+HXWJ4Bew5EjFQ
kBEcuCECgYEA6zVVwYWp92iOJ5lhH7tLgkTCBLDCOAdLMuPBJVhQdO1K0FaMwbHV
2Su/RmOPpgYZGAlicdgrm9MS77zsrhWneV3Oo82IpD8j/ah7bP0RGxS7BZDSy+cK
/QI7mYTrNG7Re8+4WdAL9seO4gcS6+bPOBuf5wR7cFeXacJHo72CLvsCgYEAym60
q3hLiC+RZifuK4z9KN3hMQ3MDtbYrjSGROzZF0SWEM1xJke4fX/jjSGt/IMK8p9N
HxcYWD1gVfeP5Vxnj297JdBbi/knmoKoDN075IVP6j1P7Q9UG4XVEK5qgvmYizU8
kd+IbyMa5OXT93M5aw2NgeEoe/JntfEA22cCJRECgYAVQd29PrpMvOtUEt2fQ4sg
e9xZFiyHaclXERRsrp2e469GQvw3qT3dgcGot+jMpXJxJK/8AAB49cuZVSbC2Pwo
0NyTG0lFJtu22hpFkF2SZ/47E4qpmPj6QtBmIIgtVfKi0PQlUdMy+3gjX2ZLYbHK
rVx3QYVycsgha8iTuNXiLwKBgQCwrYS0L27E8rdVUL53djsyIs07kg4qWWuOR7t0
hr9Gpo7PJW9++JPVvPvunpmKzRiN/2lBHFgcE510Cnilt0uPjb4Ol9Z+yTu+iBCC
AckXPx8rks2iWoGO7/Sw9XlyzMNNpG4z5sPeM+ZyJwEkdIWFoLODyu8Zlszbp/eW
hkYB8QKBgQDp9x7rWaWUAeCFTUkEUWNiVXGp/BrTfGGkunZFJXgZrqJHHKSmdeRI
VarplRVJzyeD8qf7qyoS34oaXB/3o+ZdPHxM1+bLp2tQtqwgQtcX36FmyimHggym
71YkHfWj9ABUX0Gczdwj8atJcBrNSiHzbpDvdi1F5vNFymOvfqI9hA==
-----END RSA PRIVATE KEY-----

I’ve followed the Helm TLS example (at the top of this post) probably 10 different times and still get the same error.

I’m not sure what I’m doing wrong, but any help would be greatly appreciated.

I’m working on the exact same thing (though I’m stuck at a different spot). If you have gotten a working cluster going I would love to see updated code that worked for you. In particular I’m trying to run:

  • AKS (3 nodes)
  • vault-helm (ha mode, with raft enabled, same as your config snippet)
  • TLS setup
  • auto-unseal with azure keyvault

I’ve got the auto-unseal working, but I’m not sure how exactly my TLS certs should be set up, and my second and third pods aren’t joining the cluster.

Hi. I just wanted to chime in, as I was playing around with the vault-helm chart and ran into the same problem. The only way I got it working was to build a custom Vault Docker image on top of vault:1.4.0, in which I added the vault.ca certificate from the Kubernetes CA and ran update-ca-certificates. With this “custom” image the raft cluster join worked. So I guess the tls_client_ca_file setting and the ENV variable are not working as intended, or some Alpine mechanism is complaining for some other reason; I don’t know. At least that’s my finding so far.

I solved the issue with this procedure outside Helm too.

I tried to add this parameter to the join:
-leader-ca-cert ``

without success, so I’m forced to update the CA and launch the server afterwards.

I figured out that you do not have to provide the path to the certs, but the certs themselves.

The documentation here is misleading and I doubt the example for retry_join is working.

This is what worked for me in the end to join the raft cluster in Kubernetes:

export CA_CERT=`cat /var/run/secrets/kubernetes.io/serviceaccount/ca.crt`
vault operator raft join -leader-ca-cert="$CA_CERT" https://vault-0.vault-internal:8200
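
The same workaround can be wrapped in a small function so that the CA is always passed by content rather than by path. This is only a sketch: `join_raft` and the `DRY_RUN` switch are hypothetical names introduced here for illustration, and the default CA path and leader address are simply the ones from the commands above.

```shell
#!/bin/sh
# Sketch: pass the CA certificate *contents* (not a file path) to -leader-ca-cert.
# Defaults are taken from the commands in this thread; adjust to your deployment.
CA_FILE="${CA_FILE:-/var/run/secrets/kubernetes.io/serviceaccount/ca.crt}"
LEADER_ADDR="${LEADER_ADDR:-https://vault-0.vault-internal:8200}"

# join_raft <ca_file> <leader_addr>  (hypothetical helper, not part of Vault)
# Set DRY_RUN=1 to print what would be run instead of calling vault.
join_raft() {
  ca_file="$1"
  leader="$2"
  if [ ! -r "$ca_file" ]; then
    echo "CA file not readable: $ca_file" >&2
    return 1
  fi
  # Read the PEM text itself; the flag expects the certificate, not its path.
  ca_pem="$(cat "$ca_file")"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "vault operator raft join -leader-ca-cert=<contents of $ca_file> $leader"
    return 0
  fi
  vault operator raft join -leader-ca-cert="$ca_pem" "$leader"
}
```

Inside each pod that still needs to join, it would be called as `join_raft "$CA_FILE" "$LEADER_ADDR"`; running it once with `DRY_RUN=1` shows which file and address it would use.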

You are right, that was the issue.

Same result here: changing my raft join command to specify -leader-ca-cert="$(cat /path/to/cert)" makes the additional pods join the raft cluster. I’m also curious how to get the retry_join block to function.

Seems they will solve it in version 1.4.2: