Cannot join new members to the leader [HA and TLS]

I deployed Vault with TLS [and HA enabled] using the values below [the TLS certificate is created via cert-manager]:

  enabled: true
  tlsDisable: false

  enabled: true
    repository: "hashicorp/vault-k8s"
    tag: "1.1.0"
    pullPolicy: IfNotPresent
    repository: "hashicorp/vault"
    tag: "1.12.1"
    pullPolicy: IfNotPresent

    VAULT_CACERT: /vault/userconfig/vault-server-tls/ca.crt
    VAULT_TLSCERT: /vault/userconfig/vault-server-tls/tls.crt
    VAULT_TLSKEY: /vault/userconfig/vault-server-tls/tls.key

    - name: vault-tls
        defaultMode: 420
        secretName: vault-cert

    - mountPath: /vault/userconfig/vault-server-tls/
      name: vault-tls
      readOnly: true

  affinity: ""
    enabled: true
      enabled: true
      setNodeId: true
      config: |
        listener "tcp" {
          address = "[::]:8200"
          cluster_address = "[::]:8201"
          tls_cert_file = "/vault/userconfig/vault-server-tls/tls.crt"
          tls_key_file  = "/vault/userconfig/vault-server-tls/tls.key"
          tls_client_ca_file = "/vault/userconfig/vault-server-tls/ca.crt"
        }
        {{- $replicas := 3 | int -}}
        {{ range $k, $v := until $replicas }}
        retry_join {
          leader_api_addr = "https://vault-{{ $k }}.vault-testing.svc:8200"
          leader_ca_cert  = "/vault/userconfig/vault-server-tls/ca.crt"
          leader_client_cert = "/vault/userconfig/vault-server-tls/tls.crt"
          leader_client_key = "/vault/userconfig/vault-server-tls/tls.key"
        }
        {{ end }}

        storage "raft" {
          path = "/vault/data"
        }

        disable_mlock = true
        service_registration "kubernetes" {}
    storageClass: "local"
    repository: "hashicorp/vault"
    tag: "1.12.1"
    pullPolicy: IfNotPresent

  enabled: true

I am able to initialize Vault in the vault-0 pod.
Then I go into vault-1 to join it:

vault operator raft join -address=https://vault-0.vault-internal:8200

I get no errors:

Key       Value
---       -----
Joined    true

but when I go to vault-0 [leader] to list the members, I see only one:

Host Name    API Address                 Cluster Address                        Active Node    Version    Upgrade Version    Redundancy Zone    Last Echo
---------    -----------                 ---------------                        -----------    -------    ---------------    ---------------    ---------
vault-0                                  https://vault-0.vault-internal:8201    true           1.12.1     1.12.1             n/a                n/a

What am I doing wrong?

It is not that well documented, but the vault operator raft join command only starts a join operation.

It won’t complete until the joining node is unsealed.

So, the next step is to unseal vault-1
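
As a rough sketch of that sequence, run from inside the vault-1 pod: the leader address is the one from your post, the guard around the vault binary is only there so the snippet is harmless elsewhere, and TLS options for the join are left out here, so treat this as an outline rather than something tailored to your cluster.

```shell
# Sketch only: run inside the vault-1 pod. The leader address is taken from
# this thread; adjust it to your own service names.
LEADER_ADDR="https://vault-0.vault-internal:8200"

if command -v vault >/dev/null 2>&1; then
  # Step 1: ask the local node (vault-1) to start joining the leader.
  vault operator raft join "$LEADER_ADDR"

  # Step 2: unseal vault-1 with the unseal key(s) printed when vault-0 was
  # initialized. The command prompts for one key; repeat it until the
  # key threshold is reached.
  vault operator unseal

  # Step 3: list the raft peers (this needs a valid token).
  vault operator raft list-peers
fi
```

The join only completes once step 2 has finished, so the peer list is worth checking only after the unseal.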

Oh, …
But still, when I try to unseal, I receive this error:

Error unsealing: Error making API request.

Code: 400. Errors:

* Vault is not initialized

But vault-0 is initialized, and I already joined this node to vault-0 [based on your comments above, it just should not be visible yet].

Any idea, @maxb?

Looking back at your initial post some more, I see some other issues:

You are using retry_join in your configuration file, which means there is no need for you to be running vault operator raft join at all.

However, it clearly isn’t working as intended, so you should review the log messages being printed by your Vault servers to figure out why.

Also, you ran this command:

vault operator raft join -address=https://vault-0.vault-internal:8200

thinking you were telling vault-1 to join vault-0. In fact, you logged in to vault-1 and just used vault-1’s pod as a place from which to send a join command to vault-0, without specifying what to join. That is because -address is not an option that configures the join process; it tells the vault CLI which Vault server to talk to.
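
To make the distinction concrete, here is a small sketch. The vault-1.vault-internal name is my assumption based on the naming pattern in your post, and the guard is only there so the snippet does nothing where the vault binary is missing.

```shell
# Sketch only: assumed to be typed inside the vault-1 pod.
LOCAL_ADDR="https://vault-1.vault-internal:8200"   # node that should join (assumed name)
LEADER_ADDR="https://vault-0.vault-internal:8200"  # existing leader, from this thread

if command -v vault >/dev/null 2>&1; then
  # -address (or the VAULT_ADDR environment variable) selects the server
  # that receives the request; the positional argument is the leader that
  # this server is told to join.
  vault operator raft join -address="$LOCAL_ADDR" "$LEADER_ADDR"
fi
```

In the command from your post the only address given was vault-0's, and it was given as -address, so the request went to vault-0 and no leader was named.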

So, the next step is to check your Vault logs and see why your retry_join configuration is not working.

So it seems retry_join does nothing.
But if I try to join manually [without the -address option], I get this error [on the leader, vault-0]:

http: TLS handshake error from remote error: tls: bad certificate

and this one on vault-1, the node I am trying to join to vault-0:

2023-02-20T08:17:37.755Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault-0.vault-internal:8200 error="error during raft bootstrap init call: Put \"https://vault-0.vault-internal:8200/v1/sys/storage/raft/bootstrap/challenge\": x509: certificate signed by unknown authority"

How can I fix this issue?

I followed this thread: "Vault ha setup and own CA - x509: certificate signed by unknown authority". I changed the certs location to /etc/ssl/certs and it is working.
Is there a better way to solve this?

According to the configuration reference (Integrated Storage - Storage Backends - Configuration | Vault | HashiCorp Developer), the options you are using (leader_ca_cert, leader_client_cert, leader_client_key) expect the certificate contents in-line in the configuration file. There are separate options, suffixed with _file, which take filenames to load the certificates from. That is probably why this configuration did not work.
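
For illustration, a stanza using the _file variants might look like the one below. The snippet just writes the example to a scratch file so you can compare it with your own config. The paths come from your post, and I have nested retry_join inside storage "raft", which is how the documentation page shows it. This is a sketch, not a tested configuration for your cluster.

```shell
# Sketch only: writes an example stanza to a scratch file for review.
# Paths and the leader address are copied from this thread.
EXAMPLE_FILE="$(mktemp)"

cat > "$EXAMPLE_FILE" <<'EOF'
storage "raft" {
  path = "/vault/data"

  retry_join {
    leader_api_addr         = "https://vault-0.vault-internal:8200"
    # These options take a path; the unsuffixed ones take PEM text.
    leader_ca_cert_file     = "/vault/userconfig/vault-server-tls/ca.crt"
    leader_client_cert_file = "/vault/userconfig/vault-server-tls/tls.crt"
    leader_client_key_file  = "/vault/userconfig/vault-server-tls/tls.key"
  }
}
EOF

cat "$EXAMPLE_FILE"
```

In the Helm values you would keep your range loop and only swap the three option names inside each retry_join block.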

Since you opted to join your nodes using ad-hoc CLI commands instead, you would have had to pass all the certificate information to the vault operator raft join command via its options instead.
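
As far as I recall from HashiCorp's Kubernetes TLS tutorial, that looks roughly like the sketch below. The paths are from your volume mount, and the guard is only there so the snippet does nothing outside a Vault pod. Please check the flag names against vault operator raft join -help for your version.

```shell
# Sketch only: run inside the joining pod (vault-1). Paths come from the
# volume mount shown earlier in this thread.
TLS_DIR="/vault/userconfig/vault-server-tls"
LEADER_ADDR="https://vault-0.vault-internal:8200"

if command -v vault >/dev/null 2>&1 && [ -r "$TLS_DIR/ca.crt" ]; then
  # The -leader-* flags are given the certificate contents here, so the
  # files are read with command substitution instead of passed as paths.
  vault operator raft join \
    -leader-ca-cert="$(cat "$TLS_DIR/ca.crt")" \
    -leader-client-cert="$(cat "$TLS_DIR/tls.crt")" \
    -leader-client-key="$(cat "$TLS_DIR/tls.key")" \
    "$LEADER_ADDR"
fi
```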

It seems that in the end you got things to work by adding your custom CA to a path which Go uses for trusted certificates by default, therefore avoiding the need to specify it explicitly.
