Unable to unseal all nodes in new Raft cluster in Kubernetes

I am standing up a new Raft-based Vault cluster with 3 replicas. I'm installing with Helm, and my Helm chart spec looks like this:

spec:
  releaseName: vault #pulled from base
  chart:
    spec:
      chart: vault #pulled from base
  values:
    global:
      enabled: true
    server:
      extraLabels:
        app: vault
        version: 1.8.0
      dev:
        enabled: false
      dataStorage:
        enabled: true
        storageClass: px-no-delete
      ui:
        enabled: true
      ha:
        enabled: true
        replicas: 3
        raft:
          enabled: true

The pods come up, and I am able to successfully init and unseal vault-0. It even gives a healthy-looking (to me, at least) vault status:

Key                     Value
---                     -----
Seal Type               shamir
Initialized             true
Sealed                  false
Total Shares            5
Threshold               3
Version                 1.8.2
Storage Type            raft
Cluster Name            vault-cluster-22cf63ee
Cluster ID              7bf1edc4-bb0f-4d62-8821-f744afe4e550
HA Enabled              true
HA Cluster              n/a
HA Mode                 standby
Active Node Address     <none>
Raft Committed Index    31
Raft Applied Index      31
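
For reference, I initialized and unsealed vault-0 roughly like this, repeating the unseal command with three of the five key shares to meet the threshold:

~ % kubectl exec --stdin=true --tty=true vault-0 -- vault operator init
~ % kubectl exec --stdin=true --tty=true vault-0 -- vault operator unseal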

However, when I execute unseal commands against vault-1 or vault-2 I am receiving an error:

~ % kubectl exec --stdin=true --tty=true vault-1 -- vault operator unseal
Error unsealing: Error making API request.

URL: PUT http://127.0.0.1:8200/v1/sys/unseal
Code: 400. Errors:

* Vault is not initialized

Why is Vault not initialized on vault-1 when it is on vault-0? It feels like an issue with Raft communication, but I'm looking for suggestions. Thank you.

I'm not great with Kubernetes, but I think the problem is that you're not using joins (retry_join), so I imagine the Vault nodes are all coming up as independent, standalone nodes.

See: Vault on Kubernetes Deployment Guide | Vault - HashiCorp Learn
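
If the nodes aren't configured to join via retry_join, you can also join the standbys manually from each pod before unsealing them. A rough sketch, assuming the chart's default vault-internal headless service (use https instead of http if TLS is enabled on your listener):

~ % kubectl exec --stdin=true --tty=true vault-1 -- vault operator raft join http://vault-0.vault-internal:8200
~ % kubectl exec --stdin=true --tty=true vault-1 -- vault operator unseal

and then the same for vault-2.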

That's what I'm wondering. I'm not injecting certificate files, so that may also be part of the issue, but I'm seeing the same problem with this configuration:

    storage "raft" {
      path = "/vault/data"
      retry_join {
      leader_api_addr = "https://vault-0.vault-internal:8200"
      }
      retry_join {
      leader_api_addr = "https://vault-1.vault-internal:8200"
      }
      retry_join {
      leader_api_addr = "https://vault-2.vault-internal:8200"
      }

Try posting your full values.yaml file; it's probably something else. TLS is not required as long as you've disabled it.

    global:
      enabled: true
    server:
      updateStrategyType: RollingUpdate
      extraLabels:
        app: vault
        version: 1.8.0
      dev:
        enabled: false
      dataStorage:
        enabled: true
        storageClass: px-no-delete
      ui:
        enabled: true
      ha:
        enabled: true
        replicas: 3
        raft:
          enabled: true
          config: |
            ui = true
            listener "tcp" {
              tls_disable = 1
              address = "[::]:8200"
              cluster_address = "[::]:8201"
            }
            storage "raft" {
              path = "/vault/data"
              retry_join {
                leader_api_addr = "https://vault-0.vault-internal:8200"
              }
              retry_join {
                leader_api_addr = "https://vault-1.vault-internal:8200"
              }
              retry_join {
                leader_api_addr = "https://vault-2.vault-internal:8200"
              }
            }
            service_registration "kubernetes" {}
      ingress:
        enabled: true
        annotations:
          kubernetes.io/ingress.class: nginx
        hosts:
          - host: vault.domain.com

I'm hoping someone else with Kubernetes knowledge can chime in on your config.

In the meantime, can you log in to one of the pods and run:
vault operator raft list-peers
That should tell you whether the nodes are part of one Raft cluster or whether they can't see each other. The output should show 3 nodes: one leader and 2 standbys.
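
With a healthy 3-node cluster, the output should look roughly like this (node names and addresses will vary with your setup):

Node       Address                        State       Voter
----       -------                        -----       -----
vault-0    vault-0.vault-internal:8201    leader      true
vault-1    vault-1.vault-internal:8201    follower    true
vault-2    vault-2.vault-internal:8201    follower    true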

I was able to get this working by making sure the join URLs were set properly. They defaulted to https, and I moved them to http because TLS is disabled. I also made sure no env vars were set. I have other problems coming up now, but the cluster came up. Thank you!!!
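
For anyone else hitting this: the change was essentially switching the retry_join URLs in the config above from https to http so they match tls_disable = 1, i.e.:

    storage "raft" {
      path = "/vault/data"
      retry_join {
        leader_api_addr = "http://vault-0.vault-internal:8200"
      }
      retry_join {
        leader_api_addr = "http://vault-1.vault-internal:8200"
      }
      retry_join {
        leader_api_addr = "http://vault-2.vault-internal:8200"
      }
    }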