Official Helm chart for Vault fails in HA mode: Can't create raft

Hello,

I am trying to deploy Vault on Kubernetes using the official Helm chart, following the official deployment guide. The deployment fails, and when I look at the logs of one of the Vault pods, I see this:

Error initializing storage of type raft: failed to create fsm: failed to open bolt file: open /vault/data/vault.db: permission denied

Here is the content of the overriding values file that I use:

injector:
  resources:
    requests:
      memory: 256Mi
      cpu: 250m
    limits:
      memory: 256Mi
      cpu: 250m

server:
  # These Resource Limits are in line with node requirements in the Vault
  # Reference Architecture for a Small Cluster
  resources:
    requests:
      memory: 8Gi
      cpu: 2000m
    limits:
      memory: 16Gi
      cpu: 2000m

  # For HA configuration and because we need to manually init the vault, we
  # need to define custom readiness/liveness Probe settings
  readinessProbe:
    enabled: true
    path: "/v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204"
  livenessProbe:
    enabled: true
    path: "/v1/sys/health?standbyok=true"
    initialDelaySeconds: 60

  # extraEnvironmentVars is a list of extra environment variables to set with the stateful set. These could be
  # used to include variables required for auto-unseal.
  # XXX extraEnvironmentVars:
  # XXX   VAULT_CACERT: /vault/userconfig/tls-ca/ca.crt

  # extraVolumes is a list of extra volumes to mount. These will be exposed
  # to Vault in the path `/vault/userconfig/<name>/`.
  # XXX extraVolumes:
  # XXX   - type: secret
  # XXX     name: tls-server
  # XXX   - type: secret
  # XXX     name: tls-ca
  # XXX   - type: secret
  # XXX     name: kms-creds

  # This configures the Vault Statefulset to create a PVC for audit logs.
  # See https://www.vaultproject.io/docs/audit/index.html to know more
  auditStorage:
    enabled: true

  standalone:
    enabled: false

  # Run Vault in "HA" mode.
  ha:
    enabled: true
    replicas: 5
    raft:
      enabled: true
      setNodeId: true

      config: |
        ui = true
        listener "tcp" {
          address = "[::]:8200"
          cluster_address = "[::]:8201"
          # XXX tls_cert_file = "/vault/userconfig/tls-server/server.crt"
          # XXX tls_key_file = "/vault/userconfig/tls-server/server.key"
          # XXX tls_ca_cert_file = "/vault/userconfig/tls-ca/ca.crt"
        }

        storage "raft" {
          path = "/vault/data"
          retry_join {
            leader_api_addr = "http://vault-0.vault-internal:8200"
            # XXX leader_ca_cert_file = "/vault/userconfig/tls-ca/ca.crt"
            # XXX leader_client_cert_file = "/vault/userconfig/tls-server/server.crt"
            # XXX leader_client_key_file = "/vault/userconfig/tls-server/server.key"
          }
          retry_join {
            leader_api_addr = "http://vault-1.vault-internal:8200"
            # XXX leader_ca_cert_file = "/vault/userconfig/tls-ca/ca.crt"
            # XXX leader_client_cert_file = "/vault/userconfig/tls-server/server.crt"
            # XXX leader_client_key_file = "/vault/userconfig/tls-server/server.key"
          }
          retry_join {
            leader_api_addr = "http://vault-2.vault-internal:8200"
            # XXX leader_ca_cert_file = "/vault/userconfig/tls-ca/ca.crt"
            # XXX leader_client_cert_file = "/vault/userconfig/tls-server/server.crt"
            # XXX leader_client_key_file = "/vault/userconfig/tls-server/server.key"
          }
          retry_join {
            leader_api_addr = "http://vault-3.vault-internal:8200"
            # XXX leader_ca_cert_file = "/vault/userconfig/tls-ca/ca.crt"
            # XXX leader_client_cert_file = "/vault/userconfig/tls-server/server.crt"
            # XXX leader_client_key_file = "/vault/userconfig/tls-server/server.key"
          }
          retry_join {
            leader_api_addr = "http://vault-4.vault-internal:8200"
            # XXX leader_ca_cert_file = "/vault/userconfig/tls-ca/ca.crt"
            # XXX leader_client_cert_file = "/vault/userconfig/tls-server/server.crt"
            # XXX leader_client_key_file = "/vault/userconfig/tls-server/server.key"
          }

          autopilot {
            cleanup_dead_servers = "true"
            last_contact_threshold = "200ms"
            dead_server_last_contact_threshold = "10m"
            max_trailing_logs = 250000
            min_quorum = 5
            server_stabilization_time = "10s"
          }

        }

        service_registration "kubernetes" {}

# Vault UI
ui:
  enabled: true
  serviceType: LoadBalancer
  serviceNodePort: null
  externalPort: 8200

  # For added security, restrict access to known source IP ranges:
  #loadBalancerSourceRanges:
  #   - < Your IP RANGE Ex. 10.0.0.0/16 >
  #   - < YOUR SINGLE IP Ex. 1.78.23.3/32 >

And here is how I deploy it:

$ helm install -n vault vault hashicorp/vault -f vault-values.yaml

Chart version is 0.19.0 and Vault is 1.9.3.

The Kubernetes cluster is minikube with 5 nodes running Kubernetes 1.23.3.

Any idea? Thanks a lot for any help!

I’m a complete newbie with Kubernetes in general, but it looks like your PVC wasn’t provisioned correctly, or that the underlying storage is read-only?
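A quick way to check both would be something like this (assuming the release is named `vault` in the `vault` namespace, matching the install command above, so the first pod is `vault-0`):

```shell
# list the PVCs created by the chart and check that they are Bound
kubectl -n vault get pvc

# check ownership and permissions of the data directory inside a pod;
# Vault runs as a non-root user, so a root-owned or read-only mount
# would explain the "permission denied" error on /vault/data/vault.db
kubectl -n vault exec vault-0 -- ls -ld /vault/data
```

These commands need a live cluster with the chart installed, so adjust the namespace and pod name to your setup.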

Hi @aram ,

That’s a good point; I was also wondering where the Vault volumes are configured… I will take a deeper look into the chart.

Cheers,

Fabrice

After investigating, I think something is wrong with minikube, although I can’t really tell what. I switched to kind and I no longer have this problem.
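For anyone who wants to reproduce the working setup, a five-node kind cluster can be created with a config along these lines (a sketch with default node images; the cluster name is my own choice):

```yaml
# kind-config.yaml — one control plane plus four workers,
# matching the 5-node minikube cluster from the original post
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
  - role: worker
  - role: worker
```

Then create the cluster with `kind create cluster --name vault --config kind-config.yaml` before running the same `helm install` command.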

I’m also wary of k8s. I’ve been switching my workloads over to Nomad; it’s been much easier to learn and use, and it does everything I need with minimal effort.