Vault pods can't join a cluster

Hello,

I’m trying to deploy a Vault HA cluster in a minikube environment using this official tutorial.

Currently, I have 3 pods running on 3 different nodes with the Raft backend. The vault-0 pod is unsealed and the UI works, but the other two pods can’t join the cluster. It looks like a networking problem to me, but I’m not yet strong in Kubernetes and can’t solve it myself. Please help.

To make vault-1 join the cluster, I open a shell in the pod and run the join command:

oko@PC:~/vault$ kubectl exec -n vault -it vault-1 -- /bin/sh
/ $ vault operator raft join \
      -address=https://vault-1.vault-internal:8200 \
      -leader-ca-cert="$(cat /vault/userconfig/vault-ha-tls/vault.ca)" \
      -leader-client-cert="$(cat /vault/userconfig/vault-ha-tls/vault.crt)" \
      -leader-client-key="$(cat /vault/userconfig/vault-ha-tls/vault.key)" \
      https://vault-0.vault-internal:8200
Error joining the node to the Raft cluster: Post "https://vault-1.vault-internal:8200/v1/sys/storage/raft/join": dial tcp: lookup vault-1.vault-internal: i/o timeout

It ends with a DNS lookup timeout.
The certificates are accessible within the container: cat /vault/userconfig/vault-ha-tls/vault.ca returns a valid certificate from the Kubernetes secret.

Here is what the pods and services look like:

oko@PC:~/vault$ kubectl get pod -o wide
NAME                                    READY   STATUS    RESTARTS   AGE   IP           NODE           NOMINATED NODE   READINESS GATES
vault-0                                 1/1     Running   0          97m   10.244.2.2   minikube-m03   <none>           <none>
vault-1                                 0/1     Running   0          97m   10.244.1.3   minikube-m02   <none>           <none>
vault-2                                 0/1     Running   0          97m   10.244.3.2   minikube-m04   <none>           <none>
vault-agent-injector-85489c48c7-8qll2   1/1     Running   0          97m   10.244.1.2   minikube-m02   <none>           <none>
oko@PC:~/vault$ kubectl get services -o wide
NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE     SELECTOR
vault                      ClusterIP   10.99.53.157     <none>        8200/TCP,8201/TCP   2d21h   app.kubernetes.io/instance=vault,app.kubernetes.io/name=vault,component=server
vault-active               ClusterIP   10.109.79.36     <none>        8200/TCP,8201/TCP   2d21h   app.kubernetes.io/instance=vault,app.kubernetes.io/name=vault,component=server,vault-active=true
vault-agent-injector-svc   ClusterIP   10.104.251.5     <none>        443/TCP             2d21h   app.kubernetes.io/instance=vault,app.kubernetes.io/name=vault-agent-injector,component=webhook
vault-internal             ClusterIP   None             <none>        8200/TCP,8201/TCP   2d21h   app.kubernetes.io/instance=vault,app.kubernetes.io/name=vault,component=server
vault-standby              ClusterIP   10.98.128.116    <none>        8200/TCP,8201/TCP   2d21h   app.kubernetes.io/instance=vault,app.kubernetes.io/name=vault,component=server,vault-active=false
vault-ui                   ClusterIP   10.103.172.120   <none>        8200/TCP            2d21h   app.kubernetes.io/instance=vault,app.kubernetes.io/name=vault,component=server

I can ping vault-0 from within the vault-1 container using the pod’s IP address:

/ $ ping 10.244.2.2
PING 10.244.2.2 (10.244.2.2): 56 data bytes
64 bytes from 10.244.2.2: seq=0 ttl=42 time=0.555 ms

Here is my server certificate (issued by minikubeCA), decoded:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            04:60:20:bf:05:97:9f:ff:27:f6:1a:10:e8:d9:e4:fd
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN = minikubeCA
        Validity
            Not Before: Nov  7 09:16:33 2023 GMT
            Not After : Feb 15 09:16:33 2024 GMT
        Subject: O = system:nodes, CN = system:node:*.vault.svc.cluster.local
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    00:b3:26:............
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage: 
                TLS Web Server Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Authority Key Identifier: 
                07:90:F2:E8:3D:7D:63:43:0E:89:B7:2F:38:F6:9E:BC:17:A8:4A:E8
            X509v3 Subject Alternative Name: 
                DNS:*.vault-internal, DNS:*.vault-internal.vault.svc.cluster.local, DNS:*.vault, IP Address:127.0.0.1
    Signature Algorithm: sha256WithRSAEncryption
    Signature Value:
        00:f5:01:...................
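
To rule out TLS as the cause, one can check offline that the names used in the join command are covered by the Subject Alternative Names above. The bash sketch below does that; san_covers is a hypothetical helper (not part of Vault or OpenSSL), it assumes bash 4+ and treats a leading "*" as matching exactly one DNS label, in the RFC 6125 style. On the host, the SAN line can be printed with openssl x509 -in vault.crt -noout -ext subjectAltName (OpenSSL 1.1.1 or newer).

```shell
#!/usr/bin/env bash
# Sketch: is a hostname covered by a certificate's SAN DNS entries?
# Assumptions: bash 4+, and "*" only ever appears as the whole leftmost label,
# matching exactly one label (RFC 6125 style). san_covers is hypothetical.

san_covers() {
  local host="${1,,}" san_line="${2,,}" entry   # compare case-insensitively
  local rest="${host#*.}"                       # host without its first label
  local -a entries
  IFS=',' read -ra entries <<< "$san_line"      # split on commas, no globbing
  for entry in "${entries[@]}"; do
    entry="${entry// /}"                        # drop surrounding spaces
    [[ $entry == dns:* ]] || continue           # skip "IP Address:..." entries
    entry="${entry#dns:}"
    if [[ $entry == "*."* ]]; then
      # Wildcard: everything after the host's first label must match exactly.
      [[ $host == *.* && ${entry:2} == "$rest" ]] && return 0
    elif [[ $entry == "$host" ]]; then
      return 0
    fi
  done
  return 1
}

# SAN line copied from the decoded certificate above.
san='DNS:*.vault-internal, DNS:*.vault-internal.vault.svc.cluster.local, DNS:*.vault, IP Address:127.0.0.1'
for h in vault-0.vault-internal vault-1.vault-internal vault-2.vault-internal; do
  if san_covers "$h" "$san"; then echo "$h: covered"; else echo "$h: NOT covered"; fi
done
```

With the SAN line from this post, all three pod names print as covered, which points away from the certificate and toward name resolution.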

And finally, the Helm overrides YAML:

global:
  enabled: true
  tlsDisable: false
injector:
  enabled: true
  image:
    repository: "hashicorp/vault-k8s"
    tag: "1.3"
  agentImage:
    repository: "hashicorp/vault"
    tag: "1.15.2"
  resources:
    requests:
      memory: "128Mi"
      cpu: "100m"
    limits:
      memory: "128Mi"
      cpu: "100m"
server:
  image: 
    repository: "hashicorp/vault"
    tag: "1.15.2"
  resources:
    requests:
      memory: '2Gi'
      cpu: "1000m"
    limits:
      memory: '2Gi'
      cpu: '1000m'
  extraEnvironmentVars:
    VAULT_CACERT: /vault/userconfig/vault-ha-tls/vault.ca
    VAULT_TLSCERT: /vault/userconfig/vault-ha-tls/vault.crt
    VAULT_TLSKEY: /vault/userconfig/vault-ha-tls/vault.key
  volumes:
    - name: userconfig-vault-ha-tls
      secret:
        defaultMode: 420
        secretName: vault-ha-tls
  volumeMounts:
    - mountPath: /vault/userconfig/vault-ha-tls
      name: userconfig-vault-ha-tls
      readOnly: true
  affinity: |
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/name: {{ template "vault.name" . }}
              app.kubernetes.io/instance: "{{ .Release.Name }}"
              component: server
          topologyKey: kubernetes.io/hostname
  dataStorage:
    enabled: true
    size: 2Gi
    storageClass: vault-local-storage
  standalone:
    enabled: false
  ha:
    enabled: true
    replicas: 3
    raft:
      enabled: true
      setNodeId: true
      config: |
        ui = true
        listener "tcp" {
          tls_disable = 0
          address = "[::]:8200"
          cluster_address = "[::]:8201"
          tls_cert_file = "/vault/userconfig/vault-ha-tls/vault.crt"
          tls_key_file  = "/vault/userconfig/vault-ha-tls/vault.key"
          tls_client_ca_file = "/vault/userconfig/vault-ha-tls/vault.ca"
        }
        storage "raft" {
          path = "/vault/data"
        }
        disable_mlock = true
        service_registration "kubernetes" {}
  securityContext:
    pod: |
      runAsNonRoot: true
      runAsGroup: {{ .Values.server.gid | default 999 }}
      runAsUser: {{ .Values.server.uid | default 1000 }}
      fsGroup: {{ .Values.server.gid | default 1000 }}
ui:
  enabled: true

And here is the status of the unsealed Vault pod (vault-0):

oko@PC:~/vault$ kubectl exec -n vault -it vault-0 -- /bin/sh
/ $ vault status
Key                     Value
---                     -----
Seal Type               shamir
Initialized             true
Sealed                  false
Total Shares            5
Threshold               2
Version                 1.15.2
Build Date              2023-11-06T11:33:28Z
Storage Type            raft
Cluster Name            vault-cluster-781bd4a1
Cluster ID              aa91d3f6-234b-dda7-1276-06f92aad916d
HA Enabled              true
HA Cluster              https://vault-0.vault-internal:8201
HA Mode                 active
Active Since            2023-11-13T15:26:28.5852121Z
Raft Committed Index    524
Raft Applied Index      524

To me, it looks like the Vault container can’t resolve the vault-1.vault-internal and vault-0.vault-internal names.

Solved.

DNS resolution was broken in minikube.

For anyone following along, here is how to check whether you have the same problem:

1. Get the pods’ IP addresses (see the IP column):

   kubectl get pod -o wide

2. Open a shell in one of the pods:

   kubectl exec -n vault -it vault-1 -- /bin/sh

3. Ping a peer pod first by its IP address, then by its DNS name:

   ping 10.244.1.4
   ping vault-0.vault-internal
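
The same check can be run from the host in one go. This is a bash sketch: pod_ping and diagnose_dns are hypothetical helper names, the namespace defaults to vault as in this thread, and it assumes the image ships a busybox-style ping that accepts -c and -W (the hashicorp/vault image used here did have ping, as shown earlier).

```shell
#!/usr/bin/env bash
# Sketch: compare reachability of a peer pod by IP and by DNS name,
# running ping inside a Vault pod via kubectl exec.
# Assumptions: namespace "vault" (override with NAMESPACE) and a busybox-style
# ping that accepts -c and -W. Helper names are hypothetical.
NAMESPACE="${NAMESPACE:-vault}"

# Send one ping from inside pod $1 to target $2; exit status tells success.
pod_ping() {
  kubectl exec -n "$NAMESPACE" "$1" -- ping -c 1 -W 2 "$2" >/dev/null 2>&1
}

# Print a one-line verdict for pod $1 talking to a peer with IP $2 and name $3.
diagnose_dns() {
  local pod="$1" peer_ip="$2" peer_name="$3"
  if ! pod_ping "$pod" "$peer_ip"; then
    echo "network: $pod cannot reach $peer_ip"
  elif ! pod_ping "$pod" "$peer_name"; then
    echo "dns: $pod reaches $peer_ip but not $peer_name (name lookup likely failing)"
  else
    echo "ok: $pod reaches $peer_name"
  fi
}
```

With the pod IPs listed earlier in this thread, diagnose_dns vault-1 10.244.2.2 vault-0.vault-internal would be expected to print a line starting with "dns:" while the problem persists.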

If pinging by IP works but pinging by DNS name doesn’t, here is a possible fix.

First, follow this to debug DNS resolution.

If the problem is with /etc/resolv.conf, then:

1. In the WSL distro, open the systemd-resolved configuration:

   cd /etc/systemd
   sudo nano resolved.conf

2. Uncomment and edit the #DNS= line:

   DNS=8.8.8.8 8.8.4.4 192.168.49.2 2001:4860:4860::8888 2001:4860:4860::8844

   where 192.168.49.2 is the minikube IP (run minikube ip to get yours).

3. Save the changes and run:

   sudo systemctl restart systemd-resolved

4. In minikube, delete the coredns-xxxxx pod in the kube-system namespace (for example, coredns-5d78c9869d-89rdt). The coredns Deployment will start a new pod; then check whether DNS works.
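
The same fix as a script, for those who prefer not to edit the file by hand. This is only a sketch: it assumes systemd-resolved manages DNS in the WSL distro, that minikube ip and kubectl point at the affected cluster, and that the CoreDNS pods carry the k8s-app=kube-dns label (the usual label on minikube’s coredns Deployment). set_resolved_dns and apply_dns_fix are hypothetical names, and the script keeps a .bak copy of resolved.conf.

```shell
#!/usr/bin/env bash
# Sketch of the resolved.conf + CoreDNS fix described above.
# Assumptions: systemd-resolved in the WSL distro, sed with -E support,
# and CoreDNS pods labelled k8s-app=kube-dns. Function names are hypothetical.

# Print file $1 with its "DNS=" or "#DNS=" line replaced by "DNS=<remaining args>".
# "#FallbackDNS=" is left alone because the pattern requires "DNS=" right after "#".
set_resolved_dns() {
  local file="$1"; shift
  sed -E "s|^#?DNS=.*|DNS=$*|" "$file"
}

apply_dns_fix() {
  local conf=/etc/systemd/resolved.conf mk_ip
  mk_ip="$(minikube ip)" || return 1          # e.g. 192.168.49.2
  set_resolved_dns "$conf" 8.8.8.8 8.8.4.4 "$mk_ip" \
      2001:4860:4860::8888 2001:4860:4860::8844 \
    | sudo tee "$conf.new" >/dev/null || return 1
  sudo cp -p "$conf" "$conf.bak"              # keep a backup of the original
  sudo mv "$conf.new" "$conf"
  sudo systemctl restart systemd-resolved
  # The coredns Deployment recreates the pods deleted here.
  kubectl -n kube-system delete pod -l k8s-app=kube-dns
}
```

After sourcing the file, run apply_dns_fix. Instead of deleting the pod, kubectl -n kube-system rollout restart deployment coredns should also recreate the CoreDNS pods.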


After this fix, my nodes were able to join the leader (once I ran the join commands again).
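
The join step from the start of the thread can be run for both followers from the host. This sketch assumes the namespace, pod names, leader address and certificate paths used in this thread; join_followers is a hypothetical helper. The certificate files are read inside each pod via sh -c, because a $(cat ...) written directly on the kubectl command line would be expanded on the host instead.

```shell
#!/usr/bin/env bash
# Sketch: run "vault operator raft join" in each follower pod from the host.
# Assumptions: namespace "vault", TLS files mounted at the paths used in this
# thread, and vault-0 as the leader. join_followers is a hypothetical helper.
NAMESPACE="${NAMESPACE:-vault}"
LEADER_ADDR="${LEADER_ADDR:-https://vault-0.vault-internal:8200}"

join_followers() {
  local pod
  for pod in "$@"; do
    # The single-quoted script runs inside the pod: $0 is the pod name and
    # $1 the leader address, so the cat calls read the pod's own TLS files.
    kubectl exec -n "$NAMESPACE" "$pod" -- sh -c '
      tls=/vault/userconfig/vault-ha-tls
      vault operator raft join \
        -address="https://$0.vault-internal:8200" \
        -leader-ca-cert="$(cat "$tls/vault.ca")" \
        -leader-client-cert="$(cat "$tls/vault.crt")" \
        -leader-client-key="$(cat "$tls/vault.key")" \
        "$1"' "$pod" "$LEADER_ADDR" || return 1   # stop at the first failure
  done
}
```

Usage: join_followers vault-1 vault-2. Depending on the setup, each follower may still need to be unsealed afterwards with vault operator unseal, and vault operator raft list-peers on the leader shows whether the nodes joined.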