Hello,
I’m trying to deploy a Vault HA cluster in a minikube environment by following this official tutorial.
Currently, I have 3 pods running on 3 different nodes with the raft storage backend. The vault-0 pod is unsealed and the UI works, but the other two pods can’t join the cluster. It looks like a networking problem to me, but I’m still new to Kubernetes and can’t solve it myself. Please help.
To join the vault-1 pod to the cluster, I open a shell in the pod and run:
oko@PC:~/vault$ kubectl exec -n vault -it vault-1 -- /bin/sh
/ $ vault operator raft join -address=https://vault-1.vault-internal:8200 \
      -leader-ca-cert="$(cat /vault/userconfig/vault-ha-tls/vault.ca)" \
      -leader-client-cert="$(cat /vault/userconfig/vault-ha-tls/vault.crt)" \
      -leader-client-key="$(cat /vault/userconfig/vault-ha-tls/vault.key)" \
      https://vault-0.vault-internal:8200
Error joining the node to the Raft cluster: Post "https://vault-1.vault-internal:8200/v1/sys/storage/raft/join": dial tcp: lookup vault-1.vault-internal: i/o timeout
The command fails with a DNS lookup timeout.
The certificates are accessible inside the container: cat /vault/userconfig/vault-ha-tls/vault.ca returns a valid certificate from the Kubernetes secret.
Here is what the pods and services look like:
oko@PC:~/vault$ kubectl get pod -o wide
NAME                                    READY   STATUS    RESTARTS   AGE   IP           NODE           NOMINATED NODE   READINESS GATES
vault-0                                 1/1     Running   0          97m   10.244.2.2   minikube-m03   <none>           <none>
vault-1                                 0/1     Running   0          97m   10.244.1.3   minikube-m02   <none>           <none>
vault-2                                 0/1     Running   0          97m   10.244.3.2   minikube-m04   <none>           <none>
vault-agent-injector-85489c48c7-8qll2   1/1     Running   0          97m   10.244.1.2   minikube-m02   <none>           <none>
oko@PC:~/vault$ kubectl get services -o wide
NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE     SELECTOR
vault                      ClusterIP   10.99.53.157     <none>        8200/TCP,8201/TCP   2d21h   app.kubernetes.io/instance=vault,app.kubernetes.io/name=vault,component=server
vault-active               ClusterIP   10.109.79.36     <none>        8200/TCP,8201/TCP   2d21h   app.kubernetes.io/instance=vault,app.kubernetes.io/name=vault,component=server,vault-active=true
vault-agent-injector-svc   ClusterIP   10.104.251.5     <none>        443/TCP             2d21h   app.kubernetes.io/instance=vault,app.kubernetes.io/name=vault-agent-injector,component=webhook
vault-internal             ClusterIP   None             <none>        8200/TCP,8201/TCP   2d21h   app.kubernetes.io/instance=vault,app.kubernetes.io/name=vault,component=server
vault-standby              ClusterIP   10.98.128.116    <none>        8200/TCP,8201/TCP   2d21h   app.kubernetes.io/instance=vault,app.kubernetes.io/name=vault,component=server,vault-active=false
vault-ui                   ClusterIP   10.103.172.120   <none>        8200/TCP            2d21h   app.kubernetes.io/instance=vault,app.kubernetes.io/name=vault,component=server
I can ping vault-0 from inside the vault-1 container using the pod’s IP address:
/ $ ping 10.244.2.2
PING 10.244.2.2 (10.244.2.2): 56 data bytes
64 bytes from 10.244.2.2: seq=0 ttl=42 time=0.555 ms
Here is my CA certificate decoded:
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            04:60:20:bf:05:97:9f:ff:27:f6:1a:10:e8:d9:e4:fd
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN = minikubeCA
        Validity
            Not Before: Nov 7 09:16:33 2023 GMT
            Not After : Feb 15 09:16:33 2024 GMT
        Subject: O = system:nodes, CN = system:node:*.vault.svc.cluster.local
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    00:b3:26:............
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage:
                TLS Web Server Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Authority Key Identifier:
                07:90:F2:E8:3D:7D:63:43:0E:89:B7:2F:38:F6:9E:BC:17:A8:4A:E8
            X509v3 Subject Alternative Name:
                DNS:*.vault-internal, DNS:*.vault-internal.vault.svc.cluster.local, DNS:*.vault, IP Address:127.0.0.1
    Signature Algorithm: sha256WithRSAEncryption
    Signature Value:
        00:f5:01:...................
And finally, the Helm overrides YAML:
global:
  enabled: true
  tlsDisable: false
injector:
  enabled: true
  image:
    repository: "hashicorp/vault-k8s"
    tag: "1.3"
  agentImage:
    repository: "hashicorp/vault"
    tag: "1.15.2"
  resources:
    requests:
      memory: "128Mi"
      cpu: "100m"
    limits:
      memory: "128Mi"
      cpu: "100m"
server:
  image:
    repository: "hashicorp/vault"
    tag: "1.15.2"
  resources:
    requests:
      memory: '2Gi'
      cpu: "1000m"
    limits:
      memory: '2Gi'
      cpu: '1000m'
  extraEnvironmentVars:
    VAULT_CACERT: /vault/userconfig/vault-ha-tls/vault.ca
    VAULT_TLSCERT: /vault/userconfig/vault-ha-tls/vault.crt
    VAULT_TLSKEY: /vault/userconfig/vault-ha-tls/vault.key
  volumes:
    - name: userconfig-vault-ha-tls
      secret:
        defaultMode: 420
        secretName: vault-ha-tls
  volumeMounts:
    - mountPath: /vault/userconfig/vault-ha-tls
      name: userconfig-vault-ha-tls
      readOnly: true
  affinity: |
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/name: {{ template "vault.name" . }}
              app.kubernetes.io/instance: "{{ .Release.Name }}"
              component: server
          topologyKey: kubernetes.io/hostname
  dataStorage:
    enabled: true
    size: 2Gi
    storageClass: vault-local-storage
  standalone:
    enabled: false
  ha:
    enabled: true
    replicas: 3
    raft:
      enabled: true
      setNodeId: true
      config: |
        ui = true
        listener "tcp" {
          tls_disable = 0
          address = "[::]:8200"
          cluster_address = "[::]:8201"
          tls_cert_file = "/vault/userconfig/vault-ha-tls/vault.crt"
          tls_key_file = "/vault/userconfig/vault-ha-tls/vault.key"
          tls_client_ca_file = "/vault/userconfig/vault-ha-tls/vault.ca"
        }
        storage "raft" {
          path = "/vault/data"
        }
        disable_mlock = true
        service_registration "kubernetes" {}
securityContext:
pod: |
runAsNonRoot: true
runAsGroup: {{ .Values.server.gid | default 999 }}
runAsUser: {{ .Values.server.uid | default 1000 }}
fsGroup: {{ .Values.server.gid | default 1000 }}
ui:
  enabled: true
And the status of the unsealed Vault (vault-0):
oko@PC:~/vault$ kubectl exec -n vault -it vault-0 -- /bin/sh
/ $ vault status
Key                     Value
---                     -----
Seal Type               shamir
Initialized             true
Sealed                  false
Total Shares            5
Threshold               2
Version                 1.15.2
Build Date              2023-11-06T11:33:28Z
Storage Type            raft
Cluster Name            vault-cluster-781bd4a1
Cluster ID              aa91d3f6-234b-dda7-1276-06f92aad916d
HA Enabled              true
HA Cluster              https://vault-0.vault-internal:8201
HA Mode                 active
Active Since            2023-11-13T15:26:28.5852121Z
Raft Committed Index    524
Raft Applied Index      524
To me, it looks like the Vault container can’t resolve the names vault-1.vault-internal and vault-0.vault-internal.
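One way to test this hypothesis from inside the vault-1 container is sketched below. This is only a rough diagnostic, not a fix. The function names candidate_names and check_names are hypothetical helpers made up for illustration, and the sketch assumes that nslookup is available in the Vault image and that the cluster uses the default cluster.local domain.

```shell
#!/bin/sh
# Hypothetical helper: print the DNS names under which a StatefulSet pod
# behind a headless Service is normally reachable, shortest form first.
# Assumption: the cluster domain is the default "cluster.local".
candidate_names() {
  pod="$1"; svc="$2"; ns="$3"
  echo "${pod}.${svc}"
  echo "${pod}.${svc}.${ns}.svc"
  echo "${pod}.${svc}.${ns}.svc.cluster.local"
}

# Hypothetical helper: try to resolve every candidate name and report
# whether the lookup succeeded.
# Assumption: nslookup exists in the container.
check_names() {
  for name in $(candidate_names "$@"); do
    if nslookup "$name" >/dev/null 2>&1; then
      echo "OK   $name"
    else
      echo "FAIL $name"
    fi
  done
}

# Example, run inside the vault-1 container:
#   cat /etc/resolv.conf                      # which nameserver and search domains the pod uses
#   check_names vault-0 vault-internal vault  # pod, headless service, namespace
```

If every name fails while pinging the pod IP still works, the problem is more likely that the pod cannot reach the cluster DNS service than that the Vault configuration or the certificates are wrong.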