I'm trying to install Vault as an HA cluster with Raft storage on GKE. The idea is not to expose it to the WAN, but to expose it via cross-project networking. So I have the default services disabled via server.service.enabled = false in the Helm values, plus two manually deployed services (an internal load balancer for cross-project access and a headless replacement for the chart's vault-internal service). I also have a self-signed cert from an internal CA.
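For reference, that cert lives in the vault-tls secret that gets mounted into the pods (see the volumes section in the values below). I believe its shape is roughly the following, with the key names inferred from the listener config and the VAULT_CACERT path (contents elided):

apiVersion: v1
kind: Secret
metadata:
  name: vault-tls
  namespace: default
type: Opaque
data:
  vault.crt: <base64 server cert signed by the internal CA>
  vault.key: <base64 server key>
  ca.crt: <base64 internal CA cert>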
I think my most recent problems are coming from TLS issues. Last night I temporarily settled for no replication because the vault-1 instance was still failing to auto-unseal, so server.ha.replicas is currently set to 1 here. I was basically hammering at the miscellaneous listener and cert options, so there may still be some issues in there. Also, the manual config section is jammed under the "raft" section because it wasn't being injected at all when the "config" block was placed one level up.
I'm really looking for any pointers on fundamental issues with this approach, namely missing parameters for networking or automatic communication between the replicas and the leader. What I kept running into is that the vault-1 pod would not automatically connect and unseal based on the vault-0 pod, even though vault-0 would boot, read from gcpckms, and unseal (against its existing PVC) with no intervention. vault-1, on the other hand, would not automatically join the cluster (it's unclear from the docs I read whether that's supposed to happen without intervention), and would not join on account of TLS issues when I manually issued a vault operator raft join (the last error I saw was that it was attempting a TLS connection but couldn't use a raft protocol; I lost the exact message).
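For context, my understanding from the docs is that automatic joining is supposed to be driven by a retry_join block inside the raft storage stanza, which my config below doesn't have yet. A minimal sketch of what I was planning to try next, assuming the per-pod DNS name vault-0.vault-internal resolves via the headless service and that the cert's SANs cover it (both assumptions on my part):

  storage "raft" {
    path = "/vault/data"
    retry_join {
      leader_api_addr     = "https://vault-0.vault-internal:8200"  # assumed per-pod DNS name from the headless service
      leader_ca_cert_file = "/etc/vault/tls/ca.crt"                # internal CA so vault-1 trusts vault-0's cert
    }
  }

When joining manually, I presume I'd also need to hand the CLI the internal CA (e.g. the -leader-ca-cert option to vault operator raft join) rather than relying on the local non-TLS listener.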
Helm values.yaml:
server:
  enabled: true
  mode: ha
  extraEnvironmentVars:
    VAULT_SEAL: gcpckms
    VAULT_GCPKMS_PROJECT_ID: "MYPROJECT"
    VAULT_GCPKMS_KEY_RING: "vault-helm-unseal-kr"
    VAULT_GCPKMS_KEY_NAME: "vault-unseal-key"
    VAULT_GCPKMS_LOCATION: "MYREGION"
    VAULT_CACERT: "/etc/vault/tls/ca.crt"
  ha:
    enabled: true
    replicas: 1
    apiAddr: "https://vault-internal:8200"
    clusterAddr: "https://vault-internal:8201"
    service:
      enabled: true
      type: ClusterIP
      clusterIP: None
    raft:
      enabled: true
      setSize: 2
      config: |
        ui = true

        storage "raft" {
          path = "/vault/data"
        }

        log_level = "debug"

        seal "gcpckms" {
          project    = "MYPROJECT"
          region     = "MYREGION"
          key_ring   = "MYKEYRING"
          crypto_key = "vault-init"
        }

        # Local listener (no TLS)
        listener "tcp" {
          address     = "127.0.0.1:8200"
          # cluster_address = "127.0.0.1:8201"
          tls_disable = 1
        }

        # Cross-project listener (with TLS)
        listener "tcp" {
          address         = "POD_IP:8200"
          cluster_address = "POD_IP:8201" # Same for cluster
          tls_cert_file   = "/etc/vault/tls/vault.crt"
          tls_key_file    = "/etc/vault/tls/vault.key"
          # tls_client_ca_cert = "/etc/vault/tls/ca.crt"
          tls_disable_client_certs = true
        }

        service_registration "kubernetes" {}
  service:
    enabled: false
  # Define volumes to mount the TLS secret
  volumes:
    - name: vault-tls
      secret:
        secretName: vault-tls
  # Mount the TLS secret into the container
  volumeMounts:
    - name: vault-tls
      mountPath: /etc/vault/tls
      readOnly: true
External networking service:
apiVersion: v1
kind: Service
metadata:
  name: vault
  namespace: default
  labels:
    app: vault
  annotations:
    cloud.google.com/load-balancer-type: Internal
    networking.gke.io/internal-load-balancer-allow-global-access: 'true'
spec:
  ports:
    - name: vault-port
      protocol: TCP
      port: 443
      targetPort: 8200
  selector:
    app.kubernetes.io/instance: vault
    app.kubernetes.io/name: vault
  clusterIP: 10.0.91.200
  clusterIPs:
    - 10.0.91.200
  type: LoadBalancer
  sessionAffinity: None
  loadBalancerIP: 10.0.96.2
  loadBalancerSourceRanges:
    - 0.0.0.0/0
  externalTrafficPolicy: Local
  healthCheckNodePort: 31610
  ipFamilies:
    - IPv4
  ipFamilyPolicy: SingleStack
  allocateLoadBalancerNodePorts: true
  internalTrafficPolicy: Cluster
Internal networking service:
apiVersion: v1
kind: Service
metadata:
  name: vault-internal
  namespace: default
  labels:
    app: vault
    component: server
spec:
  ports:
    - name: vault-port
      protocol: TCP
      port: 8200        # or whatever port your Vault pods are listening on for internal communication
      targetPort: 8200
    - name: vault-cluster-port
      protocol: TCP
      port: 8201        # or whatever port your Vault pods are listening on for cluster communication
      targetPort: 8201
  selector:
    app.kubernetes.io/instance: vault  # Match this to the labels of your Vault pods
    app.kubernetes.io/name: vault
    component: server                  # Make sure this label matches the Vault server pods
  clusterIP: None                      # Headless service, which will create DNS records for the pods
  publishNotReadyAddresses: true
Recent logs from “vault-1” pod (to show the gist of the problem):
==> Vault server configuration:
Administrative Namespace:
Api Address: https://vault-internal:8200
Cgo: disabled
Cluster Address: https://vault-internal:8201
Environment Variables: HOME, HOSTNAME, HOST_IP, KUBERNETES_PORT, KUBERNETES_PORT_443_TCP, KUBERNETES_PORT_443_TCP_ADDR, KUBERNETES_PORT_443_TCP_PORT, KUBERNETES_PORT_443_TCP_PROTO, KUBERNETES_SERVICE_HOST, KUBERNETES_SERVICE_PORT, KUBERNETES_SERVICE_PORT_HTTPS, NAME, PATH, POD_IP, PWD, SHLVL, SKIP_CHOWN, SKIP_SETCAP, VAULT_ADDR, VAULT_AGENT_INJECTOR_SVC_PORT, VAULT_AGENT_INJECTOR_SVC_PORT_443_TCP, VAULT_AGENT_INJECTOR_SVC_PORT_443_TCP_ADDR, VAULT_AGENT_INJECTOR_SVC_PORT_443_TCP_PORT, VAULT_AGENT_INJECTOR_SVC_PORT_443_TCP_PROTO, VAULT_AGENT_INJECTOR_SVC_SERVICE_HOST, VAULT_AGENT_INJECTOR_SVC_SERVICE_PORT, VAULT_AGENT_INJECTOR_SVC_SERVICE_PORT_HTTPS, VAULT_API_ADDR, VAULT_CACERT, VAULT_CLUSTER_ADDR, VAULT_GCPKMS_KEY_NAME, VAULT_GCPKMS_KEY_RING, VAULT_GCPKMS_LOCATION, VAULT_GCPKMS_PROJECT_ID, VAULT_K8S_NAMESPACE, VAULT_K8S_POD_NAME, VAULT_PORT, VAULT_PORT_443_TCP, VAULT_PORT_443_TCP_ADDR, VAULT_PORT_443_TCP_PORT, VAULT_PORT_443_TCP_PROTO, VAULT_SEAL, VAULT_SERVICE_HOST, VAULT_SERVICE_PORT, VAULT_SERVICE_PORT_VAULT_PORT, VERSION
Go Version: go1.22.8
Listener 1: tcp (addr: "127.0.0.1:8200", cluster address: "127.0.0.1:8201", disable_request_limiter: "false", max_request_duration: "1m30s", max_request_size: "33554432", tls: "disabled")
Listener 2: tcp (addr: "10.0.94.50:8200", cluster address: "10.0.94.50:8201", disable_request_limiter: "false", max_request_duration: "1m30s", max_request_size: "33554432", tls: "enabled")
Log Level: trace
Mlock: supported: true, enabled: false
Recovery Mode: false
Storage: raft (HA available)
Version: Vault v1.18.1, built 2024-10-29T14:21:31Z
Version Sha: f479e5c85462477c9334564bc8f69531cdb03b65
==> Vault server started! Log data will stream in below:
2025-02-22T01:12:55.700Z [INFO] proxy environment: http_proxy="" https_proxy="" no_proxy=""
2025-02-22T01:12:55.701Z [WARN] storage.raft.fsm: raft FSM db file has wider permissions than needed: needed=-rw------- existing=-rw-rw----
2025-02-22T01:12:55.715Z [DEBUG] storage.raft.fsm: time to open database: elapsed=14.464576ms path=/vault/data/vault.db
2025-02-22T01:12:55.732Z [DEBUG] service_registration.kubernetes: "namespace": "default"
2025-02-22T01:12:55.732Z [DEBUG] service_registration.kubernetes: "pod_name": "vault-1"
2025-02-22T01:12:56.036Z [INFO] incrementing seal generation: generation=1
2025-02-22T01:12:56.037Z [DEBUG] core: set config: sanitized config="{\"administrative_namespace_path\":\"\",\"api_addr\":\"\",\"cache_size\":0,\"cluster_addr\":\"https://vault-internal:8201\",\"cluster_cipher_suites\":\"\",\"cluster_name\":\"\",\"default_lease_ttl\":0,\"default_max_request_duration\":0,\"detect_deadlocks\":\"\",\"disable_cache\":false,\"disable_clustering\":false,\"disable_indexing\":false,\"disable_mlock\":true,\"disable_performance_standby\":false,\"disable_printable_check\":false,\"disable_sealwrap\":false,\"disable_sentinel_trace\":false,\"enable_response_header_hostname\":false,\"enable_response_header_raft_node_id\":false,\"enable_ui\":true,\"experiments\":null,\"imprecise_lease_role_tracking\":false,\"introspection_endpoint\":false,\"listeners\":[{\"config\":{\"address\":\"127.0.0.1:8200\",\"tls_disable\":1},\"type\":\"tcp\"},{\"config\":{\"address\":\"10.0.94.50:8200\",\"tls_cert_file\":\"/etc/vault/tls/vault.crt\",\"tls_disable_client_certs\":true,\"tls_key_file\":\"/etc/vault/tls/vault.key\"},\"type\":\"tcp\"}],\"log_format\":\"\",\"log_level\":\"trace\",\"log_requests_level\":\"\",\"max_lease_ttl\":0,\"pid_file\":\"\",\"plugin_directory\":\"\",\"plugin_file_permissions\":0,\"plugin_file_uid\":0,\"plugin_tmpdir\":\"\",\"raw_storage_endpoint\":false,\"seals\":[{\"disabled\":false,\"name\":\"gcpckms\",\"type\":\"gcpckms\"}],\"service_registration\":{\"type\":\"kubernetes\"},\"storage\":{\"cluster_addr\":\"\",\"disable_clustering\":false,\"raft\":{\"max_entry_size\":\"\"},\"redirect_addr\":\"\",\"type\":\"raft\"}}"
2025-02-22T01:12:56.037Z [DEBUG] storage.cache: creating LRU cache: size=0
2025-02-22T01:12:56.037Z [INFO] core: Initializing version history cache for core
2025-02-22T01:12:56.037Z [INFO] events: Starting event system
2025-02-22T01:12:56.040Z [DEBUG] cluster listener addresses synthesized: cluster_addresses=[127.0.0.1:8201, 10.0.94.50:8201]
2025-02-22T01:12:56.044Z [INFO] core: stored unseal keys supported, attempting fetch
2025-02-22T01:12:56.044Z [WARN] failed to unseal core: error="stored unseal keys are supported, but none were found"
2025-02-22T01:12:56.111Z [DEBUG] would have sent systemd notification (systemd not present): notification=READY=1
2025-02-22T01:13:01.045Z [INFO] core: stored unseal keys supported, attempting fetch
2025-02-22T01:13:01.045Z [WARN] failed to unseal core: error="stored unseal keys are supported, but none were found"
2025-02-22T01:13:04.976Z [INFO] core: security barrier not initialized
2025-02-22T01:13:04.976Z [INFO] core.autoseal: recovery seal configuration missing, but cannot check old path as core is sealed
2025-02-22T01:13:06.046Z [INFO] core: stored unseal keys supported, attempting fetch
2025-02-22T01:13:06.046Z [WARN] failed to unseal core: error="stored unseal keys are supported, but none were found"
2025-02-22T01:13:09.971Z [INFO] core: security barrier not initialized
2025-02-22T01:13:09.971Z [INFO] core.autoseal: recovery seal configuration missing, but cannot check old path as core is sealed
2025-02-22T01:13:11.046Z [INFO] core: stored unseal keys supported, attempting fetch
2025-02-22T01:13:11.047Z [WARN] failed to unseal core: error="stored unseal keys are supported, but none were found"
2025-02-22T01:13:14.974Z [INFO] core: security barrier not initialized
2025-02-22T01:13:14.974Z [INFO] core.autoseal: recovery seal configuration missing, but cannot check old path as core is sealed
2025-02-22T01:13:16.047Z [INFO] core: stored unseal keys supported, attempting fetch
2025-02-22T01:13:16.048Z [WARN] failed to unseal core: error="stored unseal keys are supported, but none were found"
2025-02-22T01:13:19.970Z [INFO] core: security barrier not initialized
2025-02-22T01:13:19.970Z [INFO] core.autoseal: recovery seal configuration missing, but cannot check old path as core is sealed
2025-02-22T01:13:21.048Z [INFO] core: stored unseal keys supported, attempting fetch
2025-02-22T01:13:21.048Z [WARN] failed to unseal core: error="stored unseal keys are supported, but none were found"
2025-02-22T01:13:24.989Z [INFO] core: security barrier not initialized
2025-02-22T01:13:24.989Z [INFO] core.autoseal: recovery seal configuration missing, but cannot check old path as core is sealed
2025-02-22T01:13:26.049Z [INFO] core: stored unseal keys supported, attempting fetch
2025-02-22T01:13:26.049Z [WARN] failed to unseal core: error="stored unseal keys are supported, but none were found"
2025-02-22T01:13:29.978Z [INFO] core: security barrier not initialized
2025-02-22T01:13:29.978Z [INFO] core.autoseal: recovery seal configuration missing, but cannot check old path as core is sealed