We’re seeing an odd case in a 3-node Vault cluster: when a single follower is restarted, it replaces the current leader. I’m running Vault 1.15.4 on GKE, deployed with Helm version 0.26.1.
1. The leader was vault-0. I deleted one of the followers, vault-1, and once the vault-1 pod was recreated it became the leader.
2. I then deleted vault-2 (a follower), and vault-0 became the leader again.
3. I deleted the current leader, vault-0, and vault-1 (the recreated pod) became the leader (expected behavior).
I was expecting true HA behavior: in steps 1 and 2 (follower deletion), the leader should stay the same and no leader election should be triggered.
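For anyone trying to reproduce this, here is a minimal sketch of how the leader could be recorded after each step and compared. It assumes the unauthenticated `GET /v1/sys/leader` endpoint is reachable from wherever the script runs and that the listener does not require a client certificate. The helper names `fetch_leader_address` and `leader_changes`, and the addresses in the sample data, are illustrative and not taken from my actual setup.

```python
import json
import ssl
import urllib.request
from typing import Iterable, List, Optional, Tuple


def fetch_leader_address(api_addr: str, ca_file: Optional[str] = None) -> str:
    """Ask one Vault node who it thinks the leader is (GET /v1/sys/leader)."""
    # ca_file is an assumed path to the CA bundle that signed the listener cert.
    context = ssl.create_default_context(cafile=ca_file)
    url = f"{api_addr}/v1/sys/leader"
    with urllib.request.urlopen(url, context=context, timeout=5) as resp:
        return json.load(resp).get("leader_address", "")


def leader_changes(
    observations: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Given (step, leader) pairs, return (step, old_leader, new_leader) per change."""
    changes = []
    previous = None
    for step, leader in observations:
        if previous is not None and leader != previous:
            changes.append((step, previous, leader))
        previous = leader
    return changes


if __name__ == "__main__":
    # Hypothetical observations mirroring the sequence described above.
    # In a live test, each leader value would come from fetch_leader_address().
    observed = [
        ("initial", "https://vault-0:8200"),
        ("deleted vault-1", "https://vault-1:8200"),
        ("deleted vault-2", "https://vault-0:8200"),
        ("deleted vault-0", "https://vault-1:8200"),
    ]
    for step, old, new in leader_changes(observed):
        print(f"{step}: leader changed from {old} to {new}")
```

With the behavior I expected, only the last step (deleting the leader itself) would show up as a change.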
Here’s my config:
listener "tcp" {
  tls_disable = false
  address = "0.0.0.0:8200"
  cluster_address = "0.0.0.0:8201"
  http_read_timeout = "600s"
  tls_cert_file = "tls.crt"
  tls_key_file = "tls.key"
  tls_client_ca_file = "tls.ca"
}

telemetry {
  prometheus_retention_time = "12h"
  disable_hostname = true
  enable_hostname_label = true
}

seal "gcpckms" {
  project = "abc"
  key_ring = "abc"
  crypto_key = "abc"
}

storage "raft" {
  path = "/vault/data"
  retry_join {
    leader_api_addr = "https://vault-0:8200"
    leader_ca_cert_file = "ca.crt"
    leader_client_cert_file = "tls.crt"
    leader_client_key_file = "tls.key"
  }
  retry_join {
    leader_api_addr = "https://vault-1:8200"
    leader_ca_cert_file = "ca.crt"
    leader_client_cert_file = "tls.crt"
    leader_client_key_file = "tls.key"
  }
  retry_join {
    leader_api_addr = "https://vault-2:8200"
    leader_ca_cert_file = "ca.crt"
    leader_client_cert_file = "tls.crt"
    leader_client_key_file = "tls.key"
  }
  performance_multiplier = 1
}

service_registration "kubernetes" {}