We are attempting to set up Vault in HA mode, but we are stuck on one issue.
We have a cluster of 3 nodes (each node runs on a separate machine, and the machines can communicate with each other).
Everything works correctly when all nodes are running, and leadership also switches properly when I run the vault operator step-down command.
However, when I shut down one of the servers (e.g. node-3), I get one of the following errors:
Error reading the raft cluster configuration: Get "https://node-3-address:8200/v1/sys/storage/raft/configuration": dial tcp node-3-ip:8200: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
or
local node not active but active cluster node not found
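For context, this is the quorum arithmetic I am relying on: a Raft cluster stays available as long as a majority of its voters is reachable, so three nodes should tolerate the loss of one. A minimal sketch of that reasoning (the function names are my own for illustration, not part of Vault):

```python
def quorum_size(voters: int) -> int:
    """Smallest majority of a Raft cluster with `voters` voting nodes."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """Number of voters that can be down while a majority is still reachable."""
    return voters - quorum_size(voters)


if __name__ == "__main__":
    # My cluster: 3 voters, so 2 form a quorum and 1 may fail.
    print(quorum_size(3), tolerated_failures(3))
```

By this reasoning, node-1 and node-2 alone should still be able to elect a leader after node-3 goes down.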
I expected that if one of the nodes failed, a new election would occur and leadership would switch to another node; otherwise the cluster would serve no purpose. Here is my configuration (I tried several options and applied fixes that I found in other threads):
node-1
...
storage "raft" {
  path = "C:\\vault\\data"
  retry_join {
    leader_api_addr = "https://node-1-address:8200"
  }
  retry_join {
    leader_api_addr = "https://node-2-address:8200"
  }
  retry_join {
    leader_api_addr = "https://node-3-address:8200"
  }
}
listener "tcp" {
  address = "0.0.0.0:8200"
  cluster_address = "node-1-address:8201"
  tls_disable = 0
  tls_cert_file = "C:\\vault\\tls\\cert.der"
  tls_key_file = "C:\\vault\\tls\\cert.key"
  tls_min_version = "tls12"
  tls_disable_client_certs = true
}
api_addr = "https://node-1-address:8200"
cluster_addr = "https://node-1-address:8201"
...
node-2 and node-3
...
// same as above, but with each node's own api_addr and cluster_addr, e.g. for node-2:
api_addr = "https://node-2-address:8200"
cluster_addr = "https://node-2-address:8201"
...
I would be grateful for advice on how to fix the configuration.