I have a HashiCorp Vault test cluster of 3 nodes, and I have shut down 2 of them to simulate quorum loss. The Vault instances are running in Docker (Podman) containers.
I recovered the remaining node with a peers.json file.
Now I have started the other 2 nodes again, and I want to join them back to the cluster with this peers.json:
[
  {
    "id": "vlt902",
    "address": "ip-of-server:8201"
  },
  {
    "id": "vlt903",
    "address": "ip-of-server:8201"
  }
]
When I use the FQDN, I get the error "too many colons in address", so I use the IP address instead.
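For reference, this is roughly how I put the file in place before restarting a node; the /vault/data/raft path and the container name are assumptions based on my volume mount, and the non_voter field is what I understood the docs to describe for raft protocol version 3:

# sketch, not my exact commands: drop peers.json into the raft data directory
# (adjust /vault/data/raft and the container name to your own setup)
cat > /vault/data/raft/peers.json <<'EOF'
[
  { "id": "vlt902", "address": "ip-of-server:8201", "non_voter": false },
  { "id": "vlt903", "address": "ip-of-server:8201", "non_voter": false }
]
EOF
podman restart vault-vlt902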
But when I restart the container, I get the following error:
==> Vault server configuration:
Administrative Namespace:
Api Address: FQDN:8200
Cgo: disabled
Cluster Address: FQDN:8201
Environment Variables: HOME, HOSTNAME, NAME, PATH, VAULT_ADDR, VAULT_CACERT, VERSION, container
Go Version: go1.23.6
Listener 1: tcp (addr: "0.0.0.0:8200", cluster address: "0.0.0.0:8201", disable_request_limiter: "false", max_request_duration: "1m30s", max_request_size: "33554432", tls: "enabled")
Log Level: info
Mlock: supported: true, enabled: false
Recovery Mode: false
Storage: raft (HA available)
Version: Vault v1.19.0, built 2025-03-04T12:36:40Z
Version Sha: 7eeafb6160d60ede73c1d95566b0c8ea54f3cb5a
==> Vault server started! Log data will stream in below:
2025-04-01T12:58:20.402Z [INFO] proxy environment: http_proxy="" https_proxy="" no_proxy=""
2025-04-01T12:58:20.402Z [WARN] storage.raft.fsm: raft FSM db file has wider permissions than needed: needed=-rw------- existing=-rwxrwxrwx
2025-04-01T12:58:20.405Z [INFO] incrementing seal generation: generation=1
2025-04-01T12:58:20.405Z [INFO] core: Initializing version history cache for core
2025-04-01T12:58:20.405Z [INFO] events: Starting event system
2025-04-01T12:58:20.407Z [INFO] core: raft retry join initiated
2025-04-01T12:58:42.624Z [INFO] core.cluster-listener.tcp: starting listener: listener_address=0.0.0.0:8201
2025-04-01T12:58:42.625Z [INFO] core.cluster-listener: serving cluster requests: cluster_listen_address=[::]:8201
2025-04-01T12:58:42.625Z [INFO] storage.raft: raft recovery initiated: recovery_file=peers.json
2025-04-01T12:58:42.625Z [INFO] storage.raft: raft recovery found new config: config="{[{Voter vlt902 10.45.121.83:8201} {Voter vlt903 10.45.121.84:8201}]}"
2025-04-01T12:58:42.626Z [INFO] storage.raft: snapshot restore progress: id=bolt-snapshot last-index=254027 last-term=312 size-in-bytes=0 read-bytes=0 percent-complete="NaN%"
2025-04-01T12:58:42.628Z [INFO] storage.raft: raft recovery deleted peers.json
2025-04-01T12:58:42.628Z [INFO] storage.raft: creating Raft: config="&raft.Config{ProtocolVersion:3, HeartbeatTimeout:15000000000, ElectionTimeout:15000000000, CommitTimeout:50000000, MaxAppendEntries:64, BatchApplyCh:true, ShutdownOnRemove:true, TrailingLogs:0x2800, SnapshotInterval:120000000000, SnapshotThreshold:0x2000, LeaderLeaseTimeout:2500000000, LocalID:"vlt902", NotifyCh:(chan<- bool)(0xc003592080), LogOutput:io.Writer(nil), LogLevel:"DEBUG", Logger:(*hclog.interceptLogger)(0xc003373380), NoSnapshotRestoreOnStart:true, PreVoteDisabled:false, skipStartup:false}"
2025-04-01T12:58:42.629Z [INFO] storage.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:vlt902 Address:10.45.121.83:8201} {Suffrage:Voter ID:vlt903 Address:10.45.121.84:8201}]"
2025-04-01T12:58:42.629Z [INFO] core: vault is unsealed
2025-04-01T12:58:42.629Z [INFO] storage.raft: entering follower state: follower="Node at FQDN:8201 [Follower]" leader-address= leader-id=
2025-04-01T12:58:42.629Z [INFO] core: entering standby mode
2025-04-01T12:58:57.658Z [WARN] storage.raft: heartbeat timeout reached, starting election: last-leader-addr= last-leader-id=
2025-04-01T12:58:57.659Z [INFO] storage.raft: entering candidate state: node="Node at FQDN:8201 [Candidate]" term=315
2025-04-01T12:58:57.845Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter vlt903 10.45.121.84:8201}" error="read tcp 10.89.0.16:59564->10.45.121.84:8201: read: connection reset by peer" term=315
2025-04-01T12:59:07.190Z [WARN] storage.raft: Election timeout reached, restarting election
2025-04-01T12:59:07.190Z [INFO] storage.raft: entering candidate state: node="Node at FQDN:8201 [Candidate]" term=315
2025-04-01T12:59:07.374Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter vlt903 10.45.121.84:8201}" error="read tcp 10.89.0.16:43784->10.45.121.84:8201: read: connection reset by peer" term=315
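One thing I still want to rule out is basic TCP reachability of the peer's cluster port from inside the container itself; a minimal sketch, assuming the container is named vault-vlt902 and that nc exists in the image:

# raw TCP connect from inside the vlt902 container to vlt903's cluster port
podman exec vault-vlt902 nc -vz 10.45.121.84 8201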
The ports are open in firewall-cmd (output below), so what am I missing here?
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: ens33
  sources:
  services: cockpit dhcpv6-client ssh
  ports: 8200/tcp 8201/tcp
  protocols:
  forward: yes
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:
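The other thing I plan to double-check is the Podman side of the networking, i.e. whether 8201 is actually published and listening on the host; a sketch with placeholder container names:

# confirm the cluster port is published from the container to the host
podman port vault-vlt903
# confirm something is listening on 8201 on the host side
ss -tlnp | grep 8201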