2-node Vault cluster APIs unresponsive after killing follower node

I have a 2-node Vault cluster that is working properly:

root@e21fea7c1f99:# vault operator raft list-peers
Node                                    Address             State       Voter
----                                    -------             -----       -----
cf0dc62e-9193-bb64-19a4-5643b6f19517    172.16.0.38:8201    leader      true
8219a3b2-f140-46f4-992d-5a4cf0acf791    172.16.0.43:8201    follower    true

Once I kill the follower peer, the leader steps down:

2023-07-25T22:56:02.045Z [ERROR] storage.raft: failed to heartbeat to: peer=172.16.0.43:8201 backoff time=10ms error="dial tcp 172.16.0.43:8201: connect: connection refused"
2023-07-25T22:56:02.285Z [DEBUG] core.cluster-listener: creating rpc dialer: address=172.16.0.43:8201 alpn=raft_storage_v1 host=raft-82a6ee8c-995f-f593-02a9-7834fdd9478c
2023-07-25T22:56:02.286Z [ERROR] storage.raft: failed to appendEntries to: peer="{Voter 8219a3b2-f140-46f4-992d-5a4cf0acf791 172.16.0.43:8201}" error="dial tcp 172.16.0.43:8201: connect: connection refused"
2023-07-25T22:56:02.627Z [DEBUG] core.cluster-listener: creating rpc dialer: address=172.16.0.43:8201 alpn=raft_storage_v1 host=raft-82a6ee8c-995f-f593-02a9-7834fdd9478c
2023-07-25T22:56:02.627Z [ERROR] storage.raft: failed to heartbeat to: peer=172.16.0.43:8201 backoff time=20ms error="dial tcp 172.16.0.43:8201: connect: connection refused"
2023-07-25T22:56:02.864Z [WARN]  storage.raft: failed to contact: server-id=8219a3b2-f140-46f4-992d-5a4cf0acf791 time=2.500850963s
2023-07-25T22:56:02.864Z [WARN]  storage.raft: failed to contact quorum of nodes, stepping down
2023-07-25T22:56:02.864Z [INFO]  storage.raft: entering follower state: follower="Node at 172.16.0.38:8201 [Follower]" leader-address= leader-id=
2023-07-25T22:56:02.865Z [WARN]  core: leadership lost, stopping active operation
2023-07-25T22:56:02.865Z [INFO]  core: pre-seal teardown starting
2023-07-25T22:56:02.865Z [DEBUG] storage.raft.autopilot: state update routine is now stopped
2023-07-25T22:56:02.865Z [DEBUG] storage.raft.autopilot: autopilot is now stopped
2023-07-25T22:56:03.365Z [INFO]  core: stopping raft active node
2023-07-25T22:56:03.365Z [DEBUG] expiration: stop triggered
2023-07-25T22:56:03.365Z [TRACE] expiration.job-manager: terminating job manager...
2023-07-25T22:56:03.365Z [TRACE] expiration.job-manager: terminating dispatcher
2023-07-25T22:56:03.365Z [DEBUG] expiration: finished stopping
2023-07-25T22:56:03.366Z [INFO]  rollback: stopping rollback manager
2023-07-25T22:56:03.366Z [INFO]  core: pre-seal teardown complete
2023-07-25T22:56:03.366Z [ERROR] core: clearing leader advertisement failed: error="node is not the leader"
2023-07-25T22:56:03.366Z [ERROR] core: unlocking HA lock failed: error="node is not the leader"
2023-07-25T22:56:03.366Z [TRACE] core: found new active node information, refreshing
2023-07-25T22:56:03.404Z [DEBUG] core.cluster-listener: creating rpc dialer: address=172.16.0.43:8201 alpn=raft_storage_v1 host=raft-82a6ee8c-995f-f593-02a9-7834fdd9478c
2023-07-25T22:56:03.405Z [ERROR] storage.raft: failed to heartbeat to: peer=172.16.0.43:8201 backoff time=40ms error="dial tcp 172.16.0.43:8201: connect: connection refused"
2023-07-25T22:56:03.647Z [DEBUG] core.cluster-listener: creating rpc dialer: address=172.16.0.43:8201 alpn=raft_storage_v1 host=raft-82a6ee8c-995f-f593-02a9-7834fdd9478c
2023-07-25T22:56:03.648Z [ERROR] storage.raft: failed to appendEntries to: peer="{Voter 8219a3b2-f140-46f4-992d-5a4cf0acf791 172.16.0.43:8201}" error="dial tcp 172.16.0.43:8201: connect: connection refused"

All cluster APIs on the (former) leader are now failing:

root@e21fea7c1f99:# vault operator members
Error making API request.

URL: GET http://172.16.0.38:8200/v1/sys/ha-status
Code: 500. Errors:

* local node not active but active cluster node not found
root@e21fea7c1f99:# vault operator raft list-peers
Error reading the raft cluster configuration: Error making API request.

URL: GET http://172.16.0.38:8200/v1/sys/storage/raft/configuration
Code: 500. Errors:

* local node not active but active cluster node not found

The (former) leader's log has the following messages:

2023-07-25T22:59:20.573Z [TRACE] core: found new active node information, refreshing
2023-07-25T22:59:23.073Z [TRACE] core: found new active node information, refreshing
2023-07-25T22:59:25.574Z [TRACE] core: found new active node information, refreshing
2023-07-25T22:59:26.788Z [WARN]  storage.raft: Election timeout reached, restarting election
2023-07-25T22:59:26.788Z [INFO]  storage.raft: entering candidate state: node="Node at 172.16.0.38:8201 [Candidate]" term=325
2023-07-25T22:59:26.791Z [DEBUG] storage.raft: voting for self: term=325 id=cf0dc62e-9193-bb64-19a4-5643b6f19517
2023-07-25T22:59:26.795Z [DEBUG] storage.raft: asking for vote: term=325 from=8219a3b2-f140-46f4-992d-5a4cf0acf791 address=172.16.0.43:8201
2023-07-25T22:59:26.795Z [DEBUG] storage.raft: calculated votes needed: needed=2 term=325
2023-07-25T22:59:26.795Z [DEBUG] storage.raft: vote granted: from=cf0dc62e-9193-bb64-19a4-5643b6f19517 term=325 tally=1
2023-07-25T22:59:26.795Z [DEBUG] core.cluster-listener: creating rpc dialer: address=172.16.0.43:8201 alpn=raft_storage_v1 host=raft-82a6ee8c-995f-f593-02a9-7834fdd9478c
2023-07-25T22:59:26.795Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter 8219a3b2-f140-46f4-992d-5a4cf0acf791 172.16.0.43:8201}" error="dial tcp 172.16.0.43:8201: connect: connection refused" term=325
2023-07-25T22:59:28.074Z [TRACE] core: found new active node information, refreshing

I don’t understand:

  1. Why is a new election needed when the follower died?
  2. Why are the leader's cluster APIs not working anymore?

I have looked into many previous issues, but none matched mine. I'd appreciate your help.

There are no retry_join stanzas in the configuration; I set up the cluster using vault operator raft join.
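
For reference, a retry_join stanza would live inside the storage "raft" block of the server configuration and make each node retry joining automatically on startup. A minimal sketch, with illustrative path and node_id values:

storage "raft" {
  path    = "/vault/data"   # illustrative storage path
  node_id = "node-a"        # illustrative node ID

  retry_join {
    leader_api_addr = "http://172.16.0.38:8200"
  }
}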

The current vault status is as follows:

root@e21fea7c1f99:~# vault status
Key                     Value
---                     -----
Seal Type               shamir
Initialized             true
Sealed                  false
Total Shares            5
Threshold               3
Version                 1.14.1
Build Date              2023-07-21T10:15:14Z
Storage Type            raft
Cluster Name            vault-cluster-5478de37
Cluster ID              29c88e0b-5d9d-3a46-5458-31662b1b2aae
HA Enabled              true
HA Cluster              n/a
HA Mode                 standby
Active Node Address     <none>
Raft Committed Index    71
Raft Applied Index      71

I cannot even remove the killed follower:

root@e21fea7c1f99:~# vault operator raft remove-peer 8219a3b2-f140-46f4-992d-5a4cf0acf791
Error removing the peer from raft cluster: Error making API request.

URL: PUT http://172.16.0.38:8200/v1/sys/storage/raft/remove-peer
Code: 500. Errors:

* local node not active but active cluster node not found

Other new nodes are also unable to join the cluster in this state, due to the same underlying error:

root@e2ceae16c340:~# vault operator raft join http://172.16.0.38:8200
Error joining the node to the Raft cluster: Error making API request.

URL: POST http://127.0.0.1:8200/v1/sys/storage/raft/join
Code: 500. Errors:

* failed to join raft cluster: failed to get raft challenge

The joining node's log shows:
2023-07-25T23:22:29.610Z [INFO]  core: attempting to join possible raft leader node: leader_addr=http://172.16.0.38:8200
2023-07-25T23:22:29.617Z [ERROR] core: failed to get raft challenge: leader_addr=http://172.16.0.38:8200
  error=
  | error during raft bootstrap init call: Error making API request.
  |
  | URL: PUT http://172.16.0.38:8200/v1/sys/storage/raft/bootstrap/challenge
  | Code: 500. Errors:
  |
  | * local node not active but active cluster node not found

2023-07-25T23:22:29.617Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"

Raft is a consensus/quorum system.

It intrinsically requires that a simple numerical majority of nodes - i.e. more than half - be online and able to communicate in order to elect and sustain a leader.

A cluster of 2 is largely pointless, as 2 out of 2 nodes must be online for the cluster to be operational.

Practical cluster sizes are 3 (2/3 nodes = more than half) or 5 (3/5 nodes = more than half).
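
The required majority (quorum) for N voters is floor(N/2) + 1, which answers both of your questions: with 2 voters the quorum is 2, so losing the follower costs the leader its majority, it steps down, and its repeated elections can never gather 2 votes; and since no node can become active without quorum, every API call fails with "local node not active but active cluster node not found". Standard Raft arithmetic, for reference:

Nodes    Quorum    Failures tolerated
1        1         0
2        2         0
3        2         1
5        3         2
7        4         3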

A Raft cluster that has lost quorum and cannot regain it can be recovered, but only via a rather obscure administrative override method that involves shutting down the servers and starting them up with handcrafted JSON override files in place; see "Vault Cluster Lost Quorum Recovery" on HashiCorp Developer.
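
As a rough sketch of what that override looks like for your cluster (the /vault/data storage path here is an assumption - use whatever path your storage "raft" stanza configures, and follow the linked guide exactly): with Vault stopped on the surviving node, you place a peers.json file listing only that node at /vault/data/raft/peers.json:

[
  {
    "id": "cf0dc62e-9193-bb64-19a4-5643b6f19517",
    "address": "172.16.0.38:8201",
    "non_voter": false
  }
]

On the next start and unseal, Vault adopts this as the new Raft configuration - a single-voter cluster whose quorum of 1 is immediately satisfied - and then deletes the file. From there you can join fresh nodes to get back to a sensible 3-voter cluster.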