Hi, I have a 3-server cluster. When I block the network connection of the leader, I expect the remaining 2 servers to elect a new leader, as they do when I terminate the leader. For some reason this does not happen. The old leader is marked failed, but the followers stay followers.
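For reference, my expectation rests on the usual Raft majority rule (a strict majority of voters is needed to elect a leader). A minimal sketch of that arithmetic; the function name `quorum` and the variables are only illustrative, not anything Consul provides:

```shell
# Raft needs a strict majority of voters: floor(n / 2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

voters=3      # voters in the cluster (hypothetical variable)
reachable=2   # servers that can still talk to each other (hypothetical variable)
needed=$(quorum "$voters")

if [ "$reachable" -ge "$needed" ]; then
  echo "quorum possible: $reachable of $voters reachable, $needed needed"
else
  echo "no quorum: $reachable of $voters reachable, $needed needed"
fi
```

By this rule the two remaining servers should be enough to hold an election.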
# on the leader
$ consul members list | grep server
ip-10-140-32-151 10.140.32.151:8301 alive server 1.6.1 2 eu-west-1 <all>
ip-10-140-37-156 10.140.37.156:8301 alive server 1.6.1 2 eu-west-1 <all>
ip-10-140-41-175 10.140.41.175:8301 alive server 1.6.1 2 eu-west-1 <all>
$ consul operator raft list-peers
Node ID Address State Voter RaftProtocol
ip-10-140-32-151 44ab630a-1258-78d3-d979-2f85b149c358 10.140.32.151:8300 leader true 3
ip-10-140-37-156 cab982af-0496-5f27-c29e-7118270c633f 10.140.37.156:8300 follower true 3
ip-10-140-41-175 25dc9732-aaac-3645-0168-a3e7b104cc3f 10.140.41.175:8300 follower true 3
# on the followers
$ consul members list | grep server
ip-10-140-32-151 10.140.32.151:8301 failed server 1.6.1 2 eu-west-1 <all>
ip-10-140-37-156 10.140.37.156:8301 alive server 1.6.1 2 eu-west-1 <all>
ip-10-140-41-175 10.140.41.175:8301 alive server 1.6.1 2 eu-west-1 <all>
$ consul operator raft list-peers
Error getting peers: Failed to retrieve raft configuration: Unexpected response code: 500 (No cluster leader)
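The same state can be checked through the HTTP API. This is a rough sketch, assuming the local agent's HTTP API listens on the default 127.0.0.1:8500 and that `/v1/status/leader` returns an empty JSON string when there is no leader (which is my understanding of the endpoint). `CONSUL_URL`, `has_leader`, and `check_leader` are names I made up for illustration:

```shell
# Interpret the body returned by GET /v1/status/leader.
# An empty body or an empty JSON string ("") is treated as "no leader".
has_leader() {
  body=$(printf '%s' "$1" | tr -d '"[:space:]')
  [ -n "$body" ]
}

# Assumption: default local agent address; override CONSUL_URL if yours differs.
CONSUL_URL="${CONSUL_URL:-http://127.0.0.1:8500}"

# Query the local agent once and print what it reports.
# A failed or timed-out request also ends up as "no leader".
check_leader() {
  resp=$(curl -s --max-time 2 "$CONSUL_URL/v1/status/leader")
  if has_leader "$resp"; then
    echo "leader: $resp"
  else
    echo "no leader"
  fi
}
```

Running `check_leader` on each follower while the leader is cut off would show whether either of them ever sees a new leader.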
I even tried to force a leader election by running:
$ consul force-leave ip-10-140-32-151
But the only change was that the old leader's status in the members list went from failed to left.
In any case, restoring the network connection restores the cluster. But while the network is down, the cluster is effectively non-functional.
I'd be grateful for any hints.
(Yes, as you can see in the outputs above, we are still on version 1.6.1.)