3-node cluster unhealthy after leader lost network connection

Hi Harald,

It would depend on how you have blocked the network on the leader. If you blocked only the inbound traffic to the existing leader (port 8300/tcp, 8301/tcp+udp, and 8302/tcp+udp), the node is still communicating with the cluster outbound, which means it is still part of the Serf pool and will flap between the failed and alive states (see the sketch after the list below):

  • failed, because the other nodes are unable to talk to it, which results in heartbeat failures
  • alive, because the node can still talk to its peers on the Serf ports
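
If that is how you blocked it, the rules on the leader probably looked something like the following (inbound only, using the default Consul ports; this is an assumption about your setup, not something from your post):

# inbound-only block: the node cannot be reached, but it can still dial out,
# so it keeps rejoining the Serf pool
iptables -I INPUT -p tcp --dport 8300 -j DROP
iptables -I INPUT -p tcp --dport 8301 -j DROP
iptables -I INPUT -p udp --dport 8301 -j DROP
iptables -I INPUT -p tcp --dport 8302 -j DROP
iptables -I INPUT -p udp --dport 8302 -j DROP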

You will be able to see this if you watch the consul members output from a different node:

watch -d consul members
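
While the node is flapping, its Status column will alternate between alive and failed, roughly like this (c2 and its address are taken from your logs; the other node names are placeholders, and the exact columns vary by Consul version):

Node  Address             Status  Type    Build  Protocol  DC   Segment
c1    192.168.64.47:8301  alive   server  1.9.3  2         dc1  <all>
c2    192.168.64.46:8301  failed  server  1.9.3  2         dc1  <all>
c3    192.168.64.48:8301  alive   server  1.9.3  2         dc1  <all>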

You will also see from the logs that the node is constantly being removed and added back:

[INFO] memberlist: Suspect c2 has failed, no acks received
[INFO] memberlist: Marking c2 as failed, suspect timeout reached (0 peer confirmations)
[INFO] serf: EventMemberFailed: c2 192.168.64.46
[INFO] consul: Removing LAN server c2 (Addr: tcp/192.168.64.46:8300) (DC: dc1)
[INFO] serf: attempting reconnect to c2 192.168.64.46:8301
[INFO] memberlist: Suspect c2 has failed, no acks received
[INFO] serf: EventMemberJoin: c2 192.168.64.46
[INFO] consul: Adding LAN server c2 (Addr: tcp/192.168.64.46:8300) (DC: dc1)

In this situation, Raft still has a leader, but the follower nodes are not in a position to talk to the leader that Raft reports. The leader itself can still talk to a quorum of nodes, so it does not step down.
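
You can confirm from any of the servers that Raft still reports a leader:

consul operator raft list-peers

The output would look roughly like this (c2 and the two server IDs are from your logs; c1, c3, and <leader-id> are placeholders):

Node  ID                                    Address             State     Voter  RaftProtocol
c1    <leader-id>                           192.168.64.47:8300  leader    true   3
c2    dc6fbf3d-ec22-f832-63ea-00293a14f1ea  192.168.64.46:8300  follower  true   3
c3    d9f4ead3-ce7f-e114-ca17-6c64854aa7b5  192.168.64.48:8300  follower  true   3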

Testing this on 1.9.3 shows the following warning message in such a scenario (I am not sure in which version this message was added):

[WARN]  agent.server: Raft has a leader but other tracking of the node would indicate that the node is unhealthy or does not exist. The network may be misconfigured.: leader=192.168.64.47:8300

force-leave is used when you want to transition a node from the failed state to the left state in the member list. This is helpful when the node has actually failed but the cluster still tries to contact it. In this case, force-leave would change the status, but when the node joins back (as part of the flapping) the state would change back to alive.
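
For reference, that would be (c2 being the node name from your logs):

consul force-leave c2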

If you want to simulate the exact scenario of an ungraceful leader termination, make sure that you block both inbound and outbound traffic.

iptables rules like the following would help:

# block inbound RPC
iptables -I INPUT -p tcp --dport 8300 -j DROP

# block inbound Serf LAN & WAN
iptables -I INPUT -p tcp --dport 8301 -j DROP
iptables -I INPUT -p udp --dport 8301 -j DROP
iptables -I INPUT -p tcp --dport 8302 -j DROP
iptables -I INPUT -p udp --dport 8302 -j DROP

# block outbound RPC
iptables -I OUTPUT -p tcp --dport 8300 -j DROP

# block outbound Serf LAN & WAN
iptables -I OUTPUT -p tcp --dport 8301 -j DROP
iptables -I OUTPUT -p udp --dport 8301 -j DROP
iptables -I OUTPUT -p tcp --dport 8302 -j DROP
iptables -I OUTPUT -p udp --dport 8302 -j DROP
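
To undo the test afterwards, delete the same rules (-D removes the first matching rule; this sketch assumes you added exactly the rules above):

# unblock inbound
iptables -D INPUT -p tcp --dport 8300 -j DROP
iptables -D INPUT -p tcp --dport 8301 -j DROP
iptables -D INPUT -p udp --dport 8301 -j DROP
iptables -D INPUT -p tcp --dport 8302 -j DROP
iptables -D INPUT -p udp --dport 8302 -j DROP

# unblock outbound
iptables -D OUTPUT -p tcp --dport 8300 -j DROP
iptables -D OUTPUT -p tcp --dport 8301 -j DROP
iptables -D OUTPUT -p udp --dport 8301 -j DROP
iptables -D OUTPUT -p tcp --dport 8302 -j DROP
iptables -D OUTPUT -p udp --dport 8302 -j DROP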

Doing this would result in logs like the following:

[WARN]  agent.server.raft: failed to contact: server-id=d9f4ead3-ce7f-e114-ca17-6c64854aa7b5 time=2.500070665s
[WARN]  agent.server.raft: failed to contact: server-id=dc6fbf3d-ec22-f832-63ea-00293a14f1ea time=2.50019952s
[WARN]  agent.server.raft: failed to contact quorum of nodes, stepping down
[INFO]  agent.server.raft: entering follower state: follower="Node at 192.168.64.47:8300 [Follower]" leader=
[INFO]  agent.server.raft: aborting pipeline replication: peer="{Voter d9f4ead3-ce7f-e114-ca17-6c64854aa7b5 192.168.64.48:8300}"
[INFO]  agent.server.raft: aborting pipeline replication: peer="{Voter dc6fbf3d-ec22-f832-63ea-00293a14f1ea 192.168.64.46:8300}"
[WARN]  agent.server.coordinate: Batch update failed: error="leadership lost while committing log"
[INFO]  agent.server: cluster leadership lost

Hope this helps.
