Cluster leadership instability

jeffrey.mintz · January 14, 2022, 1:40am

Hi,

We are seeing frequent leadership changes in our Consul cluster. The cluster consists of 5 EC2 instances in AWS spread across 3 availability zones. In addition to the frequent leadership changes, the following warning appears in the followers’ logs:

Jan 13 07:55:21 consul1 consul[28920]:     2022-01-13T07:55:21.771Z [WARN]  agent.server: Raft has a leader but other tracking of the node would indicate that the node is unhealthy or does not exist. The network may be misconfigured.: leader=xxx.xxx.xxx.xxx:8300

We’re also seeing vote requests being sent when a leader has already been elected:

Jan 13 12:22:31 c1consul1 consul[28920]:     2022-01-13T12:22:31.900Z [WARN]  agent.server.raft: rejecting vote request since we have a leader: from=xxx.xxx.xxx.xxx:8300 leader=xxx.xxx.xxx.xxx:8300
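
To see whether these warnings cluster around particular times, a rough sketch like the one below could tally them per hour. It only assumes the line layout shown in the two excerpts above (Consul timestamp, level, subsystem, message, then key=value fields); the function name `tally_warnings` and the `SAMPLE` list are made up for illustration and are not part of Consul.

```python
import re
from collections import Counter

# Matches the Consul-side timestamp, level, subsystem and message,
# based only on the layout of the two log lines quoted in this post.
LINE_RE = re.compile(
    r"(?P<hour>\d{4}-\d{2}-\d{2}T\d{2}):\d{2}:\d{2}\.\d+Z\s+"
    r"\[(?P<level>\w+)\]\s+(?P<subsystem>[\w.]+):\s+(?P<msg>.*)"
)

def tally_warnings(lines):
    """Count WARN lines per (hour, subsystem, message) (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m or m.group("level") != "WARN":
            continue
        # Drop the trailing "key=value" fields so identical warnings group together.
        msg = re.split(r":\s+\w+=", m.group("msg"), maxsplit=1)[0]
        counts[(m.group("hour"), m.group("subsystem"), msg)] += 1
    return counts

# Sample input: shortened copies of the two lines quoted above.
SAMPLE = [
    "Jan 13 07:55:21 consul1 consul[28920]:     2022-01-13T07:55:21.771Z [WARN]  "
    "agent.server: Raft has a leader but other tracking of the node would indicate "
    "that the node is unhealthy or does not exist. The network may be misconfigured.: "
    "leader=xxx.xxx.xxx.xxx:8300",
    "Jan 13 12:22:31 c1consul1 consul[28920]:     2022-01-13T12:22:31.900Z [WARN]  "
    "agent.server.raft: rejecting vote request since we have a leader: "
    "from=xxx.xxx.xxx.xxx:8300 leader=xxx.xxx.xxx.xxx:8300",
]

for key, count in sorted(tally_warnings(SAMPLE).items()):
    print(count, *key)
```

Feeding it the full log from each server instead of `SAMPLE` would show whether the warnings line up in time across nodes.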

All ports are open between the Consul servers, and CPU and memory utilization appear to be within normal ranges. Does anyone have any insight into why the leadership keeps changing?
