Cluster leadership instability


We are seeing frequent leadership changes in our Consul cluster. The cluster consists of 5 server nodes running on EC2 instances in AWS, spread across 3 availability zones. In addition to the leadership changes, the following warning appears in the followers' logs:

Jan 13 07:55:21 consul1 consul[28920]:     2022-01-13T07:55:21.771Z [WARN]  agent.server: Raft has a leader but other tracking of the node would indicate that the node is unhealthy or does not exist. The network may be misconfigured.:

We’re also seeing vote requests being sent when a leader has already been elected:

Jan 13 12:22:31 c1consul1 consul[28920]:     2022-01-13T12:22:31.900Z [WARN]  agent.server.raft: rejecting vote request since we have a leader:
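
To give a sense of how often these two warnings occur, here is a rough sketch of how the log lines could be tallied per minute. The function name `tally_warnings` and the labels `leader_unhealthy` and `vote_rejected` are made up for illustration; the matched substrings are copied from the two warnings quoted above, and the timestamp format is assumed to be the one shown in those lines.

```python
import re
from collections import Counter

# Substrings copied from the two warnings quoted in this post.
# The dictionary keys are hypothetical labels, not Consul terminology.
PATTERNS = {
    "leader_unhealthy": "Raft has a leader but other tracking of the node",
    "vote_rejected": "rejecting vote request since we have a leader",
}

# Captures the ISO timestamp printed by Consul, truncated to the minute,
# e.g. "2022-01-13T07:55" from "2022-01-13T07:55:21.771Z".
TS_MINUTE = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\d{2}")


def tally_warnings(lines):
    """Count each warning type per minute across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = TS_MINUTE.search(line)
        if not match:
            continue  # skip lines without a recognizable timestamp
        for name, needle in PATTERNS.items():
            if needle in line:
                counts[(match.group(1), name)] += 1
    return counts
```

Feeding it the saved log output of each server (for example `tally_warnings(open(path))`, where `path` is whatever file the logs were exported to) should show whether the warnings cluster around particular times or particular nodes.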

All ports are open between the Consul servers, and CPU and memory utilization appear to be within acceptable limits. Does anyone have any insight into why the leadership keeps changing?