Error: Consul cluster not able to elect a leader

I have deployed a Consul cluster with 3 nodes and set --bootstrap-expect to 3. After the initial deployment a leader is elected and everything works fine. However, when I roll out a new deployment with some changes (using Terraform and AWS ECS tasks), the following happens:

  1. 3 new tasks are created, and these 3 new nodes join the cluster.
  2. The 3 old nodes take some time to stop. Until then, I can see 6 IPs in the peers API.
  3. Once the 3 old nodes (including the leader) are stopped, an election starts, but the Consul cluster is not able to elect a leader.
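For anyone who wants to watch this happen during a rollout, here is a minimal Python sketch that reads the Raft peer list and the current leader from the status endpoints (`/v1/status/peers` and `/v1/status/leader`). The agent address and the helper names `parse_status`, `fetch` and `cluster_status` are assumptions for illustration, and it assumes no ACL token is required.

```python
import json
import urllib.request

# Assumption: a Consul agent is reachable on the default local HTTP address.
CONSUL_ADDR = "http://127.0.0.1:8500"


def parse_status(peers_body: str, leader_body: str) -> dict:
    """Summarize the raw JSON bodies of /v1/status/peers and /v1/status/leader."""
    peers = json.loads(peers_body)    # JSON list of "ip:port" strings
    leader = json.loads(leader_body)  # JSON string, empty when there is no leader
    return {
        "peer_count": len(peers),
        "leader": leader or None,
        "has_leader": bool(leader),
    }


def fetch(path: str, base_url: str = CONSUL_ADDR) -> str:
    """GET a path from the agent and return the response body as text."""
    with urllib.request.urlopen(base_url + path, timeout=5) as resp:
        return resp.read().decode("utf-8")


def cluster_status(base_url: str = CONSUL_ADDR) -> dict:
    """Fetch both status endpoints and summarize them (needs a running agent)."""
    return parse_status(
        fetch("/v1/status/peers", base_url),
        fetch("/v1/status/leader", base_url),
    )
```

Calling `cluster_status()` in a loop while the old tasks drain should show the peer count going from 3 to 6, and whether a leader is still reported once the old tasks are gone.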

I can see the following log line, which might be useful:
lost leadership because received a requestVote with a newer term

Also, I can see that the term value is different on the 3 nodes. How did it get out of sync?

The election starts as soon as the old leader is stopped, but for some reason the cluster is not able to elect a leader.

How can I fix this? Is there something wrong with the deployment? What is the best practice for removing the old cluster and bringing up the new cluster with a leader?

Thanks.

To get a feel for the problem: Can you trigger a force-leave on the old nodes before they are stopped?

Ultimately, I see a cluster with an even number of server nodes, since the old nodes are still valid voting members until Autopilot cleans them up. If you get the cluster to completely forget about the old nodes so that the node count is odd again, it might work.
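To put numbers on this, here is a small Python sketch of the Raft majority rule (a quorum is more than half of the voting servers). It assumes all six servers are still counted as voters at the moment the old three stop; the function names are made up for illustration.

```python
def quorum(voters: int) -> int:
    """Votes needed for a Raft majority among `voters` voting servers."""
    return voters // 2 + 1


def can_elect_leader(voters: int, alive: int) -> bool:
    """True if the servers that are still alive can form a majority."""
    return alive >= quorum(voters)


# 3 old + 3 new servers registered as voters, old ones stopped without leaving:
print(quorum(6))               # 4
print(can_elect_leader(6, 3))  # False

# Old servers removed from the peer set first, only the 3 new ones remain:
print(quorum(3))               # 2
print(can_elect_leader(3, 3))  # True
```

Under that assumption, the three new servers can never collect the four votes they need, which would match an election that keeps restarting with increasing terms.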

No guarantee for anything. :smiley:


Thanks @Wolfsrudel. Having the old server nodes exit gracefully when the new nodes are deployed seems to be working.

Either running consul leave or setting leave_on_terminate = true in the configuration works.
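If a script is easier to wire into the task shutdown than the CLI, the same graceful leave can be requested over the agent's HTTP API (`PUT /v1/agent/leave`). This is only a sketch: the agent address and the helper names `build_leave_request` and `leave` are assumptions, and it assumes no ACL token is needed.

```python
import urllib.request

# Assumption: the agent to be removed listens on the default local HTTP address.
CONSUL_ADDR = "http://127.0.0.1:8500"


def build_leave_request(base_url: str = CONSUL_ADDR) -> urllib.request.Request:
    """Build the PUT request that asks the local agent to leave gracefully."""
    return urllib.request.Request(base_url + "/v1/agent/leave", method="PUT")


def leave(base_url: str = CONSUL_ADDR) -> int:
    """Send the leave request and return the HTTP status code (needs a running agent)."""
    with urllib.request.urlopen(build_leave_request(base_url), timeout=10) as resp:
        return resp.status
```

Calling `leave()` on each old server before its task is stopped should have the same effect as running consul leave there.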