Consul not selecting leader

Hi, I'm new to Consul and joined this project very recently.

We are running a 5-node Consul server cluster in GCP and have run into a situation where no leader is being elected.

These are the versions we are using:
hashicorp/consul-k8s-control-plane:1.4.7
hashicorp/consul:1.18.2

While debugging I found the output of these two commands interesting, so I'm sharing it here. Can someone please guide me on how to go about recovering this cluster?

kubectl exec -it --namespace="consul-helm" --context "***-***-**-**" consul-server-3 -- /bin/sh -c "consul members"
Defaulted container "consul" out of: consul, locality-init (init)
Node             Address          Status  Type    Build   Protocol  DC             Partition  Segment
consul-server-0  10.0.3.134:8301  alive   server  1.18.2  2         gke-staging01  default    <all>
consul-server-1  10.0.6.65:8301   alive   server  1.18.2  2         gke-staging01  default    <all>
consul-server-2  10.0.8.138:8301  alive   server  1.18.2  2         gke-staging01  default    <all>
consul-server-3  10.0.1.137:8301  alive   server  1.18.2  2         gke-staging01  default    <all>
consul-server-4  10.0.2.119:8301  alive   server  1.18.2  2         gke-staging01  default    <all>
kubectl exec -it --namespace="consul-helm" --context "***-***-***-***" consul-server-1 -- /bin/sh -c "consul operator raft list-peers -stale"

Defaulted container "consul" out of: consul, locality-init (init)
Node       ID                                    Address         State     Voter  RaftProtocol  Commit Index  Trails Leader By
(unknown)  8fa34315-bf5f-989a-7a7c-d860f87ea493  10.0.3.44:8300  follower  true   unknown       1202793914    18446744072506757702 commits
(unknown)  0fae26b0-da70-0ac4-46e4-d237789ef977  10.0.8.53:8300  follower  true   unknown       1205352631    18446744072504198985 commits
(unknown)  31f1ea38-d78f-64c6-094f-1c079faac735  10.0.6.30:8300  follower  true   unknown       1199048691    18446744072510502925 commits
(unknown)  c4b5f75d-4605-1e48-4a67-c8d002650604  10.0.2.53:8300  follower  true   unknown       1204655651    18446744072504895965 commits
(unknown)  0234628a-ec35-3704-1b9e-54db695064bf  10.0.1.56:8300  follower  true   unknown       1203540480    18446744072506011136 commits

Hi @gauravsaralMs,

Welcome to the HashiCorp Forum!

You will have to do a peers.json recovery in this scenario. The `(unknown)` node names and the astronomically large "Trails Leader By" values are telltale signs that no leader exists at all (with no leader commit index to compare against, the unsigned subtraction underflows). Note also that the Raft peer addresses (`10.0.3.44`, `10.0.8.53`, …) no longer match the addresses reported by `consul members` (`10.0.3.134`, `10.0.8.138`, …), which suggests the Raft peer set still holds stale IPs from before the server pods were rescheduled, so the servers cannot reach each other to hold an election. Have a read through the Kubernetes-specific recovery steps in the following guide.

Ref: https://github.com/hashicorp/learn/blob/8ec7ec32bf1a55c6d47168402291e3f3bc1af676/content/tutorials/consul/recovery-outage.mdx#kubernetes-specific-recovery-steps
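For reference, with Raft protocol version 3 (which your servers run), `peers.json` is a JSON array with one entry per server: the server's Raft node ID, its Raft RPC address (port 8300), and a `non_voter` flag. A minimal sketch using the node IDs from your `raft list-peers` output — the addresses below are illustrative only; you must pair each node ID with that server's *current* pod IP (check each pod's own `node id` in its logs or `consul info`), since the IPs in the stale peer list are no longer valid:

```json
[
  { "id": "8fa34315-bf5f-989a-7a7c-d860f87ea493", "address": "10.0.3.134:8300", "non_voter": false },
  { "id": "0fae26b0-da70-0ac4-46e4-d237789ef977", "address": "10.0.8.138:8300", "non_voter": false },
  { "id": "31f1ea38-d78f-64c6-094f-1c079faac735", "address": "10.0.6.65:8300", "non_voter": false },
  { "id": "c4b5f75d-4605-1e48-4a67-c8d002650604", "address": "10.0.2.119:8300", "non_voter": false },
  { "id": "0234628a-ec35-3704-1b9e-54db695064bf", "address": "10.0.1.137:8300", "non_voter": false }
]
```

The same file goes into the raft directory on every server (with the default Helm chart data directory that would be `/consul/data/raft/peers.json`) while the server agents are stopped; on startup each server ingests and then deletes the file, rebuilding its peer set from it.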