Consul fails to ignore dead agent (powered off after power outage)

Hello,

I have several hosts, each behind a WireGuard gateway that separates it from my main network. They are hosted on different on-premises servers.

The dead host keeps showing up in the Consul logs, which show Consul trying to connect to it, but it does not appear in consul members, so I cannot do a force-leave.

Aug 19 01:15:10 nomad1 consul[28972]: 2022-08-19T01:15:10.150+0100 [ERROR] agent.http: Request error: method=GET url=/v1/catalog/services from=192.168.62.17:48854 error="No cluster leader"
Aug 19 01:15:12 nomad1 consul[28972]: 2022-08-19T01:15:12.769+0100 [WARN] agent.server.raft: rejecting vote request since node is not a voter: from=192.168.62.49:8300
Aug 19 01:15:13 nomad1 consul[28972]: 2022-08-19T01:15:13.821+0100 [ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter b5c87fae-4526-014e-2c0a-03d489ea4f3e 192.168.62.36:8300}" error="dial tcp 192.168.62.20:0->192.168.62.36:8300: i/o timeout"
Aug 19 01:15:15 nomad1 consul[28972]: 2022-08-19T01:15:15.052+0100 [WARN] agent.server.raft: rejecting vote request since node is not a voter: from=192.168.62.2:8300
Aug 19 01:15:17 nomad1 consul[28972]: 2022-08-19T01:15:17.055+0100 [WARN] agent.server.raft: Election timeout reached, restarting election
Aug 19 01:15:17 nomad1 consul[28972]: 2022-08-19T01:15:17.055+0100 [INFO] agent.server.raft: entering candidate state: node="Node at 192.168.62.20:8300 [Candidate]" term=1399820
Aug 19 01:15:17 nomad1 consul[28972]: 2022-08-19T01:15:17.058+0100 [WARN] agent.server.raft: unable to get address for server, using fallback address: id=b5c87fae-4526-014e-2c0a-03d489ea4f3e fallback=192.168.62.36:8300 error="Could not find address for server id b5c87fae-4526-014e-2c0a-03d489ea4f3e"
Aug 19 01:15:18 nomad1 consul[28972]: 2022-08-19T01:15:18.611+0100 [ERROR] agent.http: Request error: method=GET url=/v1/catalog/services from=192.168.62.17:48854 error="No cluster leader"

Above is an example of the Consul log. The hosts are Debian Buster server VMs running Consul:
v1.13.1
Revision c6d0f9ec
Build Date 2022-08-11T19:07:00

All Consul agents, both server and client, participate in the Serf gossip/membership protocol, resulting in them appearing in consul members.

Confusingly, the servers also participate, independently, in the Raft distributed consensus protocol, which has a totally separate member list, viewed with consul operator raft list-peers.
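
To make the two lists concrete, here is a rough Python sketch that reports servers still present in the Raft configuration but absent from the Serf member list. It assumes the input has the JSON shape returned by the HTTP endpoints /v1/agent/members (Serf view) and /v1/operator/raft/configuration (Raft view). The function name and the sample data are made up for illustration; only the addresses and the one server ID come from the log above.

```python
# Sketch: find Raft peers that Serf no longer knows about.
# Input shapes are assumed to match the JSON returned by
#   GET /v1/agent/members                (Serf view)
#   GET /v1/operator/raft/configuration  (Raft view)

def raft_peers_missing_from_serf(serf_members, raft_config):
    """Return Raft servers whose IP does not appear among the Serf members."""
    serf_ips = {m["Addr"] for m in serf_members}
    missing = []
    for server in raft_config["Servers"]:
        ip = server["Address"].rsplit(":", 1)[0]  # strip the ":8300" RPC port
        if ip not in serf_ips:
            missing.append(server)
    return missing


# Illustrative data only: the node names and the second ID are hypothetical.
serf_members = [{"Name": "nomad1", "Addr": "192.168.62.20", "Status": 1}]
raft_config = {
    "Servers": [
        {"ID": "b5c87fae-4526-014e-2c0a-03d489ea4f3e", "Node": "(unknown)",
         "Address": "192.168.62.36:8300", "Leader": False, "Voter": True},
        {"ID": "hypothetical-id-of-nomad1", "Node": "nomad1",
         "Address": "192.168.62.20:8300", "Leader": False, "Voter": True},
    ]
}

stale_peers = raft_peers_missing_from_serf(serf_members, raft_config)
print([s["Address"] for s in stale_peers])
```

A server that shows up in this result is exactly the kind of entry that force-leave cannot reach, because force-leave acts on the Serf list.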


When I try list-peers, it fails with "No cluster leader".
Is this the point where I have to provide a peers.json to correct it?

Add the -stale option to view the Raft peers even without a leader.

Do this separately on every node, as they could be out of sync if there is no leader.
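
If you would rather script that comparison, something along these lines might help. It is only a sketch: it assumes each server's HTTP API is reachable on the default port 8500 without TLS, and that no ACL token is required (otherwise a token header would have to be added). The helper names are my own.

```python
import json
import urllib.request


def fetch_raft_config(http_addr):
    """Read one server's local Raft configuration, allowing stale reads."""
    url = f"http://{http_addr}/v1/operator/raft/configuration?stale"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def peer_set(raft_config):
    """Reduce a configuration to a comparable set of (id, address, voter)."""
    return {(s["ID"], s["Address"], s["Voter"]) for s in raft_config["Servers"]}


def views_agree(configs_by_node):
    """True when every queried node reports the same peer set."""
    sets = [peer_set(c) for c in configs_by_node.values()]
    return all(s == sets[0] for s in sets[1:])
```

Usage would be roughly: call fetch_raft_config("192.168.62.20:8500") for each server that is still up, collect the results in a dict keyed by node, and pass it to views_agree. If it returns False, the servers disagree about who is in the cluster, and it is worth looking at each list by hand.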

If you cannot bring a quorum of existing members online to re-establish a leader normally, then yes, you will need to resort to peers.json.
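
For reference, a minimal sketch of generating that file. It assumes Raft protocol version 3, where each entry has id, address and non_voter fields, and that you list only the servers you want to keep. The node IDs below are placeholders, not real values; each server's actual ID should be in the node-id file in its data directory.

```python
import json


def build_peers_json(servers):
    """Render a peers.json body from (node_id, rpc_address) pairs.

    Every listed server is written as a voter; the dead host is simply
    left out of the list.
    """
    entries = [
        {"id": node_id, "address": address, "non_voter": False}
        for node_id, address in servers
    ]
    return json.dumps(entries, indent=2)


# Placeholder ID: replace with the real node ID of the surviving server.
surviving = [("00000000-0000-0000-0000-000000000001", "192.168.62.20:8300")]
peers_json = build_peers_json(surviving)
print(peers_json)
```

As far as I know from the outage recovery guide, the file goes into the raft/ subdirectory of the data directory on every remaining server, with all servers stopped first, and Consul consumes it on the next start. Please check the guide for your version before relying on this.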
