No cluster leader (5 node cluster, how to recover?)

I have a cluster of 5 nodes. It had been running for days with “No cluster leader”, but I didn’t notice until I restarted an agent and got errors like:

 rpcinsecure error making call: No cluster leader

I stopped all nodes, and started only one with:

consul agent -bootstrap-expect=1  -config-dir=/etc/consul.d/

But I still couldn’t make the node become leader, so I ended up bootstrapping the whole cluster from scratch (losing all data).

How could this have been fixed, or what is the best way to mark a node as the leader?

I am using the latest stable release (1.13.2). I also noticed, while trying to follow the recovery guide, that in the raft directory I don’t have a raft.json file; instead I have a raft.db.

Just in case, this is the configuration I have on all the nodes:

server = true
bind_addr = "<VM public IP>"
client_addr = "0.0.0.0"
data_dir = "/opt/consul"
datacenter = "test"
domain = "consul"
encrypt = "<the encrypt key>"
node_name = "consul-01"

acl {
  enabled = true
  default_policy = "deny"
  enable_token_persistence = true
  tokens {
    agent = "<the agent token>"
  }
}

tls {
  defaults {
    verify_incoming = true
    verify_outgoing = true
    ca_file = "/etc/consul.d/ssl/consul-ca.crt"
    cert_file = "/etc/consul.d/ssl/consul.crt"
    key_file = "/etc/consul.d/ssl/consul.key"
  }

  internal_rpc {
    verify_server_hostname = true
  }
}

auto_encrypt {
  allow_tls = true
}

dns_config {
  enable_truncate = true
  only_passing = true
  soa {
    min_ttl = 10
  }
  service_ttl {
    "*"= "10s"
  }
}

bootstrap_expect = 3
retry_join = ["<list of IPs>"]

disable_remote_exec = true

performance {
  raft_multiplier = 1
}

ui_config {
  enabled = true
}

log_level = "WARN"
log_file = "/var/log/consul/"

The bootstrap options are solely for creating new clusters. They are not for use on existing clusters, even in recovery situations, and will either do nothing or break things further.

Without logs, and without the output of consul operator raft list-peers -stale executed against the API of each server node, there’s no data to tell what happened in this case, so all I can offer is generalities.

There is no way to “mark” a node as leader; leadership always comes from an election.

To understand why an election is not completing when you think it should, the first thing to check is each node’s view of the peer set (the electorate), which you get with consul operator raft list-peers -stale; the -stale flag allows getting a response from non-leader nodes.
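For example, something along these lines can be run to query each server’s own HTTP API, so every node reports its local view (the addresses below are placeholders; with ACLs enabled, CONSUL_HTTP_TOKEN must hold a token with operator read permission):

# ask each server for its own view of the raft peer set
# (server IPs are placeholders for your five servers)
for addr in 10.0.0.1 10.0.0.2 10.0.0.3 10.0.0.4 10.0.0.5; do
  echo "== $addr =="
  consul operator raft list-peers -stale -http-addr=http://$addr:8500
done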

Once you have confirmed that all nodes do actually agree which nodes are part of the cluster, you can evaluate whether there’s a way to get enough of these nodes talking to each other to generate a quorum.

If there isn’t, that’s when the recovery method comes in: using a peers.json file to manually replace the peer set with a user-specified configuration.
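With Raft protocol version 3 (the default on current Consul releases), peers.json is a JSON array listing the servers that should make up the new peer set; it is placed in each remaining server’s raft directory while all agents are stopped. A rough sketch with placeholder values (the id is each server’s node ID, the address its server RPC address on port 8300):

[
  {
    "id": "<node-id of server 1>",
    "address": "<IP of server 1>:8300",
    "non_voter": false
  },
  {
    "id": "<node-id of server 2>",
    "address": "<IP of server 2>:8300",
    "non_voter": false
  }
]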

I think you must have misread; there’s no mention of raft.json.

Hi, I ran:

consul operator raft list-peers 

But I was getting ACL errors (I was missing the operator = "read" rule).

I misread, sorry. I have peers.info instead of peers.json.

Any idea how to test this scenario? I created a 3-node cluster and, by just restarting the nodes, they automatically elect a leader. In the 5-node cluster all nodes were also responding to DNS queries, but the logs all showed “No cluster leader”, apart from multiple lines with something like:

[WARN]  agent.server.raft: rejecting vote request since node is not a voter: from=X.X.X.X:8300

Yes, you will need a suitable ACL token.
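If no existing token has the needed permission, a minimal sketch of creating one could look like this (the policy name and file name are just examples, and the commands must be run with a token that can manage ACLs):

# operator-read.hcl - read access to the operator endpoints,
# which is what consul operator raft list-peers needs
operator = "read"

consul acl policy create -name operator-read -rules @operator-read.hcl
consul acl token create -description "raft list-peers" -policy-name operator-read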

I am checking the logs and I see entries like:

[WARN]  agent.server.raft: rejecting vote request since node is not a voter: from=65.107.150.86:8300
[WARN]  agent.server.raft: heartbeat timeout reached, starting election: last-leader-addr= last-leader-id=
[WARN]  agent.server.raft: rejecting vote request since node is not a voter: from=65.108.150.86:8300
[WARN]  agent.server.raft: appendEntries rejected, sending older logs: peer="{Voter fdb3fbba-4eee-848d-40a9-b0ffb19a0106 167.234.225.208:8300}" next=1128909
[WARN]  agent.server.raft: appendEntries rejected, sending older logs: peer="{Voter 57dc3d3f-7faf-fd9b-06b8-a10a1738c9c5 167.234.218.211:8300}" next=1128909
[ERROR] agent.server.raft: peer has newer term, stopping replication: peer="{Nonvoter e9db2aa0-51cc-4545-71c8-a6d707f65597 65.107.150.86:8300}"
[ERROR] agent.server: failed to wait for barrier: error="leadership lost while committing log"

You may be experiencing problems similar to this other active topic:

But what we really need to see to say for certain is the consul operator raft list-peers -stale output from each node, to understand what state the cluster is in. That will require finding a suitably permissioned ACL token that already exists on the cluster.

The cluster is currently running fine (I had to restore it), but if it happens again I will run this.

I managed to simulate this scenario by blocking traffic from the current leader as described here: 3-node cluster unhealthy after leader lost network connection - #2 by Ranjandas, using:

# block inbound RPC
iptables -I INPUT -p tcp --dport 8300 -j DROP

# block inbound Serf LAN & WAN
iptables -I INPUT -p tcp --dport 8301 -j DROP
iptables -I INPUT -p udp --dport 8301 -j DROP
iptables -I INPUT -p tcp --dport 8302 -j DROP
iptables -I INPUT -p udp --dport 8302 -j DROP

# block outbound RPC
iptables -I OUTPUT -p tcp --dport 8300 -j DROP

# block outbound Serf LAN & WAN
iptables -I OUTPUT -p tcp --dport 8301 -j DROP
iptables -I OUTPUT -p udp --dport 8301 -j DROP
iptables -I OUTPUT -p tcp --dport 8302 -j DROP
iptables -I OUTPUT -p udp --dport 8302 -j DROP
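To restore connectivity after the test, the same rules can be removed again with -D (one per rule that was inserted), for example:

# undo the inbound/outbound RPC block; repeat for the Serf rules
iptables -D INPUT -p tcp --dport 8300 -j DROP
iptables -D OUTPUT -p tcp --dport 8300 -j DROP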

This is a test cluster with no ACLs, and when I run consul operator raft list-peers I get this error message:

Error getting peers: Failed to retrieve raft configuration: Unexpected response code: 403 (rpc error making call: ACL not found)

How could I recover the cluster or promote a new leader?

This is the output of consul operator raft list-peers -stale

Node         ID                                    Address              State     Voter  RaftProtocol
eu-consul-2  b1c0dcc1-ed95-9915-702e-86c85adec93e  188.34.185.115:8300  follower  true   3
us-consul-2  40f7426b-96ff-daea-8b82-fc4e96615fd3  5.161.151.80:8300    leader    true   3
eu-consul-3  befc558f-16c9-3a58-768b-449730eeac24  49.12.7.233:8300     follower  true   3
eu-consul-1  6ce38127-5bfa-744b-8ca7-88baf1c5cc23  78.46.187.173:8300   follower  true   3

Hi @nbari,

Did you find out what caused the No cluster leader? Regarding recovery in these scenarios, you should follow the peers.json recovery method (linked below).

Why peers.json?

  1. Out of 5 nodes, you stopped the other 4 nodes when you didn’t have a leader in the cluster.
  2. Because of the above, the 1 node you are trying to recover still has all the other 4 nodes in its raft pool and will continuously request a vote and fail (as those nodes are down).
  3. The peers.json method lets you define exactly which agents you want in the raft peer set. In your case, that will be just the one remaining node.

In case you plan to bring the rest of the agents back into the cluster, ensure you clean the data directory on them and start fresh. They will join the cluster as followers, replicate the leader’s data, and continue to function.
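On each of those agents that could look roughly like this (the systemd unit name is an assumption; the data directory is the data_dir from your configuration above):

# stop the stuck agent, wipe its local state, and start it fresh
systemctl stop consul
rm -rf /opt/consul/*
systemctl start consul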

I hope this helps.

Hi @Ranjandas, I am not sure, but I was able to replicate it by blocking 2 nodes out of 5, including the current leader.

To test how to recover, I stopped all nodes and created the peers.json; thanks for the link.

After checking that I had a new leader and the data was in place, I deleted the data dir on the nodes that were still stuck, and they started clean.

When using telemetry:

telemetry {
  prometheus_retention_time = "72h"
  disable_hostname = true
}

Are there any metrics worth following apart from consul.autopilot.healthy?

@nbari It could be related to the following issue: Unstable leadership when running server is demoted/removed without its participation · Issue #524 · hashicorp/raft · GitHub which I believe was the underlying cause of similar behaviour I’ve seen on a 3-server cluster.

It looks like today’s Nomad releases include a fix for that; I’m not seeing it in the Consul changelog yet, but presumably soon.

In the meantime, a workaround for this issue is to disable cleanup of dead servers in Autopilot.
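For example, via the CLI (with ACLs enabled this needs operator write permission):

consul operator autopilot set-config -cleanup-dead-servers=false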
