Understanding raft parameters

frumbar · June 16, 2021, 8:14am

Hello,
Recently I updated Consul from 1.7.2 to 1.9.0 and it started behaving differently. According to the documentation on telemetry and leadership changes (Telemetry | Consul by HashiCorp), in a healthy cluster these metrics should have the following values:

consul.raft.leader.lastContact: lower than 200 ms
consul.raft.state.candidate: 0 on every node?
consul.raft.state.leader: 1 on the leader only?

Are the above assumptions correct?
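To make the three assumptions concrete, here is a minimal Python sketch that encodes them as per-data-center checks. It is not Consul's own alerting logic and it does not confirm that the assumptions are right. The `ServerSample` type, the `check_datacenter` function and the sample numbers are hypothetical, and the sketch assumes the latest value of each metric has already been exported per server node as a plain number.

```python
from dataclasses import dataclass

# Hypothetical per-server snapshot of the three metrics discussed above.
# Field names are illustrative only; they are not part of any Consul API.
@dataclass
class ServerSample:
    name: str
    last_contact_ms: float  # consul.raft.leader.lastContact, in milliseconds
    candidate: int          # consul.raft.state.candidate
    leader: int             # consul.raft.state.leader

LAST_CONTACT_LIMIT_MS = 200.0  # threshold from the assumption above

def check_datacenter(samples: list[ServerSample]) -> list[str]:
    """Return human-readable violations of the assumed healthy-cluster rules."""
    problems: list[str] = []
    for s in samples:
        # Assumption 1: lastContact stays below 200 ms.
        if s.last_contact_ms >= LAST_CONTACT_LIMIT_MS:
            problems.append(
                f"{s.name}: lastContact {s.last_contact_ms:.0f} ms is not below "
                f"{LAST_CONTACT_LIMIT_MS:.0f} ms"
            )
        # Assumption 2: candidate is 0 on every node.
        if s.candidate != 0:
            problems.append(f"{s.name}: candidate={s.candidate}, expected 0")
        # Assumption 3 (per node): leader is either 0 or 1.
        if s.leader not in (0, 1):
            problems.append(f"{s.name}: leader={s.leader}, expected 0 or 1")
    # Assumption 3 (per data center): exactly one node reports leader == 1.
    leaders = [s.name for s in samples if s.leader == 1]
    if len(leaders) != 1:
        problems.append(f"expected exactly one leader, found {len(leaders)}: {leaders}")
    return problems

if __name__ == "__main__":
    # Hypothetical values for a 3-server data center, for illustration only.
    dc = [
        ServerSample("server-1", 15.0, 1, 3),
        ServerSample("server-2", 12.0, 1, 1),
        ServerSample("server-3", 10.0, 1, 0),
    ]
    for problem in check_datacenter(dc):
        print(problem)
```

Running the checks per data center rather than across both keeps the "exactly one leader" rule meaningful, since each data center elects its own Raft leader.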

Consul behaves differently. These are the issues I see; please let me know whether this is correct behaviour and what should be done to fix it:

consul.raft.state.leader: I have 2 data centers, one with 3 server nodes and the other with 5. Grafana shows 1 for this metric on at least 2 server nodes in each data center. In one data center it equals 3 on one node and 1 on another.
consul.raft.state.candidate: previously it was above 0 only during a leader election and then returned to 0. Now it is constantly above 0. Every server node has a value of at least 1, and sometimes it exceeds 200.
consul.autopilot.health: over the last 7 days a leader has been elected 3 times. Not bad, but it was working correctly before.

Is this related to the 1.9.0 upgrade? Is there some configuration I'm missing? Please let me know what information you need from me to help with this.
