Recently I updated Consul from 1.7.2 to 1.9.0 and it started to behave differently. According to the documentation about telemetry and leadership changes (Telemetry | Consul by HashiCorp), in a healthy cluster I should see the following values for these metrics:
- consul.raft.leader.lastContact: lower than 200 ms
- consul.raft.state.candidate: should be 0 for each node?
- consul.raft.state.leader: should be 1 only for the leader?
Am I correct in the above assumptions?
Consul now behaves differently. These are the issues I have; please let me know whether this is correct behaviour and what should be done to fix it:
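To make the assumptions above concrete, here is a minimal Python sketch that checks one metrics snapshot. It assumes the JSON shape returned by Consul's `GET /v1/agent/metrics` endpoint (top-level `Samples` and `Counters` lists whose entries carry `Name`, `Count` and `Max`); the helper names `find_metric` and `check_snapshot` are hypothetical. It only flags lastContact against the 200 ms threshold quoted from the documentation and reports the raw values of the two raft.state metrics, because whether those should be 0 or 1 is exactly what I'm asking.

```python
from typing import Any, Dict, List, Optional

# Threshold quoted in the post (consul.raft.leader.lastContact < 200 ms).
LAST_CONTACT_LIMIT_MS = 200.0


def find_metric(entries: List[Dict[str, Any]], name: str) -> Optional[Dict[str, Any]]:
    """Return the first entry whose "Name" equals `name`, or None (hypothetical helper)."""
    for entry in entries:
        if entry.get("Name") == name:
            return entry
    return None


def check_snapshot(snapshot: Dict[str, Any]) -> List[str]:
    """Return human-readable findings for one metrics snapshot.

    Assumes the /v1/agent/metrics JSON shape: "Samples" and "Counters"
    lists, each entry carrying a "Name" plus aggregate fields.
    """
    findings: List[str] = []

    # Timer metric: values are aggregated per interval, in milliseconds.
    last_contact = find_metric(snapshot.get("Samples", []), "consul.raft.leader.lastContact")
    if last_contact is None:
        findings.append("lastContact: no samples in this interval")
    elif last_contact["Max"] >= LAST_CONTACT_LIMIT_MS:
        findings.append(f"lastContact: max {last_contact['Max']} ms >= {LAST_CONTACT_LIMIT_MS} ms")
    else:
        findings.append(f"lastContact: max {last_contact['Max']} ms, under limit")

    # Report the raw counts only; no pass/fail judgement is made here.
    for name in ("consul.raft.state.candidate", "consul.raft.state.leader"):
        entry = find_metric(snapshot.get("Counters", []), name)
        count = entry["Count"] if entry else 0
        findings.append(f"{name}: count {count} in this interval")

    return findings
```

A snapshot can be fetched from a server agent (for example with `curl http://localhost:8500/v1/agent/metrics`, assuming the default HTTP port), parsed with `json.loads`, and passed to `check_snapshot`.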
- consul.raft.state.leader: I have 2 data centers, one with 3 server nodes and the second with 5 server nodes. Grafana shows 1 for this metric on at least 2 server nodes in each data center. In one data center it equals 3 on one node and 1 on another.
- consul.raft.state.candidate: this used to be higher than 0 only during a leadership election and then came back to 0. Right now it’s constantly higher than 0. Each server node has a value of at least 1, and sometimes it’s higher than 200.
- consul.autopilot.health: over the last 7 days a leader has been elected 3 times. Not bad, but it was working correctly before.
Is this related to the 1.9.0 upgrade? Maybe there is some configuration I’m missing? Please let me know what information you need from me to help with this.