I’m running into a problem that seems to occur independently with both Consul (1.13.1 tested so far) and Nomad (1.3.5 tested so far).
I have a 3-server cluster where, if either of the non-leader nodes suffers a temporary network outage, the cluster is fine both during the outage and afterward, provided the outage is fairly short (e.g. a few seconds). However, if the outage lasts slightly longer (e.g. > 30 seconds), then as soon as connectivity resumes the cluster immediately goes into a cycle where the leader loses its leadership, a new leader is elected (usually the one that was leader before), then loses leadership again, and so on. This continues for an arbitrary amount of time: I haven’t measured it in depth with Consul, but it has lasted at least several minutes, and with Nomad on one occasion the cycle continued for over an hour before it stabilized.
This is a bare-metal cluster with 3 server nodes (each in a different DC in the same city), each acting as both a Consul and a Nomad server. Consul is used only for KV (referenced in Nomad task templates), for service discovery likewise within Nomad templates (i.e. consul-template), and for health-checking of Nomad tasks. No DNS or Connect. Consul is run via systemd, and Nomad is configured with knowledge of its own server IPs so that it’s not reliant on Consul to discover itself. (Both Nomad and Consul were provisioned via the community Ansible roles.)
All nodes communicate via a Nebula overlay network (also separately managed via systemd), so the config is fairly simple, with no TLS either.
Basically, it looks like I can induce this behaviour simply by downing the Nebula service on either one of the non-leader servers, waiting 30 seconds or so, and then bringing it back up again (sketched below).
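For reference, inducing it on the affected non-leader looks roughly like this (a sketch only; I’m assuming the overlay runs as a systemd unit named nebula.service, and the exact sleep doesn’t matter as long as it’s comfortably over ~30 seconds):

# On one of the non-leader servers (e.g. solaria), simulate a longer outage
# by stopping the Nebula overlay and then bringing it back up.
sudo systemctl stop nebula     # all 192.168.100.x traffic to this node drops
sleep 40                       # anything much beyond ~30 seconds reproduces it
sudo systemctl start nebula    # connectivity returns and the leader flapping starts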
The log below is an extract from the leader (landra, 192.168.100.50), starting from the point where connectivity to the non-leader (solaria, 192.168.100.30) is restored.
2022-10-12T20:03:57.068-0400 [WARN] agent.server.raft: rejecting vote request since node is not a voter: from=192.168.100.30:8300
2022-10-12T20:03:57.582-0400 [INFO] agent.server.serf.lan: serf: EventMemberJoin: solaria 192.168.100.30
2022-10-12T20:03:57.583-0400 [INFO] agent.server: Adding LAN server: server="solaria (Addr: tcp/192.168.100.30:8300) (DC: mtl)"
2022-10-12T20:03:57.583-0400 [INFO] agent.server.raft: updating configuration: command=AddNonvoter server-id=c822f3cc-d9fa-6f5d-3053-b9f592c31c81 server-addr=192.168.100.30:8300 servers="[{Suffrage:Voter ID:b77a3f6a-0636-7d24-9e4b-c80860619b20 Address:192.168.100.70:8300} {Suffrage:Voter ID:9270add6-1a06-578b-27b5-69c05bf1b2c9 Address:192.168.100.50:8300} {Suffrage:Nonvoter ID:c822f3cc-d9fa-6f5d-3053-b9f592c31c81 Address:192.168.100.30:8300}]"
2022-10-12T20:03:57.584-0400 [INFO] agent.server.raft: added peer, starting replication: peer=c822f3cc-d9fa-6f5d-3053-b9f592c31c81
2022-10-12T20:03:57.587-0400 [ERROR] agent.server.raft: peer has newer term, stopping replication: peer="{Nonvoter c822f3cc-d9fa-6f5d-3053-b9f592c31c81 192.168.100.30:8300}"
2022-10-12T20:03:57.587-0400 [INFO] agent.server.raft: entering follower state: follower="Node at 192.168.100.50:8300 [Follower]" leader-address= leader-id=
2022-10-12T20:03:57.587-0400 [ERROR] agent.server.autopilot: failed to add raft non-voting peer: id=c822f3cc-d9fa-6f5d-3053-b9f592c31c81 address=192.168.100.30:8300 error="leadership lost while committing log"
2022-10-12T20:03:57.587-0400 [ERROR] agent.server: failed to reconcile member: member="{solaria 192.168.100.30 8301 map[acls:0 build:1.13.1:c6d0f9ec dc:mtl expect:3 ft_fs:1 ft_si:1 id:c822f3cc-d9fa-6f5d-3053-b9f592c31c81 port:8300 raft_vsn:3 role:consul segment: vsn:2 vsn_max:3 vsn_min:2 wan_join_port:8302] alive 1 5 2 2 5 4}" partition=default error="leadership lost while committing log"
2022-10-12T20:03:57.587-0400 [INFO] agent.leader: stopped routine: routine="federation state pruning"
2022-10-12T20:03:57.587-0400 [INFO] agent.server.raft: aborting pipeline replication: peer="{Voter b77a3f6a-0636-7d24-9e4b-c80860619b20 192.168.100.70:8300}"
2022-10-12T20:03:57.587-0400 [INFO] agent.server.autopilot: reconciliation now disabled
2022-10-12T20:03:57.587-0400 [INFO] agent.server: cluster leadership lost
2022-10-12T20:03:57.782-0400 [WARN] agent.server.raft: rejecting vote request since node is not a voter: from=192.168.100.30:8300
2022-10-12T20:03:58.844-0400 [WARN] agent.server.raft: heartbeat timeout reached, starting election: last-leader-addr= last-leader-id=
2022-10-12T20:03:58.844-0400 [INFO] agent.server.raft: entering candidate state: node="Node at 192.168.100.50:8300 [Candidate]" term=4305
2022-10-12T20:03:58.852-0400 [INFO] agent.server.raft: election won: tally=2
2022-10-12T20:03:58.852-0400 [INFO] agent.server.raft: entering leader state: leader="Node at 192.168.100.50:8300 [Leader]"
2022-10-12T20:03:58.852-0400 [INFO] agent.server.raft: added peer, starting replication: peer=b77a3f6a-0636-7d24-9e4b-c80860619b20
2022-10-12T20:03:58.852-0400 [INFO] agent.server.raft: added peer, starting replication: peer=c822f3cc-d9fa-6f5d-3053-b9f592c31c81
2022-10-12T20:03:58.853-0400 [INFO] agent.server: cluster leadership acquired
2022-10-12T20:03:58.853-0400 [INFO] agent.server: New leader elected: payload=landra
2022-10-12T20:03:58.854-0400 [ERROR] agent.server.raft: peer has newer term, stopping replication: peer="{Nonvoter c822f3cc-d9fa-6f5d-3053-b9f592c31c81 192.168.100.30:8300}"
2022-10-12T20:03:58.854-0400 [INFO] agent.server.raft: entering follower state: follower="Node at 192.168.100.50:8300 [Follower]" leader-address= leader-id=
2022-10-12T20:03:58.854-0400 [ERROR] agent.server: failed to wait for barrier: error="leadership lost while committing log"
2022-10-12T20:03:58.854-0400 [INFO] agent.server: cluster leadership lost
2022-10-12T20:03:58.914-0400 [WARN] agent.server.raft: rejecting vote request since node is not a voter: from=192.168.100.30:8300
2022-10-12T20:03:59.079-0400 [WARN] agent.server.coordinate: Batch update failed: error="node is not the leader"
2022-10-12T20:04:00.269-0400 [WARN] agent.server.raft: rejecting vote request since our last term is greater: candidate=192.168.100.70:8300 last-term=4305 last-candidate-term=4304
2022-10-12T20:04:00.484-0400 [WARN] agent.server.raft: heartbeat timeout reached, starting election: last-leader-addr= last-leader-id=
2022-10-12T20:04:00.484-0400 [INFO] agent.server.raft: entering candidate state: node="Node at 192.168.100.50:8300 [Candidate]" term=4307
2022-10-12T20:04:00.491-0400 [INFO] agent.server.raft: election won: tally=2
2022-10-12T20:04:00.491-0400 [INFO] agent.server.raft: entering leader state: leader="Node at 192.168.100.50:8300 [Leader]"
2022-10-12T20:04:00.491-0400 [INFO] agent.server.raft: added peer, starting replication: peer=b77a3f6a-0636-7d24-9e4b-c80860619b20
2022-10-12T20:04:00.491-0400 [INFO] agent.server.raft: added peer, starting replication: peer=c822f3cc-d9fa-6f5d-3053-b9f592c31c81
2022-10-12T20:04:00.491-0400 [INFO] agent.server: New leader elected: payload=landra
2022-10-12T20:04:00.491-0400 [INFO] agent.server: cluster leadership acquired
2022-10-12T20:04:00.493-0400 [ERROR] agent.server.raft: peer has newer term, stopping replication: peer="{Nonvoter c822f3cc-d9fa-6f5d-3053-b9f592c31c81 192.168.100.30:8300}"
2022-10-12T20:04:00.493-0400 [INFO] agent.server.raft: entering follower state: follower="Node at 192.168.100.50:8300 [Follower]" leader-address= leader-id=
2022-10-12T20:04:00.493-0400 [ERROR] agent.server: failed to wait for barrier: error="leadership lost while committing log"
2022-10-12T20:04:00.493-0400 [INFO] agent.server: cluster leadership lost
2022-10-12T20:04:00.493-0400 [WARN] agent.server.raft: appendEntries rejected, sending older logs: peer="{Voter b77a3f6a-0636-7d24-9e4b-c80860619b20 192.168.100.70:8300}" next=7499660
2022-10-12T20:04:00.661-0400 [WARN] agent.server.raft: rejecting vote request since node is not a voter: from=192.168.100.30:8300
2022-10-12T20:04:01.588-0400 [WARN] agent.server.raft: rejecting vote request since our last term is greater: candidate=192.168.100.70:8300 last-term=4307 last-candidate-term=4304
2022-10-12T20:04:01.726-0400 [WARN] agent.server.raft: rejecting vote request since node is not a voter: from=192.168.100.30:8300
2022-10-12T20:04:02.190-0400 [WARN] agent.server.raft: heartbeat timeout reached, starting election: last-leader-addr= last-leader-id=
2022-10-12T20:04:02.190-0400 [INFO] agent.server.raft: entering candidate state: node="Node at 192.168.100.50:8300 [Candidate]" term=4309
2022-10-12T20:04:02.199-0400 [INFO] agent.server.raft: election won: tally=2
2022-10-12T20:04:02.199-0400 [INFO] agent.server.raft: entering leader state: leader="Node at 192.168.100.50:8300 [Leader]"
2022-10-12T20:04:02.199-0400 [INFO] agent.server.raft: added peer, starting replication: peer=b77a3f6a-0636-7d24-9e4b-c80860619b20
2022-10-12T20:04:02.199-0400 [INFO] agent.server.raft: added peer, starting replication: peer=c822f3cc-d9fa-6f5d-3053-b9f592c31c81
2022-10-12T20:04:02.199-0400 [INFO] agent.server: cluster leadership acquired
2022-10-12T20:04:02.199-0400 [INFO] agent.server: New leader elected: payload=landra
2022-10-12T20:04:02.201-0400 [ERROR] agent.server.raft: peer has newer term, stopping replication: peer="{Nonvoter c822f3cc-d9fa-6f5d-3053-b9f592c31c81 192.168.100.30:8300}"
2022-10-12T20:04:02.201-0400 [INFO] agent.server.raft: entering follower state: follower="Node at 192.168.100.50:8300 [Follower]" leader-address= leader-id=
2022-10-12T20:04:02.201-0400 [ERROR] agent.server: failed to wait for barrier: error="leadership lost while committing log"
2022-10-12T20:04:02.201-0400 [INFO] agent.server: cluster leadership lost
Example config:
{
  "addresses": {
    "dns": "192.168.100.50",
    "grpc": "192.168.100.50",
    "http": "192.168.100.50",
    "https": "192.168.100.50"
  },
  "advertise_addr": "192.168.100.50",
  "advertise_addr_wan": "192.168.100.50",
  "bind_addr": "192.168.100.50",
  "bootstrap": false,
  "bootstrap_expect": 3,
  "client_addr": "192.168.100.50",
  "data_dir": "/opt/consul",
  "datacenter": "mtl",
  "disable_update_check": false,
  "domain": "consul",
  "enable_local_script_checks": false,
  "enable_script_checks": false,
  "encrypt": "...",
  "encrypt_verify_incoming": true,
  "encrypt_verify_outgoing": true,
  "log_file": "/var/log/consul/consul.log",
  "log_level": "INFO",
  "log_rotate_bytes": 0,
  "log_rotate_duration": "24h",
  "log_rotate_max_files": 0,
  "performance": {
    "leave_drain_time": "5s",
    "raft_multiplier": 1,
    "rpc_hold_timeout": "7s"
  },
  "ports": {
    "dns": 8600,
    "grpc": -1,
    "http": 8500,
    "https": -1,
    "serf_lan": 8301,
    "serf_wan": 8302,
    "server": 8300
  },
  "raft_protocol": 3,
  "retry_interval": "30s",
  "retry_interval_wan": "30s",
  "retry_join": [
    "192.168.100.30",
    "192.168.100.50",
    "192.168.100.70"
  ],
  "retry_max": 0,
  "retry_max_wan": 0,
  "server": true,
  "translate_wan_addrs": false,
  "ui_config": {
    "enabled": true
  }
}
I’ve not tested a bigger raft_multiplier with Consul, but I have with Nomad, and the same leadership instability cycle still results; it just plays out in slower motion.
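For clarity, by a bigger raft_multiplier I mean a change along these lines on the Consul side (untested there so far; the config path and unit name below are assumptions for illustration):

# Bump performance.raft_multiplier from 1 to e.g. 5 in the server config shown
# above, then restart Consul on each server one at a time.
CONF=/etc/consul.d/consul.json                      # assumed path
jq '.performance.raft_multiplier = 5' "$CONF" | sudo tee "$CONF.tmp" >/dev/null
sudo mv "$CONF.tmp" "$CONF"
sudo systemctl restart consul                       # assumed unit name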
What am I missing? Any insight gratefully appreciated. Logs etc. from the Nomad side can be found in my post in the Nomad category from about two weeks ago; I’ve made no headway in figuring this out since then.