Hello,
I can’t initialize a cluster. I have tried with 5, 3, 2, and 1 nodes. Consul only works with a single node, using the setting bootstrap_expect = 1.
Config file (same on all nodes):
datacenter = "dc1"
data_dir = "/opt/consul"
client_addr = "0.0.0.0"
log_level = "INFO"
enable_syslog = true
node_name = "%{node_name}%"
server = true
bootstrap_expect = 3
bind_addr = "0.0.0.0"
advertise_addr = "10.xxx.xxx.xxx"
start_join = [
"10.xxx.xxx.xxx",
"10.xxx.xxx.xxx",
"10.xxx.xxx.xxx"
]
retry_join = [
"10.xxx.xxx.xxx:8301",
"10.xxx.xxx.xxx:8301",
"10.xxx.xxx.xxx:8301"
]
rejoin_after_leave = true
Logs:
node0:
Jun 29 20:21:17 srv1-prod consul[44883]: agent: Started DNS server: address=0.0.0.0:8600 network=udp
Jun 29 20:21:17 srv1-prod consul[44883]: agent: Started DNS server: address=0.0.0.0:8600 network=tcp
Jun 29 20:21:17 srv1-prod consul[44883]: agent: Starting server: address=[::]:8500 network=tcp protocol=http
Jun 29 20:21:17 srv1-prod consul[44883]: agent: Joining cluster
Jun 29 20:21:17 srv1-prod consul[44883]: agent: (LAN) joining: lan_addresses=[10.0.1.4, 10.0.1.5, 10.0.1.9]
Jun 29 20:21:17 srv1-prod consul[44883]: agent.server.memberlist.lan: memberlist: Initiating push/pull sync with: 10.0.1.4:8301
Jun 29 20:21:17 srv1-prod consul[44883]: agent.server.memberlist.lan: memberlist: Stream connection from=10.0.1.4:48710
Jun 29 20:21:17 srv1-prod consul[44883]: agent.server: Existing Raft peers reported by server, disabling bootstrap mode: server=srv2-prod
Jun 29 20:21:17 srv1-prod consul[44883]: agent.server: Adding LAN server: server="srv2-prod (Addr: tcp/10.0.1.5:8300) (DC: dc1)"
Jun 29 20:21:17 srv1-prod consul[44883]: agent.server.memberlist.lan: memberlist: Initiating push/pull sync with: 10.0.1.5:8301
Jun 29 20:21:17 srv1-prod consul[44883]: 2022-06-29T20:21:17.440+0300 [DEBUG] agent.server.memberlist.lan: memberlist: Initiating push/pull sync with: 10.0.1.9:8>
Jun 29 20:21:17 srv1-prod consul[44883]: agent.server.memberlist.lan: memberlist: Initiating push/pull sync with: 10.0.1.9:8301
Jun 29 20:21:17 srv1-prod consul[44883]: 2022-06-29T20:21:17.444+0300 [INFO] agent: (LAN) joined: number_of_nodes=3
Jun 29 20:21:17 srv1-prod consul[44883]: 2022-06-29T20:21:17.444+0300 [DEBUG] agent: systemd notify failed: error="No socket"
Jun 29 20:21:17 srv1-prod consul[44883]: 2022-06-29T20:21:17.444+0300 [INFO] agent: Join completed. Initial agents synced with: agent_count=3
Jun 29 20:21:17 srv1-prod consul[44883]: 2022-06-29T20:21:17.445+0300 [INFO] agent: started state syncer
Jun 29 20:21:17 srv1-prod consul[44883]: 2022-06-29T20:21:17.445+0300 [INFO] agent: Consul agent running!
Jun 29 20:21:17 srv1-prod consul[44883]: agent: (LAN) joined: number_of_nodes=3
Jun 29 20:21:17 srv1-prod consul[44883]: agent: systemd notify failed: error="No socket"
Jun 29 20:21:17 srv1-prod consul[44883]: agent: Join completed. Initial agents synced with: agent_count=3
Jun 29 20:21:17 srv1-prod consul[44883]: agent: started state syncer
Jun 29 20:21:17 srv1-prod consul[44883]: agent: Consul agent running!
Jun 29 20:21:17 srv1-prod consul[44883]: 2022-06-29T20:21:17.713+0300 [DEBUG] agent.server.serf.lan: serf: messageJoinType: srv1-prod
Jun 29 20:21:17 srv1-prod consul[44883]: agent.server.serf.lan: serf: messageJoinType: srv1-prod
Jun 29 20:21:17 srv1-prod consul[44883]: 2022-06-29T20:21:17.785+0300 [DEBUG] agent.server.serf.lan: serf: messageJoinType: srv1-prod
Jun 29 20:21:17 srv1-prod consul[44883]: agent.server.serf.lan: serf: messageJoinType: srv1-prod
Jun 29 20:21:17 srv1-prod consul[44883]: 2022-06-29T20:21:17.812+0300 [DEBUG] agent.server.memberlist.lan: memberlist: Stream connection from=10.0.1.9:53510
Jun 29 20:21:17 srv1-prod consul[44883]: agent.server.memberlist.lan: memberlist: Stream connection from=10.0.1.9:53510
Jun 29 20:21:17 srv1-prod consul[44883]: agent.server.serf.lan: serf: messageJoinType: srv1-prod
Jun 29 20:21:17 srv1-prod consul[44883]: 2022-06-29T20:21:17.812+0300 [DEBUG] agent.server.serf.lan: serf: messageJoinType: srv1-prod
Jun 29 20:21:17 srv1-prod consul[44883]: 2022-06-29T20:21:17.911+0300 [DEBUG] agent.server.serf.lan: serf: messageJoinType: srv1-prod
Jun 29 20:21:17 srv1-prod consul[44883]: agent.server.serf.lan: serf: messageJoinType: srv1-prod
Jun 29 20:21:23 srv1-prod consul[44883]: 2022-06-29T20:21:23.426+0300 [WARN] agent: Check missed TTL, is now critical: check=redis@10.0.1.4:6379:replication-stat>
Jun 29 20:21:23 srv1-prod consul[44883]: agent: Check missed TTL, is now critical: check=redis@10.0.1.4:6379:replication-status-check
Jun 29 20:21:24 srv1-prod consul[44883]: 2022-06-29T20:21:24.500+0300 [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
Jun 29 20:21:24 srv1-prod consul[44883]: agent.anti_entropy: failed to sync remote state: error="No cluster leader"
Jun 29 20:21:24 srv1-prod consul[44883]: 2022-06-29T20:21:24.765+0300 [DEBUG] agent.server.memberlist.wan: memberlist: Stream connection from=10.0.1.5:47670
Jun 29 20:21:24 srv1-prod consul[44883]: agent.server.memberlist.wan: memberlist: Stream connection from=10.0.1.5:47670
node1:
Jun 29 20:18:13 srv2-prod consul[17328]: agent: (LAN) joined: number_of_nodes=3
Jun 29 20:18:13 srv2-prod consul[17328]: agent: Join completed. Initial agents synced with: agent_count=3
Jun 29 20:18:13 srv2-prod consul[17328]: agent: started state syncer
Jun 29 20:18:13 srv2-prod consul[17328]: agent: Consul agent running!
Jun 29 20:18:19 srv2-prod consul[17328]: 2022-06-29T20:18:19.588+0300 [WARN] agent: Check missed TTL, is now critical: check=redis@10.0.1.5:6379:replication-stat>
Jun 29 20:18:19 srv2-prod consul[17328]: 2022-06-29T20:18:19.588+0300 [WARN] agent: Check missed TTL, is now critical: check="Resec: slave replication status"
Jun 29 20:18:19 srv2-prod consul[17328]: agent: Check missed TTL, is now critical: check=redis@10.0.1.5:6379:replication-status-check
Jun 29 20:18:19 srv2-prod consul[17328]: agent: Check missed TTL, is now critical: check="Resec: slave replication status"
Jun 29 20:18:20 srv2-prod consul[17328]: 2022-06-29T20:18:20.681+0300 [ERROR] agent.rpcclient.health: subscribe call failed: err="rpc error: code = Unknown desc =>
Jun 29 20:18:20 srv2-prod consul[17328]: 2022-06-29T20:18:20.681+0300 [ERROR] agent.http: Request error: method=GET url=/v1/health/service/redis?index=1&passing=1>
Jun 29 20:18:20 srv2-prod consul[17328]: agent.rpcclient.health: subscribe call failed: err="rpc error: code = Unknown desc = No cluster leader" topic=ServiceHeal>
Jun 29 20:18:20 srv2-prod consul[17328]: agent.http: Request error: method=GET url=/v1/health/service/redis?index=1&passing=1&tag=master&wait=30000ms from=172.17.>
Jun 29 20:18:20 srv2-prod consul[17328]: 2022-06-29T20:18:20.721+0300 [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
Jun 29 20:18:20 srv2-prod consul[17328]: agent.anti_entropy: failed to sync remote state: error="No cluster leader"
node2:
Jun 29 17:20:33 clusterNode3 consul[2238]: agent: (LAN) joined: number_of_nodes=3
Jun 29 17:20:33 clusterNode3 consul[2238]: agent: Join completed. Initial agents synced with: agent_count=3
Jun 29 17:20:33 clusterNode3 consul[2238]: agent: started state syncer
Jun 29 17:20:33 clusterNode3 consul[2238]: agent: Consul agent running!
Jun 29 17:20:36 clusterNode3 consul[2238]: 2022-06-29T17:20:36.030Z [WARN] agent: Check socket connection failed: check=_nomad-check-bb4d3ebd68c5282a64fbbe7e52b9b9ca4cc8dd8c error="dial tcp 0.0.0.0:4648: connect: connection refused"
Jun 29 17:20:36 clusterNode3 consul[2238]: 2022-06-29T17:20:36.030Z [WARN] agent: Check is now critical: check=_nomad-check-bb4d3ebd68c5282a64fbbe7e52b9b9ca4cc8dd8c
Jun 29 17:20:36 clusterNode3 consul[2238]: agent: Check socket connection failed: check=_nomad-check-bb4d3ebd68c5282a64fbbe7e52b9b9ca4cc8dd8c error="dial tcp 0.0.0.0:4648: connect: connection refused"
Jun 29 17:20:36 clusterNode3 consul[2238]: agent: Check is now critical: check=_nomad-check-bb4d3ebd68c5282a64fbbe7e52b9b9ca4cc8dd8c
Jun 29 17:20:38 clusterNode3 consul[2238]: 2022-06-29T17:20:38.533Z [WARN] agent: Check socket connection failed: check=_nomad-check-eed2ace5fdfef736b944c575e1d06c3b09aba34e error="dial tcp 0.0.0.0:4647: connect: connection refused"
Jun 29 17:20:38 clusterNode3 consul[2238]: 2022-06-29T17:20:38.533Z [WARN] agent: Check is now critical: check=_nomad-check-eed2ace5fdfef736b944c575e1d06c3b09aba34e
Jun 29 17:20:38 clusterNode3 consul[2238]: agent: Check socket connection failed: check=_nomad-check-eed2ace5fdfef736b944c575e1d06c3b09aba34e error="dial tcp 0.0.0.0:4647: connect: connection refused"
Jun 29 17:20:38 clusterNode3 consul[2238]: agent: Check is now critical: check=_nomad-check-eed2ace5fdfef736b944c575e1d06c3b09aba34e
Jun 29 17:20:40 clusterNode3 consul[2238]: 2022-06-29T17:20:40.505Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
Jun 29 17:20:40 clusterNode3 consul[2238]: agent.anti_entropy: failed to sync remote state: error="No cluster leader"
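To make the leader-related lines easier to pick out of the output above, a small filter along these lines could be used. This is only a sketch: the function name leader_events and the MARKERS tuple are hypothetical, and the marker strings are simply copied from the logs above. Since each message appears twice in the output (once with Consul's own timestamp and level, once without), the sketch strips that prefix and keeps each distinct message once.

```python
import re

# Substrings copied from the log output above; extend as needed.
MARKERS = ("No cluster leader", "disabling bootstrap mode")

# Consul's own timestamp and level, e.g. "2022-06-29T20:21:24.500+0300 [ERROR] ".
INNER_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}T\S+ \[[A-Z]+\]\s+")


def leader_events(lines):
    """Return the distinct messages that contain a marker, in first-seen order."""
    seen = set()
    events = []
    for line in lines:
        if not any(marker in line for marker in MARKERS):
            continue
        # Drop the syslog header ("Jun 29 ... consul[pid]: ").
        message = line.split("]: ", 1)[-1]
        # Drop Consul's inner timestamp/level prefix if present.
        message = INNER_PREFIX.sub("", message).strip()
        if message not in seen:
            seen.add(message)
            events.append(message)
    return events
```

Because messages are de-duplicated by text, an error that repeats at different times is reported only once; that is intentional here, since the goal is a short summary rather than a timeline.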
I have tried:
- Increasing/decreasing the number of nodes
- Enabling/disabling bootstrap
- Creating peers.json on each node
- force-leave
- Joining manually using the CLI
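In case it helps with diagnosis, each server's view of the Raft leader and peer set can be compared through Consul's HTTP status endpoints (/v1/status/leader and /v1/status/peers). The following is a minimal sketch, not a definitive check: the helper names are hypothetical, and it assumes the HTTP API is reachable on port 8500 without TLS or an ACL token.

```python
import json
from urllib.request import urlopen


def http_get_json(url, timeout=5):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def raft_view(host, port=8500, get=http_get_json):
    """Return the leader and Raft peers as reported by one agent.

    Assumes plain HTTP on port 8500 and no ACL token (adjust if needed).
    """
    base = f"http://{host}:{port}/v1/status"
    return {"leader": get(f"{base}/leader"), "peers": get(f"{base}/peers")}


def views_agree(views):
    """True if all nodes report the same non-empty leader and the same peers."""
    leaders = {view["leader"] for view in views}
    peer_sets = {frozenset(view["peers"]) for view in views}
    return len(leaders) == 1 and "" not in leaders and len(peer_sets) == 1
```

For example, calling raft_view on each of the three LAN addresses from the join log and passing the results to views_agree would show whether the servers disagree about the peer set, or whether none of them reports a leader (an empty string from the leader endpoint).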
What else can I try?