Dear Nomad community,
I am learning Nomad. I have 8 nodes on the same network subnet, and I would like to configure 3 independent Nomad clusters on that network, with each cluster made up of 3 nodes running as servers.
I tried this configuration:
Cluster 1:

# cat /etc/nomad.d/server.hcl
server {
  enabled          = true
  bootstrap_expect = 3

  server_join {
    retry_join = ["nid001308", "nid001309", "nid001310"]
  }
}
and

# cat /etc/nomad.d/nomad.hcl
log_level  = "DEBUG"
datacenter = "test1"
data_dir   = "/opt/nomad"

tls {
  http = true
  rpc  = true

  ca_file   = "/root/nomad-agent-ca.pem"
  cert_file = "/root/global-server-nomad.pem"
  key_file  = "/root/global-server-nomad-key.pem"

  verify_server_hostname = true
  verify_https_client    = true
}

acl {
  enabled = true
}
Cluster 2:

# cat /etc/nomad.d/server.hcl
server {
  enabled          = true
  bootstrap_expect = 3

  server_join {
    retry_join = ["nid002588", "nid002590", "nid002832"]
  }
}

and

# cat /etc/nomad.d/nomad.hcl
log_level  = "DEBUG"
datacenter = "test1"
data_dir   = "/opt/nomad"

tls {
  http = true
  rpc  = true

  ca_file   = "/root/nomad-agent-ca.pem"
  cert_file = "/root/global-server-nomad.pem"
  key_file  = "/root/global-server-nomad-key.pem"

  verify_server_hostname = true
  verify_https_client    = true
}

acl {
  enabled = true
}
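For completeness: apart from `retry_join`, the two clusters share the same `nomad.hcl`, so neither sets a `region`, and as far as I understand both therefore end up in the default region "global" (the logs below confirm `Region: global`). If the clusters are supposed to be fully isolated, I assume something like the following per-cluster settings would be needed (the names here are hypothetical, not what I currently run):

```hcl
# Hypothetical per-cluster isolation for cluster 2.
# With region unset, Nomad defaults to "global", so today both
# clusters share the same region name.
region     = "cluster2"
datacenter = "test2"
```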
I am reusing the same TLS certificates across both clusters, and I am pointing each agent at its cluster's server nodes via `retry_join`.
This does not work as expected: the first cluster comes up fine, but the second one never elects a leader.
Below are the logs from one of the Nomad servers in the second (affected) cluster:
Jul 26 11:46:37 nid002588 nomad[35853]: ==> Nomad agent configuration:
Jul 26 11:46:37 nid002588 nomad[35853]: Advertise Addrs: HTTP: 172.17.0.1:4646; RPC: 172.17.0.1:4647; Serf: 172.17.0.1:4648
Jul 26 11:46:37 nid002588 nomad[35853]: Bind Addrs: HTTP: [0.0.0.0:4646]; RPC: 0.0.0.0:4647; Serf: 0.0.0.0:4648
Jul 26 11:46:37 nid002588 nomad[35853]: Client: false
Jul 26 11:46:37 nid002588 nomad[35853]: Log Level: DEBUG
Jul 26 11:46:37 nid002588 nomad[35853]: Region: global (DC: psitds)
Jul 26 11:46:37 nid002588 nomad[35853]: Server: true
Jul 26 11:46:37 nid002588 nomad[35853]: Version: 1.5.6
Jul 26 11:46:37 nid002588 nomad[35853]: ==> Nomad agent started! Log data will stream in below:
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.529+0200 [INFO] nomad: setting up raft bolt store: no_freelist_sync=false
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.529+0200 [INFO] nomad.raft: initial configuration: index=0 servers=[]
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.529+0200 [INFO] nomad.raft: entering follower state: follower="Node at 172.17.0.1:4647 [Follower]" leader-address= leader-id=
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.529+0200 [INFO] nomad: serf: EventMemberJoin: nid002588.global 172.17.0.1
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.529+0200 [INFO] nomad: starting scheduling worker(s): num_workers=128 schedulers=["service", "batch", "system", "sysbatch", "_core"]
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.529+0200 [DEBUG] nomad: started scheduling worker: id=c262719b-a3de-f607-cff5-9b0fbce5197c index=1 of=128
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.529+0200 [DEBUG] nomad: started scheduling worker: id=5ebc69f3-c238-663b-abd4-acee3e267302 index=2 of=128
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.529+0200 [DEBUG] worker: running: worker_id=c262719b-a3de-f607-cff5-9b0fbce5197c
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.529+0200 [DEBUG] worker: running: worker_id=79cd4b6c-cf06-cce7-7083-eee5d0e17546
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.529+0200 [DEBUG] nomad: started scheduling worker: id=79cd4b6c-cf06-cce7-7083-eee5d0e17546 index=3 of=128
...
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.532+0200 [DEBUG] worker: running: worker_id=e3cd3b1a-b6d0-b6ed-563b-627fd6f19901
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.532+0200 [DEBUG] worker: running: worker_id=9fe4456b-80e0-d61e-96be-f4dc282cc263
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.532+0200 [DEBUG] worker: running: worker_id=c3082011-4742-5173-2634-e27e59b55023
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.532+0200 [INFO] agent.joiner: starting retry join: servers="nid002588 nid002590 nid002832"
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.532+0200 [DEBUG] nomad: lost contact with Nomad quorum, falling back to Consul for server list
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.532+0200 [INFO] nomad: adding server: server="nid002588.global (Addr: 172.17.0.1:4647) (DC: psitds)"
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.532+0200 [DEBUG] nomad.keyring.replicator: starting encryption key replication
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.533+0200 [ERROR] nomad: error looking up Nomad servers in Consul: error="server.nomad: unable to query Consul datacenters: Get \"http://127.0.0.1:8500/v1/catalog/datacenters\": dial tcp 127.0.0.1:8500: connect: connection refused"
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.534+0200 [DEBUG] nomad: memberlist: Initiating push/pull sync with: 148.187.115.35:4648
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.534+0200 [DEBUG] nomad: memberlist: Stream connection from=148.187.115.35:40920
...
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.540+0200 [DEBUG] nomad: memberlist: Initiating push/pull sync with: 148.187.115.12:4648
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.540+0200 [DEBUG] nomad: memberlist: Initiating push/pull sync with: 148.187.115.23:4648
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.541+0200 [DEBUG] nomad: memberlist: Initiating push/pull sync with: 148.187.115.24:4648
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.542+0200 [INFO] nomad: found expected number of peers, attempting to bootstrap cluster...: peers="172.17.0.1:4647,172.17.0.1:4647,172.17.0.1:4647"
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.542+0200 [ERROR] nomad: failed to bootstrap cluster: error="found duplicate address in configuration: 172.17.0.1:4647"
...
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.543+0200 [DEBUG] nomad: memberlist: Initiating push/pull sync with: 148.187.114.213:4648
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.543+0200 [DEBUG] nomad: memberlist: Initiating push/pull sync with: 148.187.114.214:4648
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.544+0200 [DEBUG] nomad: memberlist: Initiating push/pull sync with: 148.187.114.225:4648
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.544+0200 [DEBUG] nomad: memberlist: Initiating push/pull sync with: 148.187.114.226:4648
Jul 26 11:46:37 nid002588 nomad[35853]: 2023-07-26T11:46:37.545+0200 [INFO] agent.joiner: retry join completed: initial_servers=12 agent_mode=server
Jul 26 11:46:38 nid002588 nomad[35853]: 2023-07-26T11:46:38.030+0200 [DEBUG] nomad: serf: messageJoinType: nid002588.global
Jul 26 11:46:38 nid002588 nomad[35853]: 2023-07-26T11:46:38.030+0200 [DEBUG] nomad: serf: messageJoinType: nid002588.global
Jul 26 11:46:38 nid002588 nomad[35853]: 2023-07-26T11:46:38.530+0200 [DEBUG] nomad: serf: messageJoinType: nid002588.global
Jul 26 11:46:38 nid002588 nomad[35853]: 2023-07-26T11:46:38.530+0200 [DEBUG] nomad: serf: messageJoinType: nid002588.global
Jul 26 11:46:38 nid002588 nomad[35853]: 2023-07-26T11:46:38.921+0200 [WARN] nomad.raft: no known peers, aborting election
Jul 26 11:46:42 nid002588 nomad[35853]: 2023-07-26T11:46:42.530+0200 [WARN] nomad: memberlist: Got ping for unexpected node 'nid002832.global' from=10.100.24.11:4648
...
Jul 26 11:46:45 nid002588 nomad[35853]: 2023-07-26T11:46:45.531+0200 [DEBUG] nomad: memberlist: Failed UDP ping: nid002832.global (timeout reached)
Jul 26 11:46:45 nid002588 nomad[35853]: 2023-07-26T11:46:45.531+0200 [WARN] nomad: memberlist: Got ping for unexpected node 'nid002832.global' from=10.100.24.11:4648
Jul 26 11:46:45 nid002588 nomad[35853]: 2023-07-26T11:46:45.531+0200 [DEBUG] nomad: memberlist: Stream connection from=10.100.24.11:56900
Jul 26 11:46:45 nid002588 nomad[35853]: 2023-07-26T11:46:45.531+0200 [WARN] nomad: memberlist: Got ping for unexpected node nid002832.global from=10.100.24.11:56900
Jul 26 11:46:45 nid002588 nomad[35853]: 2023-07-26T11:46:45.532+0200 [ERROR] nomad: memberlist: Failed fallback TCP ping: EOF
Jul 26 11:46:46 nid002588 nomad[35853]: 2023-07-26T11:46:46.533+0200 [DEBUG] nomad: lost contact with Nomad quorum, falling back to Consul for server list
Jul 26 11:46:47 nid002588 nomad[35853]: 2023-07-26T11:46:47.531+0200 [INFO] nomad: memberlist: Suspect nid002832.global has failed, no acks received
Jul 26 11:46:47 nid002588 nomad[35853]: 2023-07-26T11:46:47.531+0200 [WARN] nomad: memberlist: Got ping for unexpected node 'nid002590.global' from=10.100.24.11:4648
Any hints on what I am missing would be appreciated.