Nomad servers can't form a quorum

Hi, I am trying to deploy 3 Nomad servers on VMs. For some reason the servers can't form a quorum, and I can't figure out why:

Jan 20 01:10:47 santis-nomad-srv01 systemd[1]: Started nomad.service.
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]: ==> Loaded configuration from /etc/nomad.d/nomad.hcl, /etc/nomad.d/server.hcl
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]: ==> Starting Nomad agent...
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]: ==> Nomad agent configuration:
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]:        Advertise Addrs: HTTP: santis-nomad-srv01:4646; RPC: santis-nomad-srv01:4647; Serf: santis-nomad-srv01:4648
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]:             Bind Addrs: HTTP: [0.0.0.0:4646]; RPC: 0.0.0.0:4647; Serf: 0.0.0.0:4648
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]:                 Client: false
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]:              Log Level: DEBUG
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]:                Node Id: 01f0e542-0707-f9db-6e9d-176351a0dff9
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]:                 Region: global (DC: santis)
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]:                 Server: true
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]:                Version: 1.6.1
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]: ==> Nomad agent started! Log data will stream in below:
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]:     2024-01-20T01:10:47.536+0100 [INFO]  nomad: setting up raft bolt store: no_freelist_sync=false
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]:     2024-01-20T01:10:47.537+0100 [INFO]  nomad.raft: initial configuration: index=0 servers=[]
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]:     2024-01-20T01:10:47.537+0100 [INFO]  nomad.raft: entering follower state: follower="Node at 148.187.3.56:4647 [Follower]" leader-address= leader-id=
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]:     2024-01-20T01:10:47.537+0100 [WARN]  nomad: memberlist: Binding to public address without encryption!
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]:     2024-01-20T01:10:47.537+0100 [INFO]  nomad: serf: EventMemberJoin: santis-nomad-srv01.global 148.187.3.56
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]:     2024-01-20T01:10:47.537+0100 [INFO]  nomad: starting scheduling worker(s): num_workers=2 schedulers=["service", "batch", "system", "sysbatch", "_core"]
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]:     2024-01-20T01:10:47.537+0100 [DEBUG] nomad: started scheduling worker: id=4b1cb24d-8870-4fa6-ba81-3f38b04f1041 index=1 of=2
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]:     2024-01-20T01:10:47.537+0100 [DEBUG] nomad: started scheduling worker: id=5d9873a9-4bfa-080f-12cd-8176032fb58e index=2 of=2
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]:     2024-01-20T01:10:47.537+0100 [INFO]  nomad: started scheduling worker(s): num_workers=2 schedulers=["service", "batch", "system", "sysbatch", "_core"]
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]:     2024-01-20T01:10:47.537+0100 [DEBUG] worker: running: worker_id=4b1cb24d-8870-4fa6-ba81-3f38b04f1041
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]:     2024-01-20T01:10:47.537+0100 [WARN]  agent: not registering Nomad HTTPS Health Check because verify_https_client enabled
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]:     2024-01-20T01:10:47.537+0100 [DEBUG] worker: running: worker_id=5d9873a9-4bfa-080f-12cd-8176032fb58e
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]:     2024-01-20T01:10:47.537+0100 [DEBUG] nomad: lost contact with Nomad quorum, falling back to Consul for server list
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]:     2024-01-20T01:10:47.537+0100 [INFO]  nomad: adding server: server="santis-nomad-srv01.global (Addr: 148.187.3.56:4647) (DC: santis)"
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]:     2024-01-20T01:10:47.537+0100 [ERROR] nomad: error looking up Nomad servers in Consul: error="server.nomad: unable to query Consul datacenters: Get \"http://127.0.0.1:8500/v1/catalog/datacenters\": dial tcp 127.0.0.1:8500: connect:>
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]:     2024-01-20T01:10:47.537+0100 [DEBUG] http: UI is enabled
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]:     2024-01-20T01:10:47.537+0100 [DEBUG] nomad.keyring.replicator: starting encryption key replication
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]:     2024-01-20T01:10:47.539+0100 [INFO]  agent.joiner: starting retry join: servers="santis-nomad-srv01 santis-nomad-srv02 santis-nomad-srv03"
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]:     2024-01-20T01:10:47.539+0100 [DEBUG] nomad: memberlist: Initiating push/pull sync with:  148.187.3.56:4648
Jan 20 01:10:47 santis-nomad-srv01 nomad[58961]:     2024-01-20T01:10:47.539+0100 [DEBUG] nomad: memberlist: Stream connection from=148.187.3.56:52164
Jan 20 01:10:49 santis-nomad-srv01 nomad[58961]:     2024-01-20T01:10:49.432+0100 [WARN]  nomad.raft: no known peers, aborting election
Jan 20 01:10:57 santis-nomad-srv01 nomad[58961]:     2024-01-20T01:10:57.647+0100 [ERROR] worker: failed to dequeue evaluation: worker_id=5d9873a9-4bfa-080f-12cd-8176032fb58e error="No cluster leader"
Jan 20 01:10:57 santis-nomad-srv01 nomad[58961]:     2024-01-20T01:10:57.879+0100 [ERROR] worker: failed to dequeue evaluation: worker_id=4b1cb24d-8870-4fa6-ba81-3f38b04f1041 error="No cluster leader"
Jan 20 01:11:02 santis-nomad-srv01 nomad[58961]:     2024-01-20T01:11:02.797+0100 [ERROR] worker: failed to dequeue evaluation: worker_id=5d9873a9-4bfa-080f-12cd-8176032fb58e error="No cluster leader"
Jan 20 01:11:03 santis-nomad-srv01 nomad[58961]:     2024-01-20T01:11:03.069+0100 [ERROR] worker: failed to dequeue evaluation: worker_id=4b1cb24d-8870-4fa6-ba81-3f38b04f1041 error="No cluster leader"
Jan 20 01:11:08 santis-nomad-srv01 nomad[58961]:     2024-01-20T01:11:08.206+0100 [ERROR] worker: failed to dequeue evaluation: worker_id=5d9873a9-4bfa-080f-12cd-8176032fb58e error="No cluster leader"
Jan 20 01:11:08 santis-nomad-srv01 nomad[58961]:     2024-01-20T01:11:08.596+0100 [ERROR] worker: failed to dequeue evaluation: worker_id=4b1cb24d-8870-4fa6-ba81-3f38b04f1041 error="No cluster leader"
Jan 20 01:11:11 santis-nomad-srv01 nomad[58961]: ==> Newer Nomad version available: 1.7.3 (currently running: 1.6.1)
Jan 20 01:11:14 santis-nomad-srv01 nomad[58961]:     2024-01-20T01:11:14.220+0100 [ERROR] http: request failed: method=PUT path=/v1/acl/bootstrap error="No cluster leader" code=500

Any idea why?

Without the Nomad configuration it's hard to tell, but it looks like the servers are trying to look up the server list in Consul, which isn't responding. Is Consul running on the cluster? Or is ACL perhaps enabled on the Consul cluster, so the Nomad servers can't authenticate?
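A few quick checks from one of the server nodes might narrow it down (this is only a sketch; the `<...>` cert paths are placeholders, and with `verify_https_client` enabled the Nomad CLI needs client certs):

```shell
# The log shows "dial tcp 127.0.0.1:8500: connect ..." - check whether any
# Consul agent is listening on the local HTTP port at all:
curl -s http://127.0.0.1:8500/v1/status/leader || echo "no Consul agent on 127.0.0.1:8500"

# What does the Serf gossip pool look like from this server? If only one
# member shows up, retry_join never reached the other two nodes.
nomad server members \
  -address=https://127.0.0.1:4646 \
  -ca-cert=<ca.pem> -client-cert=<cert.pem> -client-key=<key.pem>

# Can this server reach the other servers' Serf port (4648 by default)?
nc -vz santis-nomad-srv02 4648
nc -vz santis-nomad-srv03 4648
```

If the members list shows all three servers but there is still no leader, `nomad operator raft list-peers` (with the same TLS flags) would be the next thing to look at.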

Hi. No, Consul is not installed on the machines.
Nomad config files:

# cat /etc/nomad.d/nomad.hcl
log_level = "DEBUG"
region = "global"
datacenter = "santis"

#consul {
#  address             = "127.0.0.1:8500"
#  server_service_name = "nomad"
#  client_service_name = "nomad-client"
#  auto_advertise      = true
#  server_auto_join    = true
#  client_auto_join    = true
#  ssl = true
#  ca_file   = "/root/consul-agent-ca.pem"
#  cert_file = "/root/santis-server-consul-0.pem"
#  key_file  = "/root/santis-server-consul-0-key.pem"
#  verify_ssl = true
#}

#tls {
#  http = true
#  rpc  = true
#
#  ca_file   = "/root/nomad-agent-ca.pem"
#  cert_file = "/root/global-server-nomad.pem"
#  key_file  = "/root/global-server-nomad-key.pem"
#
#  verify_server_hostname = true
#  verify_https_client    = true
#}

acl {
  enabled = true
}

and

# cat /etc/nomad.d/server.hcl
# use permanent storage
data_dir = "/scratch/shared/nomad_server/santis-nomad-srv01"

server {
  enabled = true
  bootstrap_expect = "3"
  server_join {
    retry_join = ["santis-nomad-srv01", "santis-nomad-srv02", "santis-nomad-srv03"]
  }
}

advertise {
  http = "santis-nomad-srv01:4646"
  rpc  = "santis-nomad-srv01:4647"
  serf = "santis-nomad-srv01:4648"
}

tls {
  http = true
  rpc  = true

  ca_file   = "/scratch/shared/nomad_server/agent-certs/server.ca.crt"
  cert_file = "/scratch/shared/nomad_server/agent-certs/server.global.crt"
  key_file  = "/scratch/shared/nomad_server/agent-certs/server.global.key"

  verify_server_hostname = true
  verify_https_client    = true
}
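A couple of things may be worth double-checking in the config above. The `retry_join` hostnames are contacted over Serf (port 4648 by default), so all three servers must be resolvable and reachable on that port from each other; and the "Binding to public address without encryption!" warning suggests no gossip `encrypt` key is set. A hedged sketch of the server stanza with the Serf port made explicit (hostnames as in the original, everything else unchanged):

```hcl
server {
  enabled          = true
  bootstrap_expect = 3
  server_join {
    # Pinning the Serf port makes the join target explicit; retry_join
    # defaults to 4648 when no port is given.
    retry_join = [
      "santis-nomad-srv01:4648",
      "santis-nomad-srv02:4648",
      "santis-nomad-srv03:4648",
    ]
  }
}
```

Also note that with `verify_server_hostname = true`, each server's certificate must be valid for the expected `server.<region>.nomad` name, or the server-to-server RPC handshake will be rejected and the peers will never form a quorum.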