Nomad client: error discovering Nomad servers via the Consul agent, ACL issue?


I have two small physical machines for learning and development. The first runs a Nomad server, a Nomad client, a Consul server, and a Vault server. The second runs a Nomad client and a Consul agent. Consul is v1.9.3 and Nomad is v1.0.4.

After getting everything basically working (Nomad jobs running and showing up in Consul), I recently tried enabling ACLs. I suspect something is missing from my configuration or ACL policies.

On the second machine, I followed the HashiCorp Learn guidance of using the local Consul agent to join the Nomad client to the cluster, and that worked fine. However, since enabling ACLs and creating a token and policy for the second machine, its Nomad client stays down according to the Nomad server UI, and the Nomad client logs show:

[ERROR] http: request failed: method=GET path=/v1/agent/health?type=client error="{"client":{"ok":false,"message":"no known servers"}}" code=500
[WARN]  client.server_mgr: no servers available
[ERROR] client: error discovering nomad servers: error="no Nomad Servers advertising service "nomad" in Consul datacenters: ["mydatacenter"]"

On the same (second) machine, however, I can see the Nomad server service listed:

$ consul catalog services -tags
nomad             http,rpc,serf
nomad-client      http
redis-cache       cache,global
vault             active,initialized
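That listing only reflects what the token used by the `consul` CLI is allowed to see; the Nomad client queries the catalog with whatever token it uses itself. Consul's catalog service endpoint requires `node:read` and `service:read`, so a token for the Nomad client needs at least those, plus `service:write` to register the `nomad-client` service. A sketch of such a policy, loosely following the Consul ACL guidance in the Nomad documentation; the policy actually attached to the second machine's token is not shown here, so this is only for comparison:

```hcl
# Sketch of a Consul ACL policy for a Nomad client agent (assumed rules,
# not necessarily the policy used on the second machine).

# Allow reading information from the local Consul agent.
agent_prefix "" {
  policy = "read"
}

# Catalog service lookups are filtered by node read access.
node_prefix "" {
  policy = "read"
}

# "write" also grants read, which server discovery needs in order to see
# the "nomad" service; write is needed to register nomad-client services.
service_prefix "" {
  policy = "write"
}
```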

My Nomad client configuration on the second machine looks like this:

$ cat /etc/nomad-client.hcl 
datacenter = "mydatacenter"
data_dir = "/var/lib/nomad-client"

client {
  enabled = true
  # host volumes omitted for brevity
}

ports {
  http = 4746
  rpc  = 4747
  serf = 4748
}
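The configuration shown has no `consul` block. According to the Nomad documentation for that block, its `token` parameter supplies a per-request Consul ACL token, and if no token is set there or on the Consul agent, requests fall back to Consul's anonymous policy. A sketch of what such a block could look like, assuming Consul's HTTP API is on its default local address; the token value is a hypothetical placeholder for the SecretID of the second machine's token:

```hcl
# Hypothetical addition to /etc/nomad-client.hcl.
consul {
  # Local Consul agent HTTP address (Consul's default).
  address = "127.0.0.1:8500"

  # Placeholder: SecretID of a Consul token whose policy allows
  # reading nodes/services and writing services.
  token = "<nomad-client-token-secret-id>"

  # Nomad defaults, shown explicitly: look up servers under the "nomad"
  # service and auto-join through Consul.
  server_service_name = "nomad"
  client_auto_join    = true
}
```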

The Consul UI on the first machine lists a Nomad client service for the second machine. That service even shows a healthy serf status, but its http health check is failing with:

HTTP GET 500 Internal Server Error Output: {"client":{"ok":false,"message":"no known servers"}}

There are a lot of moving parts in getting this combination of servers and clients bootstrapped with ACLs. What might I be missing?