Nomad is unable to connect to Consul

I’ve got Nomad and Consul running in a mini cluster of one server (server) and two clients (client[1-2]). I recently upgraded Nomad to v1.9.6 and Consul to v1.20.2. Since the upgrade, Nomad has been unable to connect to Consul, and deployments fail with the following error:

Constraint ${attr.consul.version} semver >= 1.8.0 filtered 2 nodes

Nomad logs from one of the clients follow:

Feb 13 11:27:04 client1 systemd[1]: Started nomad.service - Nomad Startup process.
Feb 13 11:27:05 client1 bash[730]: ==> WARNING: mTLS is not configured - Nomad is not secure without mTLS!
Feb 13 11:27:05 client1 bash[730]: ==> Loaded configuration from /etc/nomad.d/nomad.hcl
Feb 13 11:27:05 client1 bash[730]: ==> Starting Nomad agent...
Feb 13 11:27:11 client1 bash[730]: ==> Nomad agent configuration:
Feb 13 11:27:11 client1 bash[730]:        Advertise Addrs: HTTP: 172.20.20.21:4646
Feb 13 11:27:11 client1 bash[730]:             Bind Addrs: HTTP: [172.20.20.21:4646]
Feb 13 11:27:11 client1 bash[730]:                 Client: true
Feb 13 11:27:11 client1 bash[730]:              Log Level: INFO
Feb 13 11:27:11 client1 bash[730]:                 Region: global (DC: dc-local)
Feb 13 11:27:11 client1 bash[730]:                 Server: false
Feb 13 11:27:11 client1 bash[730]:                Version: 1.9.6
Feb 13 11:27:11 client1 bash[730]: ==> Nomad agent started! Log data will stream in below:
Feb 13 11:27:11 client1 bash[730]:     2025-02-13T11:27:05.136+0530 [WARN]  agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/var/nomad/plugins
Feb 13 11:27:11 client1 bash[730]:     2025-02-13T11:27:05.151+0530 [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
Feb 13 11:27:11 client1 bash[730]:     2025-02-13T11:27:05.151+0530 [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
Feb 13 11:27:11 client1 bash[730]:     2025-02-13T11:27:05.151+0530 [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
Feb 13 11:27:11 client1 bash[730]:     2025-02-13T11:27:05.151+0530 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
Feb 13 11:27:11 client1 bash[730]:     2025-02-13T11:27:05.151+0530 [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
Feb 13 11:27:11 client1 bash[730]:     2025-02-13T11:27:05.152+0530 [INFO]  client: using state directory: state_dir=/var/nomad/client
Feb 13 11:27:11 client1 bash[730]:     2025-02-13T11:27:05.154+0530 [INFO]  client: using alloc directory: alloc_dir=/var/nomad/alloc
Feb 13 11:27:11 client1 bash[730]:     2025-02-13T11:27:05.154+0530 [INFO]  client: using dynamic ports: min=20000 max=32000 reserved=""
Feb 13 11:27:11 client1 bash[730]:     2025-02-13T11:27:05.347+0530 [WARN]  client.fingerprint_mgr: failed to detect bridge kernel module, bridge network mode disabled:
Feb 13 11:27:11 client1 bash[730]:   error=
Feb 13 11:27:11 client1 bash[730]:   | 4 errors occurred:
Feb 13 11:27:11 client1 bash[730]:   | \t* failed to find /sys/module/bridge: stat /sys/module/bridge: no such file or directory
Feb 13 11:27:11 client1 bash[730]:   | \t* module bridge not in /proc/modules
Feb 13 11:27:11 client1 bash[730]:   | \t* module bridge not in /lib/modules/6.8.0-31-generic/modules.builtin
Feb 13 11:27:11 client1 bash[730]:   | \t* module bridge not in /lib/modules/6.8.0-31-generic/modules.dep
Feb 13 11:27:11 client1 bash[730]:   |
Feb 13 11:27:11 client1 bash[730]:   
Feb 13 11:27:11 client1 bash[730]:     2025-02-13T11:27:05.348+0530 [WARN]  client.fingerprint_mgr.consul: failed to acquire consul self endpoint: cluster=default error="Get \"http://127.0.0.1:8500/v1/agent/self\": dial tcp 127.0.0.1:8500: connect: connection refused"
Feb 13 11:27:11 client1 bash[730]:     2025-02-13T11:27:05.366+0530 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=lo
Feb 13 11:27:11 client1 bash[730]:     2025-02-13T11:27:05.375+0530 [WARN]  client.fingerprint_mgr.cni_plugins: failed to read CNI plugins directory: cni_path=/opt/cni/bin error="open /opt/cni/bin: no such file or directory"
Feb 13 11:27:11 client1 bash[730]:     2025-02-13T11:27:11.517+0530 [INFO]  client.proclib.cg2: initializing nomad cgroups: cores=0-1
Feb 13 11:27:11 client1 bash[730]:     2025-02-13T11:27:11.518+0530 [INFO]  client.plugin: starting plugin manager: plugin-type=csi
Feb 13 11:27:11 client1 bash[730]:     2025-02-13T11:27:11.528+0530 [INFO]  client.plugin: starting plugin manager: plugin-type=driver
Feb 13 11:27:11 client1 bash[730]:     2025-02-13T11:27:11.528+0530 [INFO]  client.plugin: starting plugin manager: plugin-type=device
Feb 13 11:27:11 client1 bash[730]:     2025-02-13T11:27:11.565+0530 [INFO]  client: started client: node_id=d0f705da-b1d2-02c1-ea4e-25db5f6ba5a2
Feb 13 11:27:11 client1 bash[730]:     2025-02-13T11:27:11.582+0530 [INFO]  client: node registration complete
Feb 13 11:27:17 client1 bash[730]:     2025-02-13T11:27:17.053+0530 [INFO]  client: node registration complete
Feb 13 15:07:33 client1 bash[730]:     2025-02-13T15:07:33.772+0530 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=49e652a0-ca4b-6534-cba1-7f8ff80a844c task=fabio type=Received msg="Task received by client" failed=false
Feb 13 15:07:33 client1 bash[730]:     2025-02-13T15:07:33.777+0530 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=49e652a0-ca4b-6534-cba1-7f8ff80a844c task=fabio type="Task Setup" msg="Building Task Directory" failed=false
Feb 13 15:07:33 client1 bash[730]:     2025-02-13T15:07:33.846+0530 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=49e652a0-ca4b-6534-cba1-7f8ff80a844c task=fabio type=Driver msg="Downloading image" failed=false
Feb 13 15:07:43 client1 bash[730]:     2025-02-13T15:07:43.951+0530 [INFO]  client.driver_mgr.docker: created container: driver=docker container_id=eb8ed14ecb2e688dd2730faf27634df7cc1f3635c494046fcc37994d610ed026
Feb 13 15:07:44 client1 bash[730]:     2025-02-13T15:07:44.113+0530 [INFO]  client.driver_mgr.docker: started container: driver=docker container_id=eb8ed14ecb2e688dd2730faf27634df7cc1f3635c494046fcc37994d610ed026
Feb 13 15:07:44 client1 bash[730]:     2025-02-13T15:07:44.174+0530 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=49e652a0-ca4b-6534-cba1-7f8ff80a844c task=fabio type=Started msg="Task started by client" failed=false

Other files of interest from the same machine (client1) follow.

nomad.hcl:

# Ref: https://learn.hashicorp.com/tutorials/nomad/clustering
# NOTE: `ansible_eth1` because `eth1` is the name of the interface created via Vagrant
bind_addr  = "172.20.20.21"
data_dir   = "/var/nomad"
datacenter = "dc-local"

client {
  enabled           = true
  # // @formatter:off
  servers           = ["172.20.20.10"]
  # // @formatter:on
  # Ref: https://discuss.hashicorp.com/t/internal-routing-problem/34201/5
  # Ref: https://www.nomadproject.io/docs/configuration/client#network_interface
  network_interface = "eth1"
}

consul.hcl:

# NOTE: `ansible_eth1` because `eth1` is the name of the interface created via Vagrant
bind_addr            = "172.20.20.21"
data_dir             = "/var/consul"
datacenter           = "dc-local"
enable_script_checks = true
enable_syslog        = true
leave_on_terminate   = true
log_level            = "DEBUG"
node_name            = "client1"
# // @formatter:off
retry_join           = ["172.20.20.10", "172.20.20.21", "172.20.20.22"]
# // @formatter:on

# // @formatter:off
# // @formatter:on

Any ideas on how to troubleshoot this?

Perhaps the problem is with the bridge kernel module on the node (client1): the log shows that Nomad failed to detect it.

The bridge module is a requirement of the latest Nomad version.
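
Separately, the client log contains `failed to acquire consul self endpoint ... dial tcp 127.0.0.1:8500: connect: connection refused`, so Nomad is looking for Consul's HTTP API on the loopback address and nothing is answering there. As a rough sketch only (not a confirmed fix), the address Nomad uses can be made explicit with a `consul` block in nomad.hcl. The value below is an assumption: it is Nomad's default, and it only helps if the local Consul agent is actually running and serving HTTP on that address and port.

```hcl
# nomad.hcl (sketch): state explicitly where Nomad should reach the local Consul agent.
# "127.0.0.1:8500" is Nomad's default and is shown here as an assumption;
# adjust it if the Consul agent serves its HTTP API on another address or port.
consul {
  address = "127.0.0.1:8500"
}
```

If the Consul agent serves HTTP elsewhere (for example through `client_addr` in consul.hcl), the `address` here would need to match it. If Consul simply was not up yet when Nomad started, restarting Nomad once Consul is running may be enough.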
