I have a setup with one Nomad server and one client. They are connected over a VPC on 10.10.10.0/24. A Consul server and client also run on the same hosts.
The Nomad client is failing to connect to the server because it tries to connect over the 10.18.0.8 network for some reason.
It is only the Nomad servers that connect and share state via Raft, and the log output you attached indicates the server has found configuration for another server to connect to. Is it possible that the Nomad server data directory contains stale data from previous configurations, and that the server is using it to find other servers to talk to? If so, I would suggest removing the entire Nomad server data directory and starting the agent again.
Nomad client is failing to connect to server
The log output you have included only comes from the Nomad server. Do you have logs from the client that indicate a failure to connect to the server? If you’re able to share the configuration for both the server and the client, that would also help identify any potential problems.
@jrasell thanks for the reply.
Here’s a sample of the client’s log:
==> Nomad agent configuration:
Advertise Addrs: HTTP: 10.10.10.3:4646
Bind Addrs: HTTP: [10.10.10.3:4646]
Client: true
Log Level: INFO
Region: global (DC: dc1)
Server: false
Version: 1.4.3
==> Nomad agent started! Log data will stream in below:
2022-12-15T07:10:57.040Z [WARN] agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/opt/nomad/plugins
2022-12-15T07:10:57.042Z [INFO] agent: detected plugin: name=exec type=driver plugin_version=0.1.0
2022-12-15T07:10:57.042Z [INFO] agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
2022-12-15T07:10:57.042Z [INFO] agent: detected plugin: name=java type=driver plugin_version=0.1.0
2022-12-15T07:10:57.042Z [INFO] agent: detected plugin: name=docker type=driver plugin_version=0.1.0
2022-12-15T07:10:57.042Z [INFO] agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
2022-12-15T07:10:57.043Z [INFO] client: using state directory: state_dir=/opt/nomad/client
2022-12-15T07:10:57.043Z [INFO] client: using alloc directory: alloc_dir=/opt/nomad/alloc
2022-12-15T07:10:57.043Z [INFO] client: using dynamic ports: min=20000 max=32000 reserved=""
2022-12-15T07:10:57.053Z [INFO] client.fingerprint_mgr.cgroup: cgroups are available
2022-12-15T07:10:57.057Z [INFO] client.fingerprint_mgr.consul: consul agent is available
2022-12-15T07:10:57.061Z [WARN] client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=eth0
2022-12-15T07:10:57.062Z [WARN] client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=lo
2022-12-15T07:10:57.066Z [WARN] client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=eth0
2022-12-15T07:10:57.072Z [WARN] client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=eth1
2022-12-15T07:10:57.114Z [WARN] client.fingerprint_mgr.env_digitalocean: failed to read attribute: attribute=private-ipv6 error="error reading attribute interfaces/private/0/ipv6/address. digitalocean metadata api returned an error: resp_code: 404, resp_body: not found"
2022-12-15T07:10:57.124Z [WARN] client.fingerprint_mgr.env_digitalocean: failed to read attribute: attribute=public-ipv6 error="error reading attribute interfaces/public/0/ipv6/address. digitalocean metadata api returned an error: resp_code: 404, resp_body: not found"
2022-12-15T07:10:57.202Z [INFO] client.plugin: starting plugin manager: plugin-type=csi
2022-12-15T07:10:57.202Z [INFO] client.plugin: starting plugin manager: plugin-type=driver
2022-12-15T07:10:57.202Z [INFO] client.plugin: starting plugin manager: plugin-type=device
2022-12-15T07:10:57.204Z [INFO] client: started client: node_id=f38ca24e-f289-f8f0-2001-8baf4a5cb27f
2022-12-15T07:10:57.204Z [WARN] client.server_mgr: no servers available
2022-12-15T07:10:57.204Z [WARN] client.server_mgr: no servers available
2022-12-15T07:10:57.212Z [INFO] client.consul: discovered following servers: servers=[10.18.0.8:4647]
2022-12-15T07:11:00.261Z [ERROR] client.rpc: error performing RPC to server: error="rpc error: failed to get conn: rpc error: lead thread didn't get connection" rpc=Node.GetClientAllocs server=10.18.0.8:4647
2022-12-15T07:11:00.261Z [ERROR] client.rpc: error performing RPC to server which is not safe to automatically retry: error="rpc error: failed to get conn: rpc error: lead thread didn't get connection" rpc=Node.GetClientAllocs server=10.18.0.8:4647
I just cannot understand how it keeps advertising 10.18.0.8 when it’s not even the default and was never configured for Nomad. It feels like it picks a network interface on its own, but I don’t know.
Here are my routes:
# ip r
default via 206.189.0.1 dev eth0 proto static
10.10.10.0/24 dev eth1 proto kernel scope link src 10.10.10.2
10.18.0.0/16 dev eth0 proto kernel scope link src 10.18.0.8
206.189.0.0/20 dev eth0 proto kernel scope link src 206.189.8.128
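For what it’s worth, the client log above shows Consul handing out 10.18.0.8:4647 as the server’s RPC address, so the server itself is probably advertising that address. As far as I know, when the bind address is 0.0.0.0 Nomad advertises the IP of the default private interface, which per the routes would be eth0 (10.18.0.8) and not the VPC interface. Below is a minimal sketch of pinning the server’s advertise addresses to the VPC. It assumes the routes above are from the server host and that 10.10.10.2 is its VPC address; the `server` block values are placeholders, not taken from the actual configuration.

```hcl
# Sketch of a Nomad server agent config fragment. Values are assumptions
# based on the route table above, not the poster's actual configuration.

# Listen on all interfaces (assumed; this is also Nomad's default).
bind_addr = "0.0.0.0"

# Advertise the VPC address (eth1) instead of letting Nomad pick one.
# A go-sockaddr template such as "{{ GetInterfaceIP \"eth1\" }}" should
# also work here if you prefer not to hard-code the IP.
advertise {
  http = "10.10.10.2"
  rpc  = "10.10.10.2"
  serf = "10.10.10.2"
}

server {
  enabled          = true
  bootstrap_expect = 1 # assumed: single-server setup as described
}
```

After restarting the server, the Nomad service entry in Consul should carry the 10.10.10.2 address, and the client should discover it there. If you would rather bypass Consul discovery, setting `servers = ["10.10.10.2:4647"]` in the client’s `client` block is another option.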