Hi, I recently wanted to set up Nomad clustering across two separate servers, one running a Nomad server agent and the other running a Nomad client agent.
So I started following the example from the clustering section of the Nomad getting started guide.
However, I’ve found that it doesn’t seem to actually work. I’m able to start the server with server.hcl and it runs fine, and the client started with client1.hcl comes up fine and reaches the ready state. But then it hits a heartbeat error and degrades to the down state. I see these errors in the client log:
2020-11-12T03:38:59.074Z [ERROR] client.rpc: error performing RPC to server: error="rpc error: failed to get conn: dial tcp 10.10.0.5:4647: connect: connection refused" rpc=Node.Register server=10.10.0.5:4647
2020-11-12T03:38:59.074Z [ERROR] client: error registering: error="rpc error: failed to get conn: dial tcp 10.10.0.5:4647: connect: connection refused"
2020-11-12T03:39:03.249Z [ERROR] client.rpc: error performing RPC to server: error="rpc error: failed to get conn: dial tcp 10.10.0.5:4647: connect: connection refused" rpc=Node.UpdateStatus server=10.10.0.5:4647
2020-11-12T03:39:03.249Z [ERROR] client: error heartbeating. retrying: error="failed to update status: rpc error: failed to get conn: dial tcp 10.10.0.5:4647: connect: connection refused" period=1.238717384s
2020-11-12T03:39:03.250Z [ERROR] client: error discovering nomad servers: error="client.consul: unable to query Consul datacenters: Get "http://127.0.0.1:8500/v1/catalog/datacenters": dial tcp 127.0.0.1:8500: connect: connection refused"
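Since every error above is a "connection refused" on a TCP dial, one thing that can be checked from the client machine is whether those ports accept connections at all. Below is a minimal sketch in Python, not anything from the Nomad tooling; the helper name port_is_open is made up for illustration, and the addresses to test are simply the ones that appear in the logs.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    Hypothetical helper for illustration only. A refused connection,
    a timeout, or an unreachable host all count as "not open".
    """
    try:
        # create_connection performs the TCP handshake; the context
        # manager closes the socket again as soon as it succeeds.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError and socket.timeout are OSError subclasses.
        return False
```

For example, calling port_is_open("10.10.0.5", 4647) on the client machine tests the server RPC address from the logs, and port_is_open("127.0.0.1", 8500) tests the local Consul address. A False result only says the TCP connection could not be made; it does not say whether the cause is a firewall, the agent's bind address, or the agent not running.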
I’m not sure what is causing the heartbeats to fail, but I’ve retried all the steps several times and it happens consistently. If anyone else has run into this and figured it out, I would really appreciate hearing how you got around it.