Nomad server/client can't connect to each other under Consul Connect

Hello,

I'm quite new to Consul/Nomad. I'm trying to create a cluster of 3 server/client nodes.
I know that running both the server and client agents for Consul and Nomad on the same machines is not recommended, but I'm trying to get that working before going deeper.

I succeeded in making a cluster of 3 Consul nodes, and they appear to see each other.
But if I start a service (Nomad here) on the first server, it appears in the UI of the first Consul node but not in the UIs of the other Consul nodes, and my two Nomad server/clients don't communicate with each other.

To simplify: on the "Services" page I only see the Nomad server/client running on the same host as the Consul UI I'm looking at. The Consul UIs on the other hosts don't see the Nomad server/client running on the first host.

My 3 servers are in a private VPC with CIDR : 10.114.16.0/20.

My Consul config is:

datacenter = "fra1"
domain = "consul"

data_dir = "/opt/consul"

client_addr = "0.0.0.0"

bind_addr = "{{ GetPrivateInterfaces | include \"network\" \"10.114.16.0/20\" | attr \"address\" }}"

ui = true

server = true
bootstrap_expect = 3

encrypt = "..."

retry_join = ["provider=digitalocean region=fra1 tag_name=consul-server api_token=..."]

ports {
    grpc = 8502
}

connect {
    enabled = true
}

My Nomad config is:

datacenter = "fra1"

data_dir = "/opt/nomad"

bind_addr = "0.0.0.0"

server {
    enabled = true
    bootstrap_expect = 3
}

client {
    enabled = true
}

I installed Unbound following this tutorial: https://learn.hashicorp.com/tutorials/consul/dns-forwarding?in=consul/security-networking#unbound-setup
I also followed this guide to connect Nomad to Consul: https://www.nomadproject.io/docs/integrations/consul-connect
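For reference, on the Nomad side the Consul integration is configured through the consul stanza. A minimal sketch (every value shown here is Nomad's documented default, so the whole stanza can even be omitted when the local Consul agent listens on 127.0.0.1:8500):

```hcl
consul {
  # Address of the local Consul agent (Nomad's default).
  address = "127.0.0.1:8500"

  # Register this agent's Nomad services in Consul and let
  # Nomad servers/clients discover each other through it.
  auto_advertise   = true
  server_auto_join = true
  client_auto_join = true
}
```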

I am running CentOS 8 with firewalld installed; the opened ports are:

Consul:
    {8300,8301,8302,8400,8500,8501,8502,8600}/tcp
    {8301,8302,8600}/udp
Nomad:
    {4646,4647,4648}/tcp
    4648/udp
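For anyone reproducing this, these are roughly the firewalld commands I used (assuming the default zone; shell brace expansion turns each line into one `--add-port` flag per port):

```shell
# Consul: server RPC, Serf LAN/WAN, HTTP/HTTPS, gRPC, DNS.
firewall-cmd --permanent --add-port={8300,8301,8302,8400,8500,8501,8502,8600}/tcp
firewall-cmd --permanent --add-port={8301,8302,8600}/udp

# Nomad: HTTP, RPC, Serf.
firewall-cmd --permanent --add-port={4646,4647,4648}/tcp
firewall-cmd --permanent --add-port=4648/udp

# Apply the permanent rules.
firewall-cmd --reload
```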

Does anyone have an idea? :slight_smile:

Thanks !

I found one issue: I secured the gossip communication (https://learn.hashicorp.com/tutorials/consul/gossip-encryption-secure?in=consul/security-networking) but without setting these values:

encrypt_verify_incoming = true
encrypt_verify_outgoing = true

When I set those values and restarted the Nomad client/server and the Consul servers, I could see all the Nomad clients/servers in a single Consul UI.


But for now, the Nomad servers/clients don't see each other: the client list only contains the client running on the same host, and all the other servers have their status set to "failed".

I followed this to connect the Nomad servers/clients to each other: https://learn.hashicorp.com/tutorials/nomad/clustering?in=nomad/manage-clusters
It seems the only method that succeeds is the manual one, but I would like to use the Consul option. :slight_smile:

EDIT 1: I completely removed firewalld, without success.

EDIT 2: While trying to start Nomad with bootstrap_expect = 3, I see this:

2020-10-27T09:03:55.160Z [INFO]  nomad.raft: entering candidate state: node="Node at 10.19.0.5:4647 [Candidate]" term=141
2020-10-27T09:03:55.881Z [ERROR] nomad.raft: failed to make requestVote RPC: target="{Voter 10.19.0.6:4647 10.19.0.6:4647}" error="dial tcp 10.19.0.6:4647: connect: no route to host"
2020-10-27T09:03:57.052Z [WARN]  nomad.raft: Election timeout reached, restarting election

EDIT 3: I don't understand the IP 10.19.0.5, since it should be 10.114.16.x.
If it helps: in Consul, the Nomad Server HTTP check fails (maybe because Nomad doesn't start due to the election error?)

EDIT 4: If I add retry_join to the Nomad server config, the same way I successfully did for Consul, the same error appears: failed to make request.
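What I tried for EDIT 4 looked roughly like this, using Nomad's server_join stanza (the peer addresses below are examples from my 10.114.16.0/20 range, not the real ones):

```hcl
server {
  enabled          = true
  bootstrap_expect = 3

  server_join {
    # Example peers from the VPC range; Nomad's Serf port is 4648.
    retry_join     = ["10.114.16.2:4648", "10.114.16.3:4648", "10.114.16.4:4648"]
    retry_max      = 3
    retry_interval = "15s"
  }
}
```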

EDIT 5: When I run host -t A consul.service.consul I get: Host consul.service.consul not found: 3(NXDOMAIN)… That's not normal, right?

EDIT 6: I got the DNS forwarding working. Now host consul.service.consul returns the 3 nodes. But Nomad is still failing.
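For context, the relevant part of the Unbound setup from the tutorial linked earlier is a stub zone that forwards the consul domain to the local Consul DNS port. Mine ended up looking roughly like this (sketched from memory, not verified character for character):

```
# /etc/unbound/conf.d/consul.conf
server:
  do-not-query-localhost: no
  domain-insecure: "consul"
stub-zone:
  name: "consul"
  stub-addr: 127.0.0.1@8600
```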

EDIT 7: I deleted the /opt/nomad folder, and now I get this error while the two Nomad servers try to find each other through Consul:

2020-10-27T12:29:06.698Z [ERROR] nomad: error looking up Nomad servers in Consul: error="contacted 0 Nomad Servers: 1 error occurred:
    * Failed to join 10.19.0.6: dial tcp 10.19.0.6:4648: connect: no route to host"

Roughly the same error appears on the other host, but with 10.19.0.5.

If I run ip addr I get:

[root@server-01 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether ee:d6:fc:6d:44:69 brd ff:ff:ff:ff:ff:ff
    inet 46.101.195.164/18 brd 46.101.255.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet 10.19.0.5/16 brd 10.19.255.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::ecd6:fcff:fe6d:4469/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 62:d4:15:35:64:ac brd ff:ff:ff:ff:ff:ff
    inet 10.114.16.2/20 brd 10.114.31.255 scope global noprefixroute eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::60d4:15ff:fe35:64ac/64 scope link 
       valid_lft forever preferred_lft forever

So finally I see that 10.114.16.2 and 10.19.0.5 live on the same machine, but 10.19.0.5 (on eth0) is not the address I want to use here. How can I tell the Nomad server to use the eth1 address (10.114.16.2) instead?

I think I succeeded in making the Nomad servers/clients talk to each other. I changed bind_addr to "{{ GetPrivateInterfaces | include \"network\" \"10.114.16.0/20\" | attr \"address\" }}". Fixing the DNS forwarding, the gossip encryption between the Consul servers, and the bind address seems to have been enough.
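The relevant part of the Nomad config after the fix, as a sketch. Nomad accepts the same go-sockaddr templates as Consul; the advertise block should be optional when bind_addr already resolves to a single routable address, but I show it for completeness:

```hcl
# Pin Nomad to the VPC interface (eth1) instead of the first private
# interface found (eth0, 10.19.0.x on DigitalOcean).
bind_addr = "{{ GetPrivateInterfaces | include \"network\" \"10.114.16.0/20\" | attr \"address\" }}"

advertise {
  http = "{{ GetPrivateInterfaces | include \"network\" \"10.114.16.0/20\" | attr \"address\" }}"
  rpc  = "{{ GetPrivateInterfaces | include \"network\" \"10.114.16.0/20\" | attr \"address\" }}"
  serf = "{{ GetPrivateInterfaces | include \"network\" \"10.114.16.0/20\" | attr \"address\" }}"
}
```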

But… I can no longer access the Nomad UI now… :frowning:
For Consul, setting client_addr = "0.0.0.0" was enough, but I didn't find an equivalent option for Nomad.
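Note for later readers: I believe Nomad does have a related option, the addresses stanza, which can put the HTTP listener on a different address than RPC/Serf. An untested sketch (and be aware this exposes the UI on every interface):

```hcl
addresses {
  # Serve the HTTP API/UI everywhere; RPC and Serf keep using bind_addr.
  http = "0.0.0.0"
}

advertise {
  # With http bound to 0.0.0.0, a concrete advertise address is needed.
  http = "{{ GetPrivateInterfaces | include \"network\" \"10.114.16.0/20\" | attr \"address\" }}"
}
```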

With a simple reverse proxy I managed to make the Nomad UI available: https://learn.hashicorp.com/tutorials/nomad/reverse-proxy-ui?in=nomad/manage-clusters.
I also did the same for Consul, to expose its UI through the reverse proxy. Maybe I will set up Boundary to restrict access to Consul and Nomad… :slight_smile:

This issue is closed.