Summary: I’m trying to run the demo project from Consul Service Mesh | Nomad by HashiCorp. When I run Consul and Nomad in dev mode, everything works as expected, but when I run manually configured Consul and Nomad clusters I can’t get the connection between the dashboard and the api to work through the Envoy sidecar proxy.
My setup is 4 VirtualBox VMs started by Vagrant: 3 in server mode and one in client mode. The cluster configuration for Consul and Nomad is minimal.
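The job I deploy is the unmodified countdash example from the tutorial; I start it like this (the file name is just what I saved it as):
# deploying the demo job
nomad job run countdash.nomad
nomad job status countdash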
Nomad configuration
# nomad.json
{
  "acl": {
    "enabled": false
  },
  "advertise": {
    "http": "192.168.150.18",
    "rpc": "192.168.150.18",
    "serf": "192.168.150.18"
  },
  "bind_addr": "0.0.0.0",
  "client": {
    "cni_path": "/opt/cni/bin",
    "enabled": true,
    "network_interface": "eth1",
    "servers": [
      "192.168.150.10",
      "192.168.150.13",
      "192.168.150.17"
    ]
  },
  "data_dir": "/var/lib/nomad/data",
  "log_level": "INFO",
  "server": {
    "enabled": false
  },
  "telemetry": {}
}
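I assume the place to verify the CNI setup on the client is the node’s fingerprinted attributes (the node ID below is a placeholder, and I’m assuming plugins.cni.version.* is the right attribute family to look for):
# list client nodes, then inspect the fingerprinted attributes
nomad node status
nomad node status -verbose <node-id> | grep cni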
Consul configuration
# consul.json
{
  "acl": {
    "default_policy": "deny",
    "enabled": false
  },
  "addresses": {
    "grpc": "127.0.0.1 192.168.150.18",
    "http": "127.0.0.1 192.168.150.18"
  },
  "advertise_addr": "192.168.150.18",
  "bind_addr": "0.0.0.0",
  "bootstrap_expect": 3,
  "connect": {
    "ca_provider": "consul",
    "enabled": true
  },
  "data_dir": "/var/lib/consul/data",
  "log_level": "info",
  "ports": {
    "grpc": 8502,
    "http": 8500
  },
  "retry_join": [
    "192.168.150.18",
    "192.168.150.17",
    "192.168.150.13"
  ],
  "server": true,
  "start_join": [
    "192.168.150.18",
    "192.168.150.17",
    "192.168.150.13"
  ],
  "telemetry": {
    "disable_hostname": true,
    "prometheus_retention_time": "12h"
  },
  "ui_config": {
    "enabled": true
  }
}
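On the client VM I assume the things to confirm are the agent’s listeners and the Connect CA (these are the standard Consul ports and HTTP API endpoints):
# HTTP and gRPC listeners on the client VM
ss -lntp | grep -E ':(8500|8502)'
# Connect CA roots should be initialized
curl -s http://127.0.0.1:8500/v1/connect/ca/roots
# services registered on this agent, including the sidecar proxies
curl -s http://127.0.0.1:8500/v1/agent/services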
Network
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:39:46:0b brd ff:ff:ff:ff:ff:ff
    altname enp0s3
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic eth0
       valid_lft 79962sec preferred_lft 79962sec
    inet6 fe80::a00:27ff:fe39:460b/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:51:d3:4b brd ff:ff:ff:ff:ff:ff
    altname enp0s8
    inet 192.168.150.18/24 brd 192.168.150.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe51:d34b/64 scope link
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:3e:43:dd:80 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
5: nomad: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether a2:b2:df:2a:75:e9 brd ff:ff:ff:ff:ff:ff
    inet 172.26.64.1/20 brd 172.26.79.255 scope global nomad
       valid_lft forever preferred_lft forever
    inet6 fe80::a0b2:dfff:fe2a:75e9/64 scope link
       valid_lft forever preferred_lft forever
8: veth23de8e16@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master nomad state UP group default
    link/ether fe:03:75:e5:03:c5 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::a01a:64ff:fea2:ca58/64 scope link
       valid_lft forever preferred_lft forever
9: veth09f0d0f6@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master nomad state UP group default
    link/ether fa:1c:8a:d5:39:5e brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::3424:f0ff:fe6f:c456/64 scope link
       valid_lft forever preferred_lft forever
I can provide more information if required.
All checks in the Consul Services section for countdash and api are green and there are no errors in the Envoy logs, but the dashboard shows the error “Counting Service is Unreachable”. I have some experience with Consul, but I’m new to Nomad, Envoy, and CNI, so I’m having trouble debugging this issue. Since the demo works in dev mode, it feels like I’m missing something small but crucial. I think it’s related to the network configuration, but I have no idea whether the problem is on the Consul side or the Nomad side.
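I assume the right places to look are the upstream listener inside the dashboard allocation and the Envoy sidecar task logs, something like the following (the alloc IDs are placeholders, 8080 is the upstream local_bind_port from the demo job, and I’m not sure the dashboard image even ships wget):
# from inside the dashboard task the upstream should answer on the local bind port
nomad alloc exec -task dashboard <dashboard-alloc-id> wget -qO- http://127.0.0.1:8080
# Envoy sidecar logs on both sides
nomad alloc logs <dashboard-alloc-id> connect-proxy-count-dashboard
nomad alloc logs <api-alloc-id> connect-proxy-count-api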
Any suggestions on how to debug this issue would be greatly appreciated.