No connection through Consul Connect (Envoy)

Summary: I’m trying to run the demo project from Consul Service Mesh | Nomad by HashiCorp. When I run Consul and Nomad in dev mode, everything works as expected, but when I run manually configured Consul and Nomad clusters, I can’t get the connection between dashboard and api to work through the Envoy sidecar proxy.

My setup is 4 VirtualBox VMs started by Vagrant, 3 in server mode and one in client mode. Cluster configuration for Consul and Nomad is minimal.

Nomad configuration
# nomad.json
{
  "acl": {
    "enabled": false
  },
  "advertise": {
    "http": "192.168.150.18",
    "rpc": "192.168.150.18",
    "serf": "192.168.150.18"
  },
  "bind_addr": "0.0.0.0",
  "client": {
    "cni_path": "/opt/cni/bin",
    "enabled": true,
    "network_interface": "eth1",
    "servers": [
      "192.168.150.10",
      "192.168.150.13",
      "192.168.150.17"
    ]
  },
  "data_dir": "/var/lib/nomad/data",
  "log_level": "INFO",
  "server": {
    "enabled": false
  },
  "telemetry": {}
}

Consul configuration
# consul.json
{
  "acl": {
    "default_policy": "deny",
    "enabled": false
  },
  "addresses": {
    "grpc": "127.0.0.1 192.168.150.18",
    "http": "127.0.0.1 192.168.150.18"
  },
  "advertise_addr": "192.168.150.18",
  "bind_addr": "0.0.0.0",
  "bootstrap_expect": 3,
  "connect": {
    "ca_provider": "consul",
    "enabled": true
  },
  "data_dir": "/var/lib/consul/data",
  "log_level": "info",
  "ports": {
    "grpc": 8502,
    "http": 8500
  },
  "retry_join": [
    "192.168.150.18",
    "192.168.150.17",
    "192.168.150.13"
  ],
  "server": true,
  "start_join": [
    "192.168.150.18",
    "192.168.150.17",
    "192.168.150.13"
  ],
  "telemetry": {
    "disable_hostname": true,
    "prometheus_retention_time": "12h"
  },
  "ui_config": {
    "enabled": true
  }
}

Network
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:39:46:0b brd ff:ff:ff:ff:ff:ff
    altname enp0s3
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic eth0
       valid_lft 79962sec preferred_lft 79962sec
    inet6 fe80::a00:27ff:fe39:460b/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:51:d3:4b brd ff:ff:ff:ff:ff:ff
    altname enp0s8
    inet 192.168.150.18/24 brd 192.168.150.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe51:d34b/64 scope link
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:3e:43:dd:80 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
5: nomad: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether a2:b2:df:2a:75:e9 brd ff:ff:ff:ff:ff:ff
    inet 172.26.64.1/20 brd 172.26.79.255 scope global nomad
       valid_lft forever preferred_lft forever
    inet6 fe80::a0b2:dfff:fe2a:75e9/64 scope link
       valid_lft forever preferred_lft forever
8: veth23de8e16@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master nomad state UP group default
    link/ether fe:03:75:e5:03:c5 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::a01a:64ff:fea2:ca58/64 scope link
       valid_lft forever preferred_lft forever
9: veth09f0d0f6@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master nomad state UP group default
    link/ether fa:1c:8a:d5:39:5e brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::3424:f0ff:fe6f:c456/64 scope link
       valid_lft forever preferred_lft forever

I can provide more information if required.

All checks in the Consul Services section for countdash and api are green and there are no errors in the Envoy logs, but the dashboard shows the error Counting Service is Unreachable. I have some experience with Consul, but I’m new to Nomad, Envoy and CNI, so I’m having trouble debugging this issue. Since the demo works in dev mode, it feels like I’m missing something small but crucial. I think it’s related to network configuration, but I have no idea whether the problem is on the Consul side or the Nomad side.

Any suggestions on how to debug this issue would be greatly appreciated.

Unsure if related but we run client nodes with centos7.

When we added rockylinux8 (rhel8 derivatives) client nodes, we discovered connection issues and found that, because of nftables, more configuration is needed so containers can talk across VMs. See: linux - No network connectivity to/from Docker CE container on CentOS 8 - Server Fault

@resmo thanks for the reply, but I don’t think iptables / nftables is the issue in my case. I’m running Debian, so there is no firewalld, and the firewall rules are mostly permissive overall. Also, I’m currently running this demo on a single VM specifically to rule out cross-VM connection issues.

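I think I found the root of the issue. With Envoy debug logging enabled, the sidecar’s rbac filter reports the connection from count-dashboard as denied: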

[2022-04-20 13:02:10.000][15][debug][conn_handler] [source/server/active_tcp_listener.cc:140] [C391] new connection from 172.26.64.1:44526
[2022-04-20 13:02:10.000][15][debug][connection] [source/common/network/connection_impl.cc:672] [C392] connected
[2022-04-20 13:02:10.000][15][debug][pool] [source/common/conn_pool/conn_pool_base.cc:293] [C392] attaching to next stream
[2022-04-20 13:02:10.000][15][debug][pool] [source/common/conn_pool/conn_pool_base.cc:176] [C392] creating stream
[2022-04-20 13:02:10.000][15][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:602] [C391] TCP:onUpstreamEvent(), requestedServerName: 
[2022-04-20 13:02:10.002][15][debug][rbac] [source/extensions/filters/network/rbac/rbac_filter.cc:48] checking connection: requestedServerName: , sourceIP: 172.26.64.1:44526, directRemoteIP: 172.26.64.1:44526,remoteIP: 172.26.64.1:44526, localAddress: 172.26.64.27:23188, ssl: uriSanPeerCertificate: spiffe://72157773-f1d2-2957-fd27-44aa979e87a1.consul/ns/default/dc/dc1/svc/count-dashboard, dnsSanPeerCertificate: , subjectPeerCertificate: , dynamicMetadata: 
[2022-04-20 13:02:10.002][15][debug][rbac] [source/extensions/filters/network/rbac/rbac_filter.cc:126] enforced denied, matched policy none
[2022-04-20 13:02:10.002][15][debug][connection] [source/common/network/connection_impl.cc:138] [C391] closing data_to_write=0 type=1
[2022-04-20 13:02:10.002][15][debug][connection] [source/common/network/connection_impl.cc:249] [C391] closing socket: 1
[2022-04-20 13:02:10.002][15][debug][connection] [source/extensions/transport_sockets/tls/ssl_socket.cc:304] [C391] SSL shutdown: rc=0
[2022-04-20 13:02:10.002][15][debug][connection] [source/common/network/connection_impl.cc:138] [C392] closing data_to_write=0 type=1
[2022-04-20 13:02:10.002][15][debug][connection] [source/common/network/connection_impl.cc:249] [C392] closing socket: 1
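
For anyone digging through a longer Envoy debug log, here is a minimal Python sketch that pulls out which calling service was denied by the rbac filter. It assumes the log layout shown above (a "checking connection" line followed by an "enforced denied" line); the function name find_rbac_denials is made up for this example and is not part of Envoy or Consul.

```python
import re

# Matches the SPIFFE URI that the rbac filter logs for the calling service.
SPIFFE_SVC = re.compile(
    r"uriSanPeerCertificate: spiffe://[^/]+/ns/([^/]+)/dc/([^/]+)/svc/([^,\s]+)"
)


def find_rbac_denials(log_lines):
    """Return (namespace, datacenter, service) for each denied connection.

    Assumption: each denial appears as a "checking connection" line that
    carries the caller's SPIFFE URI, followed by an "enforced denied" line.
    """
    denials = []
    last_source = None
    for line in log_lines:
        if "[rbac]" not in line:
            continue
        match = SPIFFE_SVC.search(line)
        if match:
            # Remember the caller until we see the verdict line.
            last_source = match.groups()
        elif "enforced denied" in line and last_source is not None:
            denials.append(last_source)
            last_source = None
    return denials
```

Fed the log excerpt above, this should report count-dashboard in namespace default, datacenter dc1 as the denied caller, which points at intentions rather than at networking.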

I also found the solution. There were no intentions defined:

consul config list -kind service-intentions
# empty output

And although the Consul UI declares that I have a “Permissive Intention”

One or more of your Intentions are set to allow traffic to and/or from all other services in a namespace. This Topology view will show all of those connections if that remains unchanged. We recommend setting more specific Intentions for upstream and downstream services to make this vizualization more useful.

Still, the connection was prohibited.

So I created an ‘allow all’ intention by hand

allow.json

{
    "Kind": "service-intentions",
    "Name": "*",
    "Sources": [
        {
            "Name": "*",
            "Action": "allow",
            "Description": "Allow all"
        }
    ]
}

consul config write allow.json
Config entry written: service-intentions/*

consul config list -kind service-intentions
*

After that, the dashboard successfully connected to the api.
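
An ‘allow all’ intention is fine for a demo, but a narrower one may be preferable elsewhere. As a rough sketch, the same config entry shape as allow.json can be generated for a single source and destination. The helper name build_intention is invented for this example, and "count-api" is an assumed destination service name; only "count-dashboard" appears in the Envoy log above.

```python
import json


def build_intention(destination, sources):
    """Build a service-intentions config entry as a plain dict.

    The keys mirror allow.json above; `destination` and `sources` are
    whatever service names are registered in Consul.
    """
    return {
        "Kind": "service-intentions",
        "Name": destination,
        "Sources": [
            {"Name": source, "Action": "allow"} for source in sources
        ],
    }


if __name__ == "__main__":
    # "count-api" is an assumption; check the real name in the Consul UI.
    entry = build_intention("count-api", ["count-dashboard"])
    print(json.dumps(entry, indent=4))
```

Saving the printed JSON to a file and passing it to consul config write, as was done with allow.json, should apply it.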

I think the Consul Service Mesh | Nomad by HashiCorp tutorial should mention that when Consul is not running in dev mode, intentions may need to be configured manually. That would save newcomers a lot of time.

Some more details to clarify why I ran into this issue. There’s nothing wrong with the tutorial, I just shot myself in the foot :laughing:
I still think it’s worth mentioning in the tutorial the relationship between acl:default_policy in the Consul agent configuration and the default intention policy.

I have

{
  "acl": {
    "default_policy": "deny",
    "enabled": false
...

in my Consul configuration (the default is default_policy: allow), and that’s why I needed to manually create the ‘allow all’ intention. With the default acl configuration, i.e. when it’s not configured at all, there is no need to manually create any intentions.

Reference:

Consul Intentions

The default intention behavior is defined by the default_policy configuration. If the configuration is set to allow, then all service mesh Connect connections will be allowed by default. If it is set to deny, then all connections or requests will be denied by default.

Consul Agent options

default_policy - Either “allow” or “deny”; defaults to “allow”
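
To check a node’s config for this before chasing network problems, a small Python sketch along these lines could help. It only encodes the rule quoted above plus what I observed in this thread (default_policy took effect even with acl.enabled set to false); the function name default_intention_action is made up, and other Consul versions may behave differently.

```python
import json


def default_intention_action(config_text):
    """Return "allow" or "deny" for a Consul agent config given as JSON text.

    Follows the rule quoted above: the default intention behavior comes
    from acl.default_policy, which defaults to "allow" when unset.
    Assumption: acl.enabled is ignored on purpose, matching what was
    observed in this thread.
    """
    config = json.loads(config_text)
    acl = config.get("acl") or {}
    return acl.get("default_policy", "allow")


if __name__ == "__main__":
    # The acl block from the consul.json posted earlier in this thread.
    sample = '{"acl": {"default_policy": "deny", "enabled": false}}'
    print(default_intention_action(sample))
```

If this reports deny, an explicit intention is needed before the dashboard can reach the api.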