Communication between consul on local VM and consul in Kubernetes

Hi,

I have a Kubernetes cluster with 2 master nodes and 3 worker nodes. I used Helm to install Consul, which runs 3 Consul servers and 5 Consul clients.

Here is how the Consul server and client pods are placed on the Kubernetes nodes:

[root@k8masterg2m1 autoinstall]# kubectl get po -o wide | grep consul
consul-consul-4lxtr 1/1 Running 0 103m 192.168.139.139 k8masterg2m1
consul-consul-6wv9w 1/1 Running 0 103m 192.168.118.215 k8workerg2w3
consul-consul-pc562 1/1 Running 0 103m 192.168.108.162 k8workerg2w2
consul-consul-server-0 1/1 Running 0 107m 192.168.118.214 k8workerg2w3
consul-consul-server-1 1/1 Running 0 9m15s 192.168.227.91 k8workerg2w1
consul-consul-server-2 1/1 Running 0 107m 192.168.108.161 k8workerg2w2
consul-consul-tg4kz 1/1 Running 0 103m 192.168.139.72 k8masterg2m2
consul-consul-tj7h5 1/1 Running 0 103m 192.168.227.90 k8workerg2w1

Separately, I have installed a Consul client on a local VM that is on the same network as the Kubernetes nodes.

From the Consul server pods running in Kubernetes, I used the command below to join the local VM (10.0.20.102):

/ # consul join 10.0.20.102
Successfully joined cluster by contacting 1 nodes.

I see the following output on both the VM and the Consul pods in Kubernetes:

/ # consul members
Node Address Status Type Build Protocol DC Segment
consul-consul-server-0 192.168.118.214:8301 alive server 1.8.1 2 dc1
consul-consul-server-1 192.168.227.91:8301 alive server 1.8.1 2 dc1
consul-consul-server-2 192.168.108.161:8301 alive server 1.8.1 2 dc1
k8masterg1m2 10.0.20.102:8301 alive client 1.8.1 2 dc1
k8masterg2m1 192.168.139.139:8301 alive client 1.8.1 2 dc1
k8masterg2m2 192.168.139.72:8301 alive client 1.8.1 2 dc1
k8workerg2w1 192.168.227.90:8301 alive client 1.8.1 2 dc1
k8workerg2w2 192.168.108.162:8301 alive client 1.8.1 2 dc1
k8workerg2w3 192.168.118.215:8301 alive client 1.8.1 2 dc1

Listing the services from the Consul pods in Kubernetes works fine, as shown below:
/ # consul catalog services
consul
consul-consul-dns-default
consul-consul-server-default
consul-consul-ui-default
ha-rabbitmq-rabbitmq-ha-default
ha-rabbitmq-rabbitmq-ha-discovery-default
kubernetes-default
vault-agent-injector-svc-default
vault-internal-default

But when I run the same command on the local VM, it gives the error below:

[root@k8masterg1m2 autoinstall]# consul catalog services
Error listing services: Unexpected response code: 500 (rpc error getting client: failed to get conn: rpc error: lead thread didn't get connection)

The Consul agent running on the local VM can list the members but not the services or nodes.
Is this the expected behavior, or is there additional configuration needed to get this to work?

I would also like to know how communication happens between the Consul servers and a Consul agent that is outside the Kubernetes cluster.

Any help is appreciated.

Thanks in Advance!!

Hey @banicr

You might need to double-check your network connectivity and the logs of the consul client running on a VM.

For a Consul client on a VM to join a cluster on Kubernetes, we currently require connectivity over pod IPs between the VMs and the Kube cluster. Please see these docs.

Hi @ishustava,

Thanks for your reply.

I have gone through the Consul documentation, which led me to run the command below to auto-join the Consul client running on the VM to Consul on Kubernetes.

command:

consul agent -retry-join 'provider=k8s label_selector="app=consul,component=server"' -data-dir=/var/lib/consul/ -log-level 'debug'

This command gives the output like:
==> Multiple private IPv4 addresses found. Please configure one with 'bind' and/or 'advertise'.

So I added -advertise to the command above and ran it again:

# consul agent -retry-join 'provider=k8s label_selector="app=consul,component=server"' -data-dir=/var/lib/consul/ -log-level 'debug' -advertise=10.0.20.103

Note: -advertise is set to the IP of the VM that I am trying to join to Consul in Kubernetes.

The output is shown below:

           Version: '1.8.2'
           Node ID: 'e4f53ac0-96df-0d37-5fb3-4c21ddc896ff'
         Node name: 'k8workerg1w1'
        Datacenter: 'dc1' (Segment: '')
            Server: false (Bootstrap: false)
       Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
      Cluster Addr: 10.0.20.103 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false

==> Log data will now stream in as it occurs:

    2020-08-11T16:00:02.355+0530 [INFO]  agent.client.serf.lan: serf: EventMemberJoin: k8workerg1w1 10.0.20.103
    2020-08-11T16:00:02.356+0530 [INFO]  agent: Started DNS server: address=127.0.0.1:8600 network=tcp
    2020-08-11T16:00:02.356+0530 [INFO]  agent: Started DNS server: address=127.0.0.1:8600 network=udp
    2020-08-11T16:00:02.356+0530 [INFO]  agent: Started HTTP server: address=127.0.0.1:8500 network=tcp
    2020-08-11T16:00:02.356+0530 [INFO]  agent: started state syncer
==> Consul agent running!
    2020-08-11T16:00:02.356+0530 [INFO]  agent: Retry join is supported for the following discovery methods: cluster=LAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
    2020-08-11T16:00:02.357+0530 [INFO]  agent: Joining cluster...: cluster=LAN
    2020-08-11T16:00:02.357+0530 [DEBUG] agent: discover: Using provider "k8s": cluster=LAN
    2020-08-11T16:00:02.357+0530 [WARN]  agent.client.manager: No servers available
    2020-08-11T16:00:02.357+0530 [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
    2020-08-11T16:00:02.395+0530 [INFO]  agent: Discovered servers: cluster=LAN cluster=LAN servers="192.168.118.214 192.168.227.91 192.168.108.161"
    2020-08-11T16:00:02.396+0530 [INFO]  agent: (LAN) joining: lan_addresses=[192.168.118.214, 192.168.227.91, 192.168.108.161]
    2020-08-11T16:00:12.396+0530 [DEBUG] agent.client.memberlist.lan: memberlist: Failed to join 192.168.118.214: dial tcp 192.168.118.214:8301: i/o timeout
    2020-08-11T16:00:17.894+0530 [WARN]  agent.client.manager: No servers available
    2020-08-11T16:00:17.894+0530 [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
    2020-08-11T16:00:22.396+0530 [DEBUG] agent.client.memberlist.lan: memberlist: Failed to join 192.168.227.91: dial tcp 192.168.227.91:8301: i/o timeout
    2020-08-11T16:00:25.233+0530 [DEBUG] agent: Skipping coordinate updates until servers are upgraded
    2020-08-11T16:00:32.397+0530 [DEBUG] agent.client.memberlist.lan: memberlist: Failed to join 192.168.108.161: dial tcp 192.168.108.161:8301: i/o timeout
    2020-08-11T16:00:32.397+0530 [WARN]  agent: (LAN) couldn't join: number_of_nodes=0 error="3 errors occurred:
        * Failed to join 192.168.118.214: dial tcp 192.168.118.214:8301: i/o timeout
        * Failed to join 192.168.227.91: dial tcp 192.168.227.91:8301: i/o timeout
        * Failed to join 192.168.108.161: dial tcp 192.168.108.161:8301: i/o timeout

Am I missing something here, or does some other configuration need to be done?

Hey @banicr,

Could you check your network connectivity between the pod network on Kubernetes and the VM?

Hi @ishustava,

Network connectivity from the pod network to the VM works fine.

I ran this from inside the pod:

/ # ping 10.0.20.103
PING 10.0.20.103 (10.0.20.103) 56(84) bytes of data.
64 bytes from 10.0.20.103: icmp_seq=1 ttl=63 time=1.64 ms
64 bytes from 10.0.20.103: icmp_seq=2 ttl=63 time=1.87 ms
64 bytes from 10.0.20.103: icmp_seq=3 ttl=63 time=0.970 ms
^C
--- 10.0.20.103 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2089ms
rtt min/avg/max/mdev = 0.970/1.493/1.871/0.381 ms

Hi,
Can you check the connectivity from the VM to the Kubernetes server pods on port 8301?
These errors:

dial tcp 192.168.227.91:8301: i/o timeout

lead me to think that the VM can’t talk to the server pods.
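
A ping from a pod to the VM only shows ICMP working in one direction, so a check in the other direction, over TCP, is more telling. The sketch below is one rough way to run it from the VM, using only the Python standard library. It is an illustration, not something taken from the Consul docs. The ports come from the service listing in this thread: 8301 is LAN gossip, and 8300 is the server RPC port that client requests such as `consul catalog services` are forwarded to. The function names are made up for this example, and UDP gossip on 8301 is not covered.

```python
import socket

# Ports taken from the service listing in this thread:
# 8300 = server RPC, 8301 = LAN gossip (only the TCP side is probed here).
CONSUL_TCP_PORTS = (8300, 8301)


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False


def probe_servers(hosts, ports=CONSUL_TCP_PORTS, timeout=3.0):
    """Map every (host, port) pair to True/False reachability."""
    return {(h, p): can_connect(h, p, timeout) for h in hosts for p in ports}


def format_report(results):
    """Render the probe results as one 'host:port status' line per pair."""
    lines = []
    for (host, port), ok in sorted(results.items()):
        lines.append(f"{host}:{port} {'reachable' if ok else 'UNREACHABLE'}")
    return "\n".join(lines)
```

On the VM this could be called as `print(format_report(probe_servers(["192.168.118.214", "192.168.227.91", "192.168.108.161"])))`, using the server pod IPs from the log above. If every line reports UNREACHABLE, the problem is routing from the VM to the pod network rather than Consul configuration.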

Hi @lkysow,

The Kubernetes Consul pods are not reachable from the VM. That is what I have been trying to figure out how to get working.

I have configured NodePort as the service type for the Consul server pods in Kubernetes, so the server pods should be accessible from the VM, but that is not happening.

The command below gives the following error output:

consul agent -retry-join 'provider=k8s label_selector="app=consul,component=server"' -data-dir=/var/lib/consul/ -advertise=10.0.20.103

==> Starting Consul agent...
Version: '1.8.2'
Node ID: 'e4f53ac0-96df-0d37-5fb3-4c21ddc896ff'
Node name: 'k8workerg1w1'
Datacenter: 'dc1' (Segment: '')
Server: false (Bootstrap: false)
Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
Cluster Addr: 10.0.20.103 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false

==> Log data will now stream in as it occurs:

2020-08-14T22:54:53.508+0530 [INFO]  agent.client.serf.lan: serf: EventMemberJoin: k8workerg1w1 10.0.20.103
2020-08-14T22:54:53.509+0530 [INFO]  agent: Started DNS server: address=127.0.0.1:8600 network=tcp
2020-08-14T22:54:53.509+0530 [INFO]  agent: Started DNS server: address=127.0.0.1:8600 network=udp
2020-08-14T22:54:53.510+0530 [INFO]  agent: Started HTTP server: address=127.0.0.1:8500 network=tcp
2020-08-14T22:54:53.510+0530 [INFO]  agent: Retry join is supported for the following discovery methods: cluster=LAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
2020-08-14T22:54:53.510+0530 [INFO]  agent: Joining cluster...: cluster=LAN
2020-08-14T22:54:53.510+0530 [WARN]  agent.client.manager: No servers available
2020-08-14T22:54:53.510+0530 [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
2020-08-14T22:54:53.510+0530 [INFO]  agent: started state syncer

==> Consul agent running!
2020-08-14T22:54:53.556+0530 [INFO]  agent: Discovered servers: cluster=LAN cluster=LAN servers="192.168.118.214 192.168.227.91 192.168.108.161"
2020-08-14T22:54:53.556+0530 [INFO]  agent: (LAN) joining: lan_addresses=[192.168.118.214, 192.168.227.91, 192.168.108.161]
2020-08-14T22:55:11.024+0530 [WARN]  agent.client.manager: No servers available
2020-08-14T22:55:11.025+0530 [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
2020-08-14T22:55:23.559+0530 [WARN]  agent: (LAN) couldn't join: number_of_nodes=0 error="3 errors occurred:
* Failed to join 192.168.118.214: dial tcp 192.168.118.214:8301: i/o timeout
* Failed to join 192.168.227.91: dial tcp 192.168.227.91:8301: i/o timeout
* Failed to join 192.168.108.161: dial tcp 192.168.108.161:8301: i/o timeout

In the last part of the output you can see that the VM is trying to join the Consul server pods using their pod IPs (which will not work, as Kubernetes pod IPs are not accessible from outside the cluster). But since I have configured the service type as NodePort, I expected it to try to join the Consul server pods using the IPs of the Kubernetes nodes on which the server pods are running.

Here is the service configuration for Consul in Kubernetes:

[root@k8masterg2m1 autoinstall]# kubectl get svc | grep consul
consul ExternalName consul.service.consul 8d
consul-consul-dns ClusterIP 10.97.208.207 53/TCP,53/UDP 7d
consul-consul-server NodePort 10.100.75.165 8501:31682/TCP,8301:31008/TCP,8301:30114/UDP,8302:30430/TCP,8302:32339/UDP,8300:31541/TCP,8600:32030/TCP,8600:31788/UDP 7d
consul-consul-ui LoadBalancer 10.105.177.156 443:31152/TCP 7d

So this retry-join 'provider=k8s label_selector="app=consul,component=server"' will always use the pod IPs. It won’t use the node IPs. For it to return the node IPs, you need to run the server pods in hostNetwork mode so that their pod IP is actually the node IP.

Or you can use a DNS entry instead that resolves to each server IP.
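
If you go the DNS route, it is worth confirming that the name really resolves to every server before pointing retry-join at it. The snippet below is a minimal sketch using Python's standard resolver. The hostname in the usage note is a placeholder, and the helper names are made up for this example.

```python
import socket


def resolve_server_ips(name, port=8301):
    """Return the sorted, de-duplicated IPv4 addresses that `name` resolves to."""
    infos = socket.getaddrinfo(
        name, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def missing_servers(name, expected_ips):
    """Return the expected server IPs that the DNS name does not resolve to."""
    return sorted(set(expected_ips) - set(resolve_server_ips(name)))
```

For example, `missing_servers("consul-servers.example.internal", ["10.0.20.11", "10.0.20.12", "10.0.20.13"])` (hypothetical name and addresses) would return an empty list only if all three servers are present in the record.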

It’s important to note that you can’t use a NodePort service because that causes the requests to be randomly load balanced across all the servers. Each consul client needs to be able to talk directly to a specific server. So you’d need to expose the servers using a hostPort, not a service. Then when the client talks to server1-ip:<port> it doesn’t have its request randomly load balanced to another server pod, that IP/port always routes to server1.
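
To make the difference concrete, here is a toy model in plain Python. It does not touch Kubernetes or Consul, and the server names and functions are invented for illustration. It only mimics the routing behavior described above: a NodePort service may hand each new connection to any backing pod, while a hostPort address always reaches the pod on that node.

```python
import random

# Hypothetical stand-ins for the three server pods.
SERVERS = ["server-0", "server-1", "server-2"]


def via_node_port(rng):
    """Toy NodePort: each new connection may land on any backing pod."""
    return rng.choice(SERVERS)


def via_host_port(target):
    """Toy hostPort: the node's IP and port always map to the pod on that node."""
    return target


def servers_reached(route, attempts=100):
    """Collect the distinct servers reached over repeated connections."""
    return {route() for _ in range(attempts)}
```

With `rng = random.Random(0)`, `servers_reached(lambda: via_node_port(rng))` contains more than one server, while `servers_reached(lambda: via_host_port("server-1"))` is always just `{"server-1"}`. A client that believes it is talking to server-1 needs the second behavior.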

Hi @lkysow,

The Helm chart says:

hostNetwork defines whether or not we use host networking instead of hostPort in the event that a CNI plugin doesn’t support hostPort. This has security implications and is not recommended, as doing so gives the consul client unnecessary access to all network traffic on the host.

Still, I tried enabling hostNetwork for both the Consul clients and the Consul servers. The client pods come up with the node IP, but the server pods do not take the node IP.

Hi,
I’m new to Consul and got stuck on the same issue. I tried all the options provided in https://github.com/hashicorp/consul-helm/issues/358, but no luck.
Basically, I’m trying to connect a client running on an Azure VM to a Consul cluster running in AKS. The VM and AKS are in the same VNet but in different subnets.
Any pointers would help.

Thanks
Anand.