Say I have a K8s cluster with helm-installed Consul. What’s the best practice for connecting external Consul agents to the servers? More specifically:
How do I expose the in-cluster Consul servers via Helm?
Once I have successfully exposed the Consul servers, how do I configure the agent to connect to them?
Hi @lkysow. Thanks for pointing me to the resource. I have followed the instructions there. What I’ve done:
Created an EKS cluster
Installed Consul on the cluster with helm.
I then created an EC2 instance and ran consul agent on it to try to connect to the k8s consul servers. When I run:
consul agent -retry-join 'provider=k8s label_selector="app=consul,component=server"' -log-level debug -data-dir /var/lib/consul
The agent seems to be able to connect to k8s and retrieve a list of IPs. But the IPs are private so the connection times out:
==> Starting Consul agent...
Version: 'v1.8.0'
Node ID: '24cab3a0-c4a4-aa16-31a5-49ea2b9ba764'
Node name: 'ip-172-31-15-62.us-west-2.compute.internal'
Datacenter: 'dc1' (Segment: '')
Server: false (Bootstrap: false)
Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
Cluster Addr: 172.31.15.62 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false
==> Log data will now stream in as it occurs:
2020-07-08T04:56:04.581Z [WARN] agent: Node name will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.: node_name=ip-172-31-15-62.us-west-2.compute.internal
2020-07-08T04:56:04.582Z [INFO] agent.client.serf.lan: serf: EventMemberJoin: ip-172-31-15-62.us-west-2.compute.internal 172.31.15.62
2020-07-08T04:56:04.582Z [INFO] agent: Started DNS server: address=127.0.0.1:8600 network=udp
2020-07-08T04:56:04.583Z [INFO] agent: Started DNS server: address=127.0.0.1:8600 network=tcp
2020-07-08T04:56:04.583Z [INFO] agent: Started HTTP server: address=127.0.0.1:8500 network=tcp
2020-07-08T04:56:04.583Z [INFO] agent: started state syncer
==> Consul agent running!
2020-07-08T04:56:04.583Z [INFO] agent: Retry join is supported for the following discovery methods: cluster=LAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
2020-07-08T04:56:04.583Z [INFO] agent: Joining cluster...: cluster=LAN
2020-07-08T04:56:04.583Z [DEBUG] agent: discover: Using provider "k8s": cluster=LAN
2020-07-08T04:56:04.596Z [WARN] agent.client.manager: No servers available
2020-07-08T04:56:04.596Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
2020-07-08T04:56:05.347Z [INFO] agent: Discovered servers: cluster=LAN cluster=LAN servers="10.0.191.39 10.0.239.36"
2020-07-08T04:56:05.347Z [INFO] agent: (LAN) joining: lan_addresses=[10.0.191.39, 10.0.239.36]
2020-07-08T04:56:15.347Z [DEBUG] agent.client.memberlist.lan: memberlist: Failed to join 10.0.191.39: dial tcp 10.0.191.39:8301: i/o timeout
The Consul agent on the EC2 instance must be able to route to the pod IPs.
nodes outside of Kubernetes joining a cluster running within Kubernetes must be able to communicate to pod IPs via the network.
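A quick way to confirm whether routing is the problem is to probe the server's gossip port from the machine running the external agent. Here is a minimal sketch, assuming bash and the coreutils `timeout` command are available; `check_tcp` is a hypothetical helper name, not part of Consul.

```shell
#!/usr/bin/env bash
# check_tcp HOST PORT [WAIT_SECS]
# Hypothetical helper: tries to open a TCP connection using bash's
# built-in /dev/tcp and reports whether the connection succeeded.
check_tcp() {
  local host="$1" port="$2" wait_secs="${3:-3}"
  if timeout "$wait_secs" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "reachable: ${host}:${port}"
  else
    echo "unreachable: ${host}:${port}"
    return 1
  fi
}
```

For example, `check_tcp 10.0.191.39 8301` run on the EC2 instance tests one of the pod IPs from the log above. This only covers TCP; LAN gossip on 8301 also uses UDP, so a successful probe is necessary but not sufficient.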
yagehu
July 8, 2020, 10:57pm
5
So should I launch EC2 instance in the same VPC subnet as my EKS worker nodes?
Is it possible to write a customized Helm chart to install Consul such that the gossip ports can be exposed via a hostPort?
lkysow
July 10, 2020, 1:54am
6
It needs to be in the same VPC or a peered VPC.
Yes, it’s possible. The clients already support that option, but you also need to expose the servers and run them on different nodes than the clients so the ports don’t clash.
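As a rough sketch of what that could look like with Helm values: `client.exposeGossipPorts` is the existing client option. `server.exposeGossipAndRPCPorts` and `server.ports.serflan.port` are an assumption here, since they only exist in newer releases of the chart, so check the `values.yaml` of the version you have installed. The file name, release name, chart name, and port 9301 are illustrative.

```shell
# Write a Helm values override (file name is illustrative).
cat > consul-external-values.yaml <<'EOF'
client:
  # Expose the client agents' gossip ports as hostPorts on each node.
  exposeGossipPorts: true
server:
  # Assumption: only available in newer chart releases.
  exposeGossipAndRPCPorts: true
  ports:
    serflan:
      # Illustrative value; must differ from 8301, which the
      # clients' hostPort already occupies on the same node.
      port: 9301
EOF

# Then apply it to the existing release (release and chart names are assumptions):
# helm upgrade consul hashicorp/consul -f consul-external-values.yaml
```

Giving the servers a different LAN gossip port is an alternative to scheduling servers and clients on separate nodes. Either way, the external agent still has to be able to reach the Kubernetes node IPs on the exposed ports.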
banicr
September 8, 2020, 11:41am
7
Hi @lkysow,
What if I want to run my Kubernetes cluster and VM in my on-premises environment?
I tried setting pod_subnet_cidr to the same subnet my Kubernetes hosts are running on, but I am still unable to join clients from outside Kubernetes.
Please find the error logs below:
==> Starting Consul agent...
Version: '1.8.3'
Node ID: 'ec41f608-3aeb-faf5-85aa-3db744fc5b8a'
Node name: 'k8workerg1w2'
Datacenter: 'dc1' (Segment: '')
Server: false (Bootstrap: false)
Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
Cluster Addr: 10.225.20.104 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false
==> Log data will now stream in as it occurs:
2020-09-08T17:02:32.394+0530 [INFO] agent.client.serf.lan: serf: EventMemberJoin: k8workerg1w2 10.225.20.104
2020-09-08T17:02:32.395+0530 [INFO] agent: Started DNS server: address=127.0.0.1:8600 network=tcp
2020-09-08T17:02:32.395+0530 [INFO] agent: Started DNS server: address=127.0.0.1:8600 network=udp
2020-09-08T17:02:32.395+0530 [INFO] agent: Started HTTP server: address=127.0.0.1:8500 network=tcp
2020-09-08T17:02:32.396+0530 [INFO] agent: Retry join is supported for the following discovery methods: cluster=LAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
2020-09-08T17:02:32.396+0530 [INFO] agent: Joining cluster...: cluster=LAN
2020-09-08T17:02:32.396+0530 [WARN] agent.client.manager: No servers available
2020-09-08T17:02:32.396+0530 [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
2020-09-08T17:02:32.396+0530 [INFO] agent: started state syncer
==> Consul agent running!
2020-09-08T17:02:32.430+0530 [INFO] agent: Discovered servers: cluster=LAN cluster=LAN servers=10.255.20.197
2020-09-08T17:02:32.430+0530 [INFO] agent: (LAN) joining: lan_addresses=[10.255.20.197]
2020-09-08T17:02:42.430+0530 [WARN] agent: (LAN) couldn't join: number_of_nodes=0 error="1 error occurred:
* Failed to join 10.255.20.197: dial tcp 10.255.20.197:8301: i/o timeout
"
2020-09-08T17:02:42.430+0530 [WARN] agent: Join cluster failed, will retry: cluster=LAN retry_interval=30s error=<nil>
2020-09-08T17:02:48.308+0530 [WARN] agent.client.manager: No servers available
2020-09-08T17:02:48.308+0530 [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
2020-09-08T17:03:04.373+0530 [WARN] agent.client.manager: No servers available
2020-09-08T17:03:04.373+0530 [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
2020-09-08T17:03:12.438+0530 [INFO] agent: Discovered servers: cluster=LAN cluster=LAN servers=10.255.20.197
2020-09-08T17:03:12.438+0530 [INFO] agent: (LAN) joining: lan_addresses=[10.255.20.197]
2020-09-08T17:03:22.439+0530 [WARN] agent: (LAN) couldn't join: number_of_nodes=0 error="1 error occurred:
* Failed to join 10.255.20.197: dial tcp 10.255.20.197:8301: i/o timeout
If you could provide the steps to achieve this, it would be a great help.
Thanks in advance