Can't connect with k8s helm-deployed Consul and "out of cluster nodes"

Hi everyone,

I’ve been struggling for several days with trying to get “out of cluster” nodes working with a helm-deployed version of Consul on k8s. Judging by a couple of other related topics on the forum, I’m not alone… :slight_smile:

I’m deploying a GKE cluster and three GCE VMs using Terraform. You can find the code here: https://github.com/datawire/pro-ref-arch/tree/master/cloud-infrastructure/google-cloud-platform

I’m installing 0.8.1 of the consul-helm chart (with Consul v1.5.2) onto the Terraformed GKE cluster: https://github.com/hashicorp/consul-helm

I’ve tried following along with this tutorial, https://medium.com/hashicorp-engineering/introduction-to-hashicorp-consul-connect-with-kubernetes-d7393f798e9d and although it was helpful, I don’t think the instructions work with the current release of the helm chart.

I’m not using a “fully connected” network (as per the doc warning: https://www.consul.io/docs/platform/k8s/out-of-cluster-nodes.html), but even when switching to host_network=true I still can’t get the out-of-cluster node to join the Consul cluster.

I have Consul running successfully on an out-of-cluster VM within the same VPC/network as the k8s cluster, and I can see that Consul successfully connects to the k8s API, as the logs show that a list of Consul server node IPs was received. However, I also see an i/o failure when Consul tries to connect to the k8s node via port 8301. I’ve ssh’ed into the k8s node instance and used socat to see if anything is listening on that port on the node, and I get no response. (I can, however, use socat to see the Consul agent listening on port 8500 on the same node.)

Any guidance, or ideally a HashiCorp “approved” walkthrough and sample code repo, would be very much appreciated :slight_smile: The task appears simple at first glance, but the devil very much appears to be in the details!

Many thanks,

Daniel

I was keen to bump this thread, after the US holidays and HashiConf EU. Has anyone got any experience/advice with this?

Hi,

TL;DR

  • Out of the box, the Helm chart does not support accessing Consul server nodes by host IP, but it is possible with manual configuration.
  • Using Alias-IP on your GKE cluster and attaching the VMs to the same VPC should bypass all of this complexity.

At the moment, the Helm chart deploys the Consul server instances so that each one advertises its POD_IP as the server address.

Using host_network to join the server will work; however, the advertise address for the server is still the POD_IP, because the chart sets it in the server command:

          command:
            - "/bin/sh"
            - "-ec"
            - |
              CONSUL_FULLNAME="{{template "consul.fullname" . }}"

              exec /bin/consul agent \
                -advertise="${POD_IP}" \

When the agent tries to communicate with the server, it uses the list of nodes from the member list, not the join address. That list contains each server's advertise address, which is the POD_IP, and on a network that is not fully connected the out-of-cluster VM cannot reach that address.

What you will need to do is manually modify the chart's server-statefulset.yaml template to change the advertise address to the HOST_IP.
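As a rough sketch of that change (not tested against the 0.8.1 chart), the fragment below shows only the parts of server-statefulset.yaml that would differ. HOST_IP is a variable name assumed here for illustration; it is populated from the Kubernetes downward API field status.hostIP in the same way the chart populates POD_IP from status.podIP. All other flags and settings from the chart are omitted and should be kept as they are.

```yaml
# Fragment of the server container spec; only the changed parts are shown.
env:
  # Assumed name: expose the node's IP to the container via the downward API.
  - name: HOST_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
command:
  - "/bin/sh"
  - "-ec"
  - |
    exec /bin/consul agent \
      -advertise="${HOST_IP}"
```

The same substitution would apply to the client DaemonSet template if the clients also need to be reachable by host IP.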

One more complication: by default, a DaemonSet (client-daemonset) is configured to run a Consul agent in client mode on each node. It also has the advertise address set to the POD_IP, which would need to be changed to the HOST_IP.

Additionally, for both servers and clients, nodeSelectors would need to be configured to ensure that a client and a server do not get deployed to the same instance, as both are now using host ports. If running dedicated nodes for the servers is undesirable, it is possible to change the default ports for the server so that they do not conflict with the client.
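A minimal sketch of the nodeSelector part, written as Helm values overrides. It assumes the chart accepts server.nodeSelector and client.nodeSelector as multi-line strings, which is worth checking against the values.yaml of the chart version in use. The label key consul-role and its values are hypothetical; the nodes would need to be labelled to match before installing.

```yaml
# values overrides (sketch); the label key and values are hypothetical
server:
  nodeSelector: |
    consul-role: server
client:
  nodeSelector: |
    consul-role: client
```

With labels like these, server pods are scheduled only on nodes labelled consul-role=server and client pods only on nodes labelled consul-role=client, so the two never compete for the same host ports.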

With all that said, there is a way to avoid this complexity on GKE. I think that if you create a VPC-native cluster, it uses Alias-IP, which makes a Pod directly addressable by its IP inside the virtual network. If the VM is attached to the same VPC as the cluster, you should be able to make this setup work using the default values for the Helm chart. I have not tested this on GKE; however, Azure has a similar option, and it works there.

https://cloud.google.com/kubernetes-engine/docs/how-to/alias-ips