Issue with Consul agent as a client

Hi,

I am trying to use Consul for service discovery in my bare metal Kubernetes clusters.

I have set up a Consul cluster on one of my Kubernetes clusters, and the Consul server is running there.

To expose its components to the outside world, I set all the Consul services to type NodePort.

The server service looks like this:
hashicorp-consul-server NodePort 10.110.93.226 <none> 8500:31845/TCP,8301:30436/TCP,8301:30436/UDP,8302:30649/TCP,8302:30649/UDP,8300:31716/TCP,8600:32563/TCP,8600:32563/UDP 8d

Now, on the second Kubernetes cluster, I don’t want to go through the whole Helm installation again, because it is painful on a bare-metal Kubernetes machine.

I want to just run the consul executable and connect to the Consul server which I mentioned above.

I created a JSON config file like so:

{
    "server": false,
    "datacenter": "dc1",
    "data_dir": "/home/docker/abhinav/consul",
    "log_level": "INFO",
    "enable_syslog": true,
    "leave_on_terminate": true,
    "bind_addr": "10.245.101.31",
    "ports": {
        "server": 31716
    },
    "start_join": [
        "ilcepoc0928:30436"
    ]
}

But the client keeps trying to connect to port 8300, which is not exposed. I want it to use the NodePort of the Consul server’s K8s service. I even specified the NodePort 31716 in the ports section, but still no luck. Can someone please help me out here?

./consul agent -config-file consul-client.json

==> Starting Consul agent...
           Version: 'v1.7.2'
           Node ID: 'aad6d03e-f3ec-b3fb-658b-5f3559c987ed'
         Node name: 'ilcepoc0918'
        Datacenter: 'dc1' (Segment: '')
            Server: false (Bootstrap: false)
       Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
      Cluster Addr: 10.245.101.31 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false

==> Log data will now stream in as it occurs:

    2020-04-25T01:37:55.855+0300 [INFO]  agent.client.serf.lan: serf: EventMemberJoin: ilcepoc0918 10.245.101.31
    2020-04-25T01:37:55.856+0300 [INFO]  agent: Started DNS server: address=127.0.0.1:8600 network=udp
    2020-04-25T01:37:55.856+0300 [INFO]  agent: Started DNS server: address=127.0.0.1:8600 network=tcp
    2020-04-25T01:37:55.858+0300 [INFO]  agent: Started HTTP server: address=127.0.0.1:8500 network=tcp
==> Joining cluster...
    2020-04-25T01:37:55.858+0300 [INFO]  agent: (LAN) joining: lan_addresses=[ilcepoc0928:30436]
    2020-04-25T01:37:55.863+0300 [INFO]  agent.client.serf.lan: serf: EventMemberJoin: ilcepoc0928 10.32.0.15
    2020-04-25T01:37:55.863+0300 [INFO]  agent.client.serf.lan: serf: EventMemberJoin: hashicorp-consul-server-0 10.32.0.13
    2020-04-25T01:37:55.863+0300 [INFO]  agent: (LAN) joined: number_of_nodes=1
    2020-04-25T01:37:55.863+0300 [INFO]  agent: Join completed. Initial agents synced with: agent_count=1
    2020-04-25T01:37:55.863+0300 [INFO]  agent: started state syncer
==> Consul agent running!
    2020-04-25T01:37:55.863+0300 [INFO]  agent.client: adding server: server="hashicorp-consul-server-0 (Addr: tcp/10.32.0.13:8300) (DC: dc1)"
    2020-04-25T01:37:55.864+0300 [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.32.0.13:8300 error="rpc error getting client: failed to get conn: dial tcp 10.245.101.31:0->10.32.0.13:8300: connect: connection refused"
    2020-04-25T01:37:55.864+0300 [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error getting client: failed to get conn: dial tcp 10.245.101.31:0->10.32.0.13:8300: connect: connection refused"
    2020-04-25T01:37:57.856+0300 [INFO]  agent.client.memberlist.lan: memberlist: Suspect hashicorp-consul-server-0 has failed, no acks received
    2020-04-25T01:37:58.787+0300 [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.32.0.13:8300 error="rpc error getting client: failed to get conn: dial tcp 10.245.101.31:0->10.32.0.13:8300: connect: connection refused"
    2020-04-25T01:37:58.787+0300 [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error getting client: failed to get conn: dial tcp 10.245.101.31:0->10.32.0.13:8300: connect: connection refused"
    2020-04-25T01:37:59.857+0300 [INFO]  agent.client.memberlist.lan: memberlist: Suspect ilcepoc0928 has failed, no acks received
    2020-04-25T01:38:01.857+0300 [INFO]  agent.client.memberlist.lan: memberlist: Marking hashicorp-consul-server-0 as failed, suspect timeout reached (0 peer confirmations)

Can somebody please answer?

It’s the weekend, stay calm. :slight_smile:

The weekend is long gone :slight_smile: I need to set up a service discovery mechanism urgently (I have project deadlines and so on). I hope someone can help me get going soon.

Hi @abhinav, the initial join works on the NodePort because you’ve set it in your start_join config (FYI, you should use retry_join instead; see https://www.consul.io/docs/agent/options.html#start_join). However, after the initial connection, Consul sends the list of all the members in the cluster. Each member is listed with the IP and port it advertises, as configured via advertise_addr and ports { serf_lan = <..>}. Your client then dials those advertised addresses rather than the address you joined on, which is why your log shows it connecting to 10.32.0.13:8300.

Note that every Consul client needs to be able to connect to every other Consul client and server.
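To see which addresses the servers are actually advertising, and whether your client host can reach them, something like the following rough sketch may help. It is not an official Consul tool. It assumes the agent log format shown in your output above (the "adding server" lines), and the helper names advertised_servers and tcp_reachable are made up for illustration.

```python
import re
import socket

# Matches the agent log line that records the address a server advertises, e.g.
#   adding server: server="hashicorp-consul-server-0 (Addr: tcp/10.32.0.13:8300) (DC: dc1)"
# The pattern is an assumption based on the log output quoted in this thread.
ADDING_SERVER = re.compile(
    r'adding server: server="(?P<name>\S+) '
    r'\(Addr: tcp/(?P<host>[^:)\s]+):(?P<port>\d+)\)'
)


def advertised_servers(log_text):
    """Return (name, host, port) for each 'adding server' line in the log text."""
    return [
        (m.group("name"), m.group("host"), int(m.group("port")))
        for m in ADDING_SERVER.finditer(log_text)
    ]


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unresolvable hosts.
        return False
```

Feeding the saved agent output to advertised_servers and then calling tcp_reachable on each result from the client machine shows whether the advertised Pod IP and default port are reachable from outside the cluster. Note that this only checks TCP, so it says nothing about the UDP side of gossip.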

In your case, currently the clients and servers are advertising their Pod IPs and the default ports (not the NodePorts). Instead, you’ll need to set https://www.consul.io/docs/platform/k8s/helm.html#v-client-exposegossipports, e.g.

client:
  exposeGossipPorts: true

With this set, the clients advertise the node IPs and their ports are exposed as hostPorts.

For the servers, we don’t actually support exposing them via hostPorts. You can try using the code in this PR: https://github.com/hashicorp/consul-helm/pull/332 and also make the change mentioned in https://github.com/hashicorp/consul-helm/pull/332#pullrequestreview-372236074 to set -advertise=${NODE_IP}.