Getting started with mesh gateway issues

Hi,

I’m trying to get started with Consul and mesh gateways. I just want to test how a mesh gateway works between a k8s cluster and a VM cluster. I’ve tried to follow these articles:

I’m just testing on my local PC: deploying Consul to a minikube instance, running a second Consul server locally, and trying to connect the two.

I’m using these Helm values to deploy to minikube:

global:
  name: consul
  datacenter: k8s-primary

  tls:
    enabled: true

  federation:
    enabled: true
    createFederationSecret: true

  acls:
    manageSystemACLs: false
    createReplicationToken: false

ui:
  service:
    type: 'NodePort'

connectInject:
  enabled: true

meshGateway:
  enabled: true
  replicas: 1
  service:
    type: NodePort
    nodePort: 30085
  wanAddress:
    source: Service
  affinity: null

server:
  affinity: null

To run Consul locally, I’m running:

consul agent -config-file consul.hcl

With the contents of consul.hcl being:

cert_file = "/home/zane/IdeaProjects/consul/vm-secondary-server-consul-0.pem"
key_file = "/home/zane/IdeaProjects/consul/vm-secondary-server-consul-0-key.pem"
ca_file = "/home/zane/IdeaProjects/consul/consul-agent-ca.pem"
primary_gateways = ["192.168.99.104:30085"]

server = true
datacenter = "vm-secondary"
data_dir = "/home/zane/IdeaProjects/consul/data"
enable_central_service_config = true
primary_datacenter = "k8s-primary"
connect {
  enabled = true
  enable_mesh_gateway_wan_federation = true
}
verify_incoming_rpc = true
verify_outgoing = true
verify_server_hostname = true
ports {
  https = 8501
  http = -1
  grpc = 8502
}

bind_addr = "192.168.99.1"
bootstrap = false
bootstrap_expect = 1

With this config I’m getting “no acks received” errors. Here are the logs from the k8s server:

    2020-07-23T09:30:21.487Z [INFO]  agent.server.memberlist.wan: memberlist: Suspect zane-pc.vm-secondary has failed, no acks received
    2020-07-23T09:30:23.686Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send indirect ping: read tcp 172.17.0.7:36064->172.17.0.5:8443: read: connection reset by peer from=172.17.0.9:8302
    2020-07-23T09:30:26.487Z [INFO]  agent.server.memberlist.wan: memberlist: Marking zane-pc.vm-secondary as failed, suspect timeout reached (0 peer confirmations)
    2020-07-23T09:30:26.487Z [INFO]  agent.server.serf.wan: serf: EventMemberFailed: zane-pc.vm-secondary 192.168.99.1
    2020-07-23T09:30:26.487Z [INFO]  agent.server: Handled event for server in area: event=member-failed server=zane-pc.vm-secondary area=wan
    2020-07-23T09:30:26.988Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 192.168.99.1:8302: read tcp 172.17.0.7:36102->172.17.0.5:8443: read: connection reset by peer
    2020-07-23T09:30:39.082Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=192.168.99.1:8300 datacenter=vm-secondary method=Internal.ServiceDump error="rpc error getting client: failed to get conn: read tcp 172.17.0.7:36218->172.17.0.5:8443: read: connection reset by peer"
    2020-07-23T09:30:48.827Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: zane-pc.vm-secondary 192.168.99.1
    2020-07-23T09:30:48.827Z [INFO]  agent.server: Handled event for server in area: event=member-join server=zane-pc.vm-secondary area=wan
    2020-07-23T09:30:48.991Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 192.168.99.1:8302: read tcp 172.17.0.7:36324->172.17.0.5:8443: read: connection reset by peer
    2020-07-23T09:30:49.488Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 192.168.99.1:8302: read tcp 172.17.0.7:36330->172.17.0.5:8443: read: connection reset by peer
    2020-07-23T09:30:53.186Z [WARN]  agent.server.memberlist.wan: memberlist: Refuting a suspect message (from: zane-pc.vm-secondary)
    2020-07-23T09:30:53.488Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 192.168.99.1:8302: read tcp 172.17.0.7:36366->172.17.0.5:8443: read: connection reset by peer
    2020-07-23T09:30:53.687Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send indirect ping: read tcp 172.17.0.7:36374->172.17.0.5:8443: read: connection reset by peer from=172.17.0.9:8302
    2020-07-23T09:30:56.489Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send ping: read tcp 172.17.0.7:36406->172.17.0.5:8443: read: connection reset by peer
    2020-07-23T09:30:57.385Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send indirect ping: read tcp 172.17.0.7:36424->172.17.0.5:8443: read: connection reset by peer from=172.17.0.8:8302
    2020-07-23T09:30:58.989Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 192.168.99.1:8302: read tcp 172.17.0.7:36452->172.17.0.5:8443: read: connection reset by peer
    2020-07-23T09:30:59.488Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 192.168.99.1:8302: read tcp 172.17.0.7:36458->172.17.0.5:8443: read: connection reset by peer
    2020-07-23T09:31:06.487Z [INFO]  agent.server.memberlist.wan: memberlist: Suspect zane-pc.vm-secondary has failed, no acks received
    2020-07-23T09:31:06.580Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=192.168.99.1:8300 datacenter=vm-secondary method=Internal.ServiceDump error="rpc error getting client: failed to get conn: read tcp 172.17.0.7:36514->172.17.0.5:8443: read: connection reset by peer"
    2020-07-23T09:31:06.589Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=192.168.99.1:8300 datacenter=vm-secondary method=Internal.ServiceDump error="rpc error getting client: failed to get conn: read tcp 172.17.0.7:36520->172.17.0.5:8443: read: connection reset by peer"
    2020-07-23T09:31:08.690Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send indirect ping: read tcp 172.17.0.7:36544->172.17.0.5:8443: read: connection reset by peer from=172.17.0.9:8302
    2020-07-23T09:31:12.385Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send indirect ping: read tcp 172.17.0.7:36582->172.17.0.5:8443: read: connection reset by peer from=172.17.0.8:8302
    2020-07-23T09:31:26.488Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send compound ping and suspect message to 192.168.99.1:8302: read tcp 172.17.0.7:36710->172.17.0.5:8443: read: connection reset by peer
    2020-07-23T09:31:28.687Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send indirect ping: read tcp 172.17.0.7:36744->172.17.0.5:8443: read: connection reset by peer from=172.17.0.9:8302
    2020-07-23T09:31:28.883Z [INFO]  agent.server.memberlist.wan: memberlist: Marking zane-pc.vm-secondary as failed, suspect timeout reached (0 peer confirmations)
    2020-07-23T09:31:28.883Z [INFO]  agent.server.serf.wan: serf: EventMemberFailed: zane-pc.vm-secondary 192.168.99.1
    2020-07-23T09:31:28.883Z [INFO]  agent.server: Handled event for server in area: event=member-failed server=zane-pc.vm-secondary area=wan
    2020-07-23T09:31:31.487Z [INFO]  agent.server.memberlist.wan: memberlist: Suspect zane-pc.vm-secondary has failed, no acks received
    2020-07-23T09:31:40.629Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=192.168.99.1:8300 datacenter=vm-secondary method=Internal.ServiceDump error="rpc error getting client: failed to get conn: read tcp 172.17.0.7:36848->172.17.0.5:8443: read: connection reset by peer"
    2020-07-23T09:31:49.302Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=192.168.99.1:8300 datacenter=vm-secondary method=Internal.ServiceDump error="rpc error getting client: failed to get conn: read tcp 172.17.0.7:36938->172.17.0.5:8443: read: connection reset by peer"
    2020-07-23T09:31:56.502Z [INFO]  agent.server.serf.wan: serf: attempting reconnect to zane-pc.vm-secondary 192.168.99.1:8302
    2020-07-23T09:32:20.472Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=192.168.99.1:8300 datacenter=vm-secondary method=Internal.ServiceDump error="rpc error getting client: failed to get conn: read tcp 172.17.0.7:37206->172.17.0.5:8443: read: connection reset by peer"
    2020-07-23T09:32:51.619Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=192.168.99.1:8300 datacenter=vm-secondary method=Internal.ServiceDump error="rpc error getting client: failed to get conn: read tcp 172.17.0.7:37480->172.17.0.5:8443: read: connection reset by peer"
    2020-07-23T09:32:52.090Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=192.168.99.1:8300 datacenter=vm-secondary method=Internal.ServiceDump error="rpc error getting client: failed to get conn: read tcp 172.17.0.7:37490->172.17.0.5:8443: read: connection reset by peer"
    2020-07-23T09:33:00.535Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=192.168.99.1:8300 datacenter=vm-secondary method=Internal.ServiceDump error="rpc error getting client: failed to get conn: read tcp 172.17.0.7:37568->172.17.0.5:8443: read: connection reset by peer"
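
For anyone reading along, the pattern in the k8s-side log above can be checked with a short script. This is only a sketch: it tallies lines per log level and counts which remote address each "connection reset by peer" error points at. In the excerpt above, every reset is against 172.17.0.5:8443, which is presumably the mesh gateway pod (an assumption; the post does not say so). The function name `summarize` and the log file passed on the command line are hypothetical.

```python
import re
import sys
from collections import Counter

# Log level in square brackets, e.g. "[ERROR]".
LEVEL_RE = re.compile(r"\[(TRACE|DEBUG|INFO|WARN|ERROR)\]")
# Remote side of "read tcp <local>-><remote>:" in an error line.
REMOTE_RE = re.compile(r"->(\d{1,3}(?:\.\d{1,3}){3}:\d+)")


def summarize(lines):
    """Count log lines per level and connection resets per remote address."""
    levels = Counter()
    reset_remotes = Counter()
    for line in lines:
        level = LEVEL_RE.search(line)
        if level:
            levels[level.group(1)] += 1
        if "connection reset by peer" in line:
            remote = REMOTE_RE.search(line)
            if remote:
                reset_remotes[remote.group(1)] += 1
    return levels, reset_remotes


# Only runs when a log file path is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as log_file:
        levels, reset_remotes = summarize(log_file)
    print("lines per level:", dict(levels))
    print("connection resets per remote:", dict(reset_remotes))
```

If the resets all land on a single address, that hop is the first place to look; the script itself says nothing about why the connection is being reset.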

And some from the local server:

BootstrapExpect is set to 1; this is the same as Bootstrap mode.
bootstrap = true: do not enable unless necessary
==> Starting Consul agent...
           Version: 'v1.8.0'
           Node ID: 'ac4d0768-3b68-a8d5-0fe3-bb05c0822df1'
         Node name: 'zane-pc'
        Datacenter: 'vm-secondary' (Segment: '<all>')
            Server: true (Bootstrap: true)
       Client Addr: [127.0.0.1] (HTTP: -1, HTTPS: 8501, gRPC: 8502, DNS: 8600)
      Cluster Addr: 192.168.99.1 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: false, TLS-Outgoing: true, TLS-Incoming: false, Auto-Encrypt-TLS: false

==> Log data will now stream in as it occurs:

    2020-07-23T11:26:27.626+0200 [INFO]  agent.server.gateway_locator: will dial the primary datacenter through its mesh gateways
    2020-07-23T11:26:27.827+0200 [INFO]  agent.server.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:ac4d0768-3b68-a8d5-0fe3-bb05c0822df1 Address:192.168.99.1:8300}]"
    2020-07-23T11:26:27.827+0200 [INFO]  agent.server.raft: entering follower state: follower="Node at 192.168.99.1:8300 [Follower]" leader=
    2020-07-23T11:26:27.828+0200 [INFO]  agent.server.serf.wan: serf: EventMemberJoin: zane-pc.vm-secondary 192.168.99.1
    2020-07-23T11:26:27.828+0200 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: zane-pc 192.168.99.1
    2020-07-23T11:26:27.829+0200 [INFO]  agent.server: Adding LAN server: server="zane-pc (Addr: tcp/192.168.99.1:8300) (DC: vm-secondary)"
    2020-07-23T11:26:27.829+0200 [INFO]  agent: Started DNS server: address=127.0.0.1:8600 network=udp
    2020-07-23T11:26:27.829+0200 [INFO]  agent.server: Handled event for server in area: event=member-join server=zane-pc.vm-secondary area=wan
    2020-07-23T11:26:27.829+0200 [INFO]  agent: Started DNS server: address=127.0.0.1:8600 network=tcp
    2020-07-23T11:26:27.830+0200 [INFO]  agent: Started HTTPS server: address=127.0.0.1:8501 network=tcp
    2020-07-23T11:26:27.830+0200 [INFO]  agent: Started gRPC server: address=127.0.0.1:8502 network=tcp
    2020-07-23T11:26:27.830+0200 [INFO]  agent: started state syncer
==> Consul agent running!
    2020-07-23T11:26:27.830+0200 [INFO]  agent: Refreshing mesh gateways is supported for the following discovery methods: discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
    2020-07-23T11:26:27.830+0200 [INFO]  agent: Refreshing mesh gateways...
    2020-07-23T11:26:27.830+0200 [INFO]  agent.server.gateway_locator: updated fallback list of primary mesh gateways: mesh_gateways=[192.168.99.104:30085]
    2020-07-23T11:26:27.830+0200 [INFO]  agent: Refreshing mesh gateways completed
    2020-07-23T11:26:27.830+0200 [INFO]  agent: Retry join is supported for the following discovery methods: cluster=WAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
    2020-07-23T11:26:27.830+0200 [INFO]  agent: Joining cluster...: cluster=WAN
    2020-07-23T11:26:27.830+0200 [INFO]  agent: (WAN) joining: wan_addresses=[*.k8s-primary/192.0.2.2]
    2020-07-23T11:26:27.881+0200 [INFO]  agent.server.serf.wan: serf: EventMemberJoin: consul-server-1.k8s-primary 172.17.0.8
    2020-07-23T11:26:27.881+0200 [INFO]  agent.server.serf.wan: serf: EventMemberJoin: consul-server-2.k8s-primary 172.17.0.9
    2020-07-23T11:26:27.881+0200 [INFO]  agent.server.serf.wan: serf: EventMemberJoin: consul-server-0.k8s-primary 172.17.0.7
    2020-07-23T11:26:27.881+0200 [INFO]  agent.server: Handled event for server in area: event=member-join server=consul-server-1.k8s-primary area=wan
    2020-07-23T11:26:27.881+0200 [INFO]  agent: (WAN) joined: number_of_nodes=1
    2020-07-23T11:26:27.881+0200 [INFO]  agent: Join cluster completed. Synced with initial agents: cluster=WAN num_agents=1
    2020-07-23T11:26:27.881+0200 [INFO]  agent.server: Handled event for server in area: event=member-join server=consul-server-2.k8s-primary area=wan
    2020-07-23T11:26:27.881+0200 [INFO]  agent.server: Handled event for server in area: event=member-join server=consul-server-0.k8s-primary area=wan
    2020-07-23T11:26:34.764+0200 [WARN]  agent.server.raft: heartbeat timeout reached, starting election: last-leader=
    2020-07-23T11:26:34.764+0200 [INFO]  agent.server.raft: entering candidate state: node="Node at 192.168.99.1:8300 [Candidate]" term=2
    2020-07-23T11:26:34.856+0200 [INFO]  agent.server.raft: election won: tally=1
    2020-07-23T11:26:34.857+0200 [INFO]  agent.server.raft: entering leader state: leader="Node at 192.168.99.1:8300 [Leader]"
    2020-07-23T11:26:34.857+0200 [INFO]  agent.server: cluster leadership acquired
    2020-07-23T11:26:34.857+0200 [INFO]  agent.server: New leader elected: payload=zane-pc
    2020-07-23T11:26:34.997+0200 [INFO]  agent: Synced node info
    2020-07-23T11:26:35.125+0200 [INFO]  agent.server.connect: received new intermediate certificate from primary datacenter
    2020-07-23T11:26:35.163+0200 [INFO]  agent.server.connect: updated root certificates from primary datacenter
    2020-07-23T11:26:35.163+0200 [INFO]  agent.server.connect: initialized secondary datacenter CA with provider: provider=consul
    2020-07-23T11:26:35.163+0200 [INFO]  agent.leader: started routine: routine="config entry replication"
    2020-07-23T11:26:35.163+0200 [INFO]  agent.leader: started routine: routine="federation state replication"
    2020-07-23T11:26:35.163+0200 [INFO]  agent.leader: started routine: routine="federation state anti-entropy"
    2020-07-23T11:26:35.163+0200 [INFO]  agent.leader: started routine: routine="secondary CA roots watch"
    2020-07-23T11:26:35.163+0200 [INFO]  agent.leader: started routine: routine="intention replication"
    2020-07-23T11:26:35.163+0200 [INFO]  agent.leader: started routine: routine="secondary cert renew watch"
    2020-07-23T11:26:35.163+0200 [INFO]  agent.leader: started routine: routine="CA root pruning"
    2020-07-23T11:26:35.163+0200 [INFO]  agent.server: member joined, marking health alive: member=zane-pc
    2020-07-23T11:26:35.164+0200 [INFO]  agent.server.gateway_locator: will dial the primary datacenter using our local mesh gateways if possible
    2020-07-23T11:26:35.168+0200 [INFO]  agent.server: federation state anti-entropy synced
    2020-07-23T11:26:35.244+0200 [INFO]  agent.server: federation state anti-entropy synced
    2020-07-23T11:26:37.828+0200 [INFO]  agent.server.memberlist.wan: memberlist: Suspect consul-server-0.k8s-primary has failed, no acks received
    2020-07-23T11:26:41.828+0200 [INFO]  agent.server.gateway_locator: new cached locations of mesh gateways: primary=[192.168.99.104:30085] local=[]
    2020-07-23T11:26:52.828+0200 [INFO]  agent.server.memberlist.wan: memberlist: Suspect consul-server-1.k8s-primary has failed, no acks received
    2020-07-23T11:27:07.828+0200 [INFO]  agent.server.memberlist.wan: memberlist: Marking consul-server-0.k8s-primary as failed, suspect timeout reached (0 peer confirmations)
    2020-07-23T11:27:07.828+0200 [INFO]  agent.server.serf.wan: serf: EventMemberFailed: consul-server-0.k8s-primary 172.17.0.7
    2020-07-23T11:27:07.828+0200 [INFO]  agent.server: Handled event for server in area: event=member-failed server=consul-server-0.k8s-primary area=wan
    2020-07-23T11:27:12.828+0200 [INFO]  agent.server.memberlist.wan: memberlist: Suspect consul-server-2.k8s-primary has failed, no acks received
    2020-07-23T11:27:22.828+0200 [INFO]  agent.server.memberlist.wan: memberlist: Marking consul-server-1.k8s-primary as failed, suspect timeout reached (0 peer confirmations)
    2020-07-23T11:27:22.828+0200 [INFO]  agent.server.serf.wan: serf: EventMemberFailed: consul-server-1.k8s-primary 172.17.0.8
    2020-07-23T11:27:22.829+0200 [INFO]  agent.server: Handled event for server in area: event=member-failed server=consul-server-1.k8s-primary area=wan
    2020-07-23T11:27:27.828+0200 [INFO]  agent.server.serf.wan: serf: attempting reconnect to consul-server-0.k8s-primary 172.17.0.7:8302
    2020-07-23T11:27:27.832+0200 [INFO]  agent.server.serf.wan: serf: EventMemberJoin: consul-server-0.k8s-primary 172.17.0.7
    2020-07-23T11:27:27.832+0200 [INFO]  agent.server.serf.wan: serf: EventMemberJoin: consul-server-1.k8s-primary 172.17.0.8
    2020-07-23T11:27:27.832+0200 [WARN]  agent.server.memberlist.wan: memberlist: Refuting a suspect message (from: zane-pc.vm-secondary)
    2020-07-23T11:27:27.832+0200 [INFO]  agent.server: Handled event for server in area: event=member-join server=consul-server-0.k8s-primary area=wan
    2020-07-23T11:27:27.832+0200 [INFO]  agent.server: Handled event for server in area: event=member-join server=consul-server-1.k8s-primary area=wan
    2020-07-23T11:27:37.829+0200 [INFO]  agent.server.memberlist.wan: memberlist: Suspect consul-server-1.k8s-primary has failed, no acks received
    2020-07-23T11:27:48.657+0200 [WARN]  agent.server.memberlist.wan: memberlist: Refuting a suspect message (from: zane-pc.vm-secondary)
    2020-07-23T11:28:12.829+0200 [INFO]  agent.server.memberlist.wan: memberlist: Suspect consul-server-2.k8s-primary has failed, no acks received
    2020-07-23T11:28:42.829+0200 [INFO]  agent.server.memberlist.wan: memberlist: Marking consul-server-2.k8s-primary as failed, suspect timeout reached (0 peer confirmations)
    2020-07-23T11:28:42.829+0200 [INFO]  agent.server.serf.wan: serf: EventMemberFailed: consul-server-2.k8s-primary 172.17.0.9
    2020-07-23T11:28:42.829+0200 [INFO]  agent.server: Handled event for server in area: event=member-failed server=consul-server-2.k8s-primary area=wan
    2020-07-23T11:28:48.661+0200 [INFO]  agent.server.serf.wan: serf: EventMemberJoin: consul-server-2.k8s-primary 172.17.0.9
    2020-07-23T11:28:48.661+0200 [WARN]  agent.server.memberlist.wan: memberlist: Refuting a suspect message (from: zane-pc.vm-secondary)
    2020-07-23T11:28:48.661+0200 [INFO]  agent.server: Handled event for server in area: event=member-join server=consul-server-2.k8s-primary area=wan
    2020-07-23T11:28:52.829+0200 [INFO]  agent.server.memberlist.wan: memberlist: Suspect consul-server-0.k8s-primary has failed, no acks received
    2020-07-23T11:29:22.829+0200 [INFO]  agent.server.memberlist.wan: memberlist: Marking consul-server-0.k8s-primary as failed, suspect timeout reached (0 peer confirmations)
    2020-07-23T11:29:22.829+0200 [INFO]  agent.server.serf.wan: serf: EventMemberFailed: consul-server-0.k8s-primary 172.17.0.7
    2020-07-23T11:29:22.829+0200 [INFO]  agent.server: Handled event for server in area: event=member-failed server=consul-server-0.k8s-primary area=wan
    2020-07-23T11:29:27.833+0200 [INFO]  agent.server.serf.wan: serf: attempting reconnect to consul-server-0.k8s-primary 172.17.0.7:8302
    2020-07-23T11:29:27.837+0200 [WARN]  agent.server.memberlist.wan: memberlist: Refuting a suspect message (from: zane-pc.vm-secondary)
    2020-07-23T11:29:27.837+0200 [INFO]  agent.server.serf.wan: serf: EventMemberJoin: consul-server-0.k8s-primary 172.17.0.7
    2020-07-23T11:29:27.837+0200 [INFO]  agent.server: Handled event for server in area: event=member-join server=consul-server-0.k8s-primary area=wan
    2020-07-23T11:29:32.829+0200 [INFO]  agent.server.memberlist.wan: memberlist: Suspect consul-server-2.k8s-primary has failed, no acks received
    2020-07-23T11:29:48.680+0200 [WARN]  agent.server.memberlist.wan: memberlist: Refuting a suspect message (from: zane-pc.vm-secondary)
    2020-07-23T11:30:12.829+0200 [INFO]  agent.server.memberlist.wan: memberlist: Suspect consul-server-1.k8s-primary has failed, no acks received
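
One basic thing that can be ruled out from the VM side is plain TCP reachability of the address in `primary_gateways` (192.168.99.104:30085). The sketch below only opens and closes a TCP connection. It does not speak TLS, so a successful connect does not prove the gateway is routing federation traffic correctly, and it says nothing about the opposite direction (k8s to VM). The function name `can_connect` is hypothetical.

```python
import socket
import sys


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


# Only runs when a host and port are given on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    host, port = sys.argv[1], int(sys.argv[2])
    state = "reachable" if can_connect(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state} over TCP")
```

Saved under any file name, it would be run with the host and port as arguments, for example `192.168.99.104 30085`.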

Any help or direction here would be really appreciated.

I know this is an old topic, but I don’t think Consul cluster federation works at all with Helm deployments, despite what the documentation tells you.

It’s been a few weeks and I still can’t get it to work.

@mister2d I’m really sorry to hear that. Please DM me on Twitter (https://twitter.com/lkysow) and I can set up a time to live debug.

I can also assure you that it does work (although I know that doesn’t help when it’s not working for you): every night we run acceptance tests across every cloud to ensure that it is working.

@lkysow After tackling it some more today I was able to get it to work with this solution:

I hope it’s ideal.