Unable to get mesh traffic to flow between two peered DCs with HCP Consul and K8s

Using HCP Consul, I have two datacenters (Azure East and Azure West). Traffic internal to each DC flows fine, and I have the demo robot shop with about 8 services running fine. I am running Consul on Kubernetes connected to HCP as described in your GitHub repo learn-consul-get-started-kubernetes. I have two Kubernetes clusters connected to the two HCP Consul datacenters.

I have created a mesh gateway between the two DCs and I THINK it’s working.

As a test, I have killed the web service in East and am trying to get the service mesh to use the one in West.

On the US east cluster I see:

However I am unable to get any traffic to flow:

I have tried:


While it shows up in the Consul UI as a service running remotely, it doesn’t appear in the catalog service API.

I have also tried explicit annotations on the pod:

  annotations:
    # CONSUL
    'consul.hashicorp.com/connect-inject': 'true'
    #'consul.hashicorp.com/envoy-extra-args': '--log-level debug'  # doesn't work this is already set
    'consul.hashicorp.com/connect-service-upstreams': 'echo-1.svc.k8s-consul-federation-sbx-default.peer:3000'

Trying to reach that service results in an error:
root@my-debug-container-74b66b994c-fs99t:/# curl http://127.0.0.1:3000
upstream connect error or disconnect/reset before headers. reset reason: connection failure
root@my-debug-container-74b66b994c-fs99t:/#

The gateway service is running but doesn’t output any logs (not a single line today):

➜ ~ kubectl --context aks-k8s-consul-fed-sbx-eus2-app1 get pods -n consul
NAME                                          READY   STATUS    RESTARTS   AGE
api-gateway-74c65b4f88-srdxf                  1/1     Running   0          2d20h
consul-connect-injector-65bc5d97d-ktcqm       1/1     Running   0          2d21h
consul-mesh-gateway-579ffb95b7-656ns          1/1     Running   0          23h
consul-webhook-cert-manager-cb8546597-7ff7v   1/1     Running   0          3d

I can query the ‘remote’ service using the health API:
{{CONSUL_HTTP_ADDR}}/v1/health/service/web?peer=k8s-consul-federation-sbx-default
(works and finds the service)

This doesn’t work (no service found), which is why I think the “consul connect proxy” doesn’t work:
{{CONSUL_HTTP_ADDR}}/v1/catalog/service/web

I’m pretty frustrated, as I have a fair amount of Consul experience, but I’ve spent the better part of a week on this and haven’t got a single packet across.
What debugging should I do? How do I verify that the mesh gateways are configured correctly? (I just enabled them in the Helm chart.)

Hello, can you share which versions of Consul and consul-k8s you’re running, as well as your Consul values file? In addition, have you exported the services to the other cluster using something like cluster peering? I’m going to add two links to relevant docs that should help with getting the peering set up if you haven’t done that already.

I was able to get a little further. The DCs are peered using the UI in HCP Consul. The service is exported, and service intentions are in place. The problem was that I created DC1 and DC2 using the “learn consul get started” repo, which applied an NSG firewall to all traffic coming into the cluster. I had modified a second NSG that was controlled by AKS but missed the one that Terraform added.

I still don’t have a working solution though:

I have added the annotations to my pod:

    'consul.hashicorp.com/connect-service-upstreams': 
    'echo-1.svc.k8s-consul-federation-sbx-default.peer:3000,
         web.svc.k8s-consul-federation-sbx-default.peer:8080'

and I can curl http://127.0.0.1:3000, but DNS is broken, and most of the information I have found doesn’t address how to get DNS to work against HCP Consul. I modified the Helm chart to enable the DNS service (not part of learn consul get started, nor the hashicups federation demo).
I now have a DNS service running, but the service doesn’t have anything to connect to and doesn’t work.

➜  sfa-ui git:(develop) ✗ kubectl get pods -l app=consul -n consul
NAME                                          READY   STATUS    RESTARTS   AGE
consul-connect-injector-65bc5d97d-ktcqm       1/1     Running   0          8d
consul-mesh-gateway-579ffb95b7-656ns          1/1     Running   0          6d22h
consul-webhook-cert-manager-cb8546597-7ff7v   1/1     Running   0          8d
➜  sfa-ui git:(develop) ✗ kubectl get pods -l hasDNS=true -n consul
No resources found in consul namespace.
➜  sfa-ui git:(develop) ✗ kubectl get svc -n consul
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)          AGE
api-gateway               LoadBalancer   10.30.122.245   20.22.151.216   8443:31683/TCP   8d
consul-connect-injector   ClusterIP      10.30.78.24     <none>          443/TCP          8d
consul-dns                ClusterIP      10.30.250.55    <none>          53/TCP,53/UDP    8d
consul-mesh-gateway       LoadBalancer   10.30.120.145   20.94.5.10      443:31012/TCP    6d22h
➜  sfa-ui git:(develop) ✗ kubectl get pods -n consul
NAME                                          READY   STATUS    RESTARTS   AGE
api-gateway-74c65b4f88-srdxf                  1/1     Running   0          8d
consul-connect-injector-65bc5d97d-ktcqm       1/1     Running   0          8d
consul-mesh-gateway-579ffb95b7-656ns          1/1     Running   0          6d22h
consul-webhook-cert-manager-cb8546597-7ff7v   1/1     Running   0          8d
➜  sfa-ui git:(develop) ✗ kubectl exec -it my-debug-container-84b95f867f-f9x9r -- bash
Defaulted container "my-debug-container" out of: my-debug-container, consul-dataplane, consul-connect-inject-init (init)
root@my-debug-container-84b95f867f-f9x9r:/# nslookup 
> server 10.30.250.55
Default server: 10.30.250.55
Address: 10.30.250.55#53
> web.default.k8s-consul-federation-sbx-default.external.de789064-6e05-d6a5-b94c-e0243f74c1f2.consul
;; communications error to 10.30.250.55#53: connection refused
;; communications error to 10.30.250.55#53: connection refused
;; communications error to 10.30.250.55#53: connection refused
;; no servers could be reached

>

Question 1:
I don’t expect people to follow 100% of what I’ve done, but given that I am running against HCP Consul and my client is K8s using the Helm chart, what should I do to enable resolution of .consul domain names in my pods?

Question 2:
I have Consul API gateway running and have an HTTPRoute entry to my web server. The HTTPRoute does not take DC or peer parameters. With Consul API gateway, what service-resolver or other entries are needed for the API gateway in DC1 to call a service in DC2?

Thanks,
Steve

For Question 1, I have to do a little digging, as I’m not 100% sure how that works in HCP. For Question 2, you’ll need to export the service from DC2 to DC1 (docs on exporting services), then create a service resolver in DC1 for that exported service (docs on service resolver here), and finally you’ll want your HTTPRoute to reference the service with a kind of MeshService.
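The last step for Question 2, an HTTPRoute that references a MeshService, might look roughly like the sketch below. The service name web, port 8080, and the api-gateway gateway in the consul namespace are taken from earlier in this thread. The resource names web-mesh and web-route are hypothetical, and the HTTPRoute apiVersion is an assumption that depends on which Gateway API CRDs your consul-k8s version installs, so check both against your setup.

```yaml
# Sketch only, applied in DC1 (the cluster that runs the API gateway).
apiVersion: consul.hashicorp.com/v1alpha1
kind: MeshService
metadata:
  name: web-mesh              # hypothetical resource name
spec:
  name: web                   # Consul service name the route should reach
---
apiVersion: gateway.networking.k8s.io/v1beta1   # assumption: may differ by version
kind: HTTPRoute
metadata:
  name: web-route             # hypothetical resource name
spec:
  parentRefs:
    - name: api-gateway       # gateway name inferred from the pod listing above
      namespace: consul
  rules:
    - backendRefs:
        - group: consul.hashicorp.com
          kind: MeshService   # instead of a Kubernetes Service
          name: web-mesh
          port: 8080
```

The MeshService CRD also has a peer field that can name the peer directly. Whether to set it or to rely on a service resolver in DC1 is worth verifying against the docs for your version. If the route lives in a different namespace from the gateway, the gateway’s listeners also need to allow routes from that namespace.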

For an example, you can see one of the tests in the consul-k8s repo, which sets this up. It includes an example of the exported services config entry and an example of a service resolver that resolves the static-service exported from the second partition.
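As a rough sketch of what an exported services entry and a service resolver could look like for the setup in this thread (not copied from the consul-k8s test): the service name web and the peer name k8s-consul-federation-sbx-default come from the earlier posts, while dc1-peer-name is a hypothetical placeholder for whatever DC2 calls its peering with DC1.

```yaml
# In DC2 (the cluster that still runs web): export the service to the DC1 peer.
apiVersion: consul.hashicorp.com/v1alpha1
kind: ExportedServices
metadata:
  name: default               # must match the partition name ("default" here)
spec:
  services:
    - name: web
      consumers:
        - peer: dc1-peer-name # hypothetical: DC2's name for its peering with DC1
---
# In DC1: send requests for the local name "web" to the instance on the peer.
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceResolver
metadata:
  name: web
spec:
  redirect:
    service: web
    peer: k8s-consul-federation-sbx-default
```

A redirect always sends traffic to the peer. For the original test (kill web in East and fall back to West), a failover block with the peer as a target may be a closer fit, assuming a Consul version that supports peer failover targets. A service intention in DC2 that allows the calling service from the peer is needed as well.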

Thanks for the help, but I wasn’t able to get this working. I had exported services and played with the service resolver, but never got it to work. Coming back from Thanksgiving, the project is moving on and this got pushed out of the plan as it was taking too long. We may revisit this, but my feeling is that we won’t mesh multiple regions/clouds because of the complexity and because it incurs higher HashiCorp license levels. (Each region will expose a service with a load balancer, and we will use cloud networking to handle it.)