WAN Federation with Mesh Gateway: "Could not resolve host" issue

I configured WAN Federation with Mesh Gateway on Kubernetes following this HashiCorp link.

I was able to connect from serviceA in dc1 to serviceB in dc2 by defining an upstream annotation on serviceA:

`'consul.hashicorp.com/connect-service-upstreams': 'serviceB.svc.dc2.dc:12345'`

and calling localhost:12345 (the sidecar proxy).
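The upstream value above uses Consul's labeled format, where each value is followed by a label naming what it is. As a minimal sketch, the hypothetical helper below (`parse_labeled_upstream` is not part of Consul) splits such a value to show how `serviceB.svc.dc2.dc:12345` maps to service `serviceB`, datacenter `dc2` and local port `12345`. It assumes the label set `svc`, `ns`, `ap`, `peer`, `dc` and a single upstream per string (the real annotation can list several, comma-separated).

```python
from dataclasses import dataclass, field

# Assumed label set for the labeled upstream format:
# svc = service, ns = namespace, ap = admin partition, peer = peer, dc = datacenter.
KNOWN_LABELS = {"svc", "ns", "ap", "peer", "dc"}


@dataclass
class Upstream:
    port: int
    labels: dict = field(default_factory=dict)


def parse_labeled_upstream(value: str) -> Upstream:
    """Hypothetical helper: split one upstream such as 'serviceB.svc.dc2.dc:12345'."""
    target, _, port = value.strip().rpartition(":")
    if not target or not port.isdigit():
        raise ValueError(f"expected '<target>:<port>', got {value!r}")
    parts = target.split(".")
    if len(parts) % 2 != 0:
        raise ValueError(f"expected '<value>.<label>' pairs, got {target!r}")
    labels = {}
    # Walk the parts two at a time: each value is followed by its label.
    for val, label in zip(parts[0::2], parts[1::2]):
        if label not in KNOWN_LABELS:
            raise ValueError(f"unknown label {label!r} in {target!r}")
        labels[label] = val
    if "svc" not in labels:
        raise ValueError("a service name ('<name>.svc') is required")
    return Upstream(port=int(port), labels=labels)


up = parse_labeled_upstream("serviceB.svc.dc2.dc:12345")
print(up.labels["svc"], up.labels.get("dc"), up.port)  # serviceB dc2 12345
```

With the upstream pinned to `dc2` this way, the sidecar always routes localhost:12345 to that datacenter, which is why it works without any cross-datacenter DNS lookup.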

But I can't make it work with transparent proxy, i.e. with the upstream annotation commented out and a ServiceResolver (e.g. for failover) in place.

I tried many DNS names for serviceB, but all of them return:

```
curl: (6) Could not resolve host: serviceB
command terminated with exit code 6
```

Cross-datacenter DNS resolution (Consul DNS lookups across WAN-federated datacenters) does not seem to work as stated here.
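For reference, Consul's DNS interface addresses a service in a given datacenter as `<service>.service.<datacenter>.<domain>`, and transparent proxy virtual IPs are looked up as `<service>.virtual.<domain>`. The hypothetical helpers below only build these candidate names, e.g. to test each one with `dig` from a pod; they assume the default `consul` domain and no namespaces or admin partitions.

```python
from typing import Optional


def consul_service_name(service: str, datacenter: Optional[str] = None,
                        domain: str = "consul") -> str:
    """Standard service lookup: <service>.service[.<datacenter>].<domain>."""
    labels = [service, "service"]
    if datacenter:
        # Without a datacenter label, the lookup targets the local datacenter.
        labels.append(datacenter)
    labels.append(domain)
    return ".".join(labels)


def consul_virtual_name(service: str, domain: str = "consul") -> str:
    """Transparent proxy virtual IP lookup: <service>.virtual.<domain>."""
    return f"{service}.virtual.{domain}"


for dc in (None, "dc1", "dc2"):
    print(consul_service_name("serviceB", dc))
print(consul_virtual_name("serviceB"))
```

These names only resolve if the pod's resolver actually sends `.consul` queries to Consul DNS, which is the part that turns out to matter later in this thread.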

A sample objective: deploy serviceA in dc1 and serviceB in both dc1 and dc2, so that serviceA uses serviceB in dc1 and fails over (via a ServiceResolver) to serviceB in dc2 if serviceB in dc1 is down. How can this be done?

I found a workaround by setting a specific upstream for each datacenter:

For dc1:
  • serviceA with this upstream: `'consul.hashicorp.com/connect-service-upstreams': 'serviceB.svc.dc1.dc:12345'`
  • serviceB

For dc2:
  • serviceA with this upstream: `'consul.hashicorp.com/connect-service-upstreams': 'serviceB.svc.dc2.dc:12345'`
  • serviceB

and defining a ServiceResolver like this:

```yaml
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceResolver
metadata:
  name: serviceb
  namespace: consul
spec:
  connectTimeout: 10s
  failover:
    '*':
      targets:
      - datacenter: "dc1"
      - datacenter: "dc2"
```

Then, running `curl http://localhost:12345` in each of the two serviceA pods:

  • serviceA in dc1 reaches serviceB in dc1 by default;
  • serviceA in dc2 reaches serviceB in dc2 by default.

If serviceB goes down in one datacenter, serviceA in that datacenter automatically fails over to serviceB in the other one. This works!
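The behavior described above can be modeled with a small simulation: each serviceA prefers its local datacenter, then walks the resolver's failover targets in order and uses the first datacenter with a healthy serviceB. This is a simplified sketch of the observed outcome, not Consul's routing code; `pick_datacenter` and the health maps are hypothetical.

```python
from typing import Mapping, Optional, Sequence

# Failover targets from the ServiceResolver above, in order.
FAILOVER_TARGETS = ("dc1", "dc2")


def pick_datacenter(local_dc: str, healthy: Mapping[str, bool],
                    targets: Sequence[str] = FAILOVER_TARGETS) -> Optional[str]:
    """Return the datacenter whose serviceB would receive traffic.

    Simplified model: try the local datacenter first, then the failover
    targets in order; return None if no datacenter has a healthy serviceB.
    """
    for dc in (local_dc, *targets):
        if healthy.get(dc, False):
            return dc
    return None


all_up = {"dc1": True, "dc2": True}
dc1_down = {"dc1": False, "dc2": True}
print(pick_datacenter("dc1", all_up))    # dc1
print(pick_datacenter("dc2", all_up))    # dc2
print(pick_datacenter("dc1", dc1_down))  # dc2
```

Listing both datacenters under the `'*'` key means the same resolver works unchanged in dc1 and dc2: the local datacenter is always tried first, so the other one acts as the fallback.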

But if I remove the upstreams and reach serviceB through transparent proxy (pointing to `serviceB.virtual.consul`), resolution becomes intermittent. I get either:

  • a correct response from serviceB (which also respects the cross-DC ServiceResolver rule), or

  • curl: (6) Could not resolve host: serviceB
    command terminated with exit code 6

What am I missing?

Hi @Roxyrob,

The error you are getting seems to be caused by name resolution issues. Are you using an Alpine-based container for serviceA? Could you switch to a non-Alpine image and check whether you still see intermittent resolution failures?

Some versions of Alpine have DNS issues; see docker-alpine/docs/caveats.md in the gliderlabs/docker-alpine repository on GitHub.

This seems to have been fixed in later releases, but it is better to try a non-Alpine image to rule it out as the cause.

Hi @Ranjandas,
You're right about Alpine. I now use a newer image, so search domains and DNS queries over TCP are supported.

I ran many tests on the musl (Alpine) behavior of querying nameservers in parallel, which cannot be disabled and which I initially thought was the cause. However, glibc also uses parallel queries by default and does not show this resolution issue.

Finally, I was able to reproduce the issue with glibc by using `options rotate`:

/etc/resolv.conf with Consul on Kubernetes (Helm chart), DNS enabled and `dns.enableRedirection` enabled (DNS forwarding/redirection):

```
search static-client.svc.cluster.local svc.cluster.local cluster.local eu-south-1.compute.internal
nameserver 127.0.0.1
nameserver 172.20.0.10
options ndots:5 rotate
```

Running, for example:

```
watch -n 0.1 curl -sS static-server.service.consul
```

shows the intermittent resolution issue. After removing `rotate`, the issue disappears (even though glibc seems to send parallel queries to the nameservers by default).
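One plausible reading of why `rotate` causes this: the resolver moves on to the next nameserver only when a server fails to answer, while an NXDOMAIN ("no such name") reply is taken as final. With `rotate`, lookups no longer always start at 127.0.0.1, so some of them reach 172.20.0.10 first and stop there. The simulation below assumes that 127.0.0.1 (Consul DNS via redirection) answers `.consul` names and that the cluster DNS at 172.20.0.10 returns NXDOMAIN for them; it models rotation as simple round-robin (the exact starting-server choice varies between glibc versions) and ignores search-list expansion.

```python
from typing import Callable, List, Optional

NXDOMAIN = None  # model an authoritative "no such name" answer as None


def consul_dns(name: str) -> Optional[str]:
    # Assumption: 127.0.0.1 (Consul DNS via redirection) knows .consul names.
    return "10.1.2.3" if name.endswith(".consul") else NXDOMAIN  # placeholder IP


def cluster_dns(name: str) -> Optional[str]:
    # Assumption: 172.20.0.10 (cluster DNS) has no stub domain for .consul.
    return NXDOMAIN if name.endswith(".consul") else "10.0.0.1"  # placeholder IP


NAMESERVERS: List[Callable[[str], Optional[str]]] = [consul_dns, cluster_dns]


def resolve(name: str, start: int = 0) -> Optional[str]:
    """Query the nameservers in order, beginning at index `start`.

    A server that does not answer (TimeoutError) is skipped; any answer,
    including NXDOMAIN, is returned as final.
    """
    n = len(NAMESERVERS)
    for i in range(n):
        server = NAMESERVERS[(start + i) % n]
        try:
            return server(name)
        except TimeoutError:
            continue
    return None


def run(lookups: int, rotate: bool) -> int:
    """Return how many lookups of a .consul name failed."""
    failures = 0
    for i in range(lookups):
        # Without rotate every lookup starts at the first nameserver;
        # with rotate the starting server advances round-robin.
        start = i if rotate else 0
        if resolve("static-server.service.consul", start) is NXDOMAIN:
            failures += 1
    return failures


print(run(10, rotate=False))  # 0
print(run(10, rotate=True))   # 5
```

In this model the failures come from mixing two nameservers with different views of `.consul`, not from parallelism, which matches the observation that removing `rotate` is enough to make the issue disappear.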

In the end, I opted for a clean CoreDNS setup with the standard Kubernetes resolver configuration (only the Kubernetes CoreDNS service as nameserver):

```
search consul-mesh.svc.cluster.local svc.cluster.local cluster.local ...
nameserver 172.20.0.10
options ndots:5
```

and configured a stub domain for Consul in CoreDNS.
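A CoreDNS stub domain for Consul is typically a separate server block for the `consul` zone that forwards queries to the Consul DNS Service's ClusterIP. The hypothetical helper below (`render_consul_stub_block` is not part of any tool) just renders such a block as text to add to the Corefile in the CoreDNS ConfigMap, using the `errors`, `cache` and `forward` plugins. `10.100.0.53` is a placeholder for your actual Consul DNS Service IP.

```python
import ipaddress


def render_consul_stub_block(consul_dns_ip: str, domain: str = "consul",
                             cache_seconds: int = 30) -> str:
    """Render a Corefile server block that forwards `domain` to Consul DNS.

    Following the limitation noted in this thread, the Consul DNS address
    is taken as a static IP; anything else raises ValueError.
    """
    ip = ipaddress.ip_address(consul_dns_ip)
    lines = [
        f"{domain}:53 {{",
        "    errors",
        f"    cache {cache_seconds}",
        f"    forward . {ip}",
        "}",
    ]
    return "\n".join(lines)


# 10.100.0.53 is a placeholder for the Consul DNS Service ClusterIP.
print(render_consul_stub_block("10.100.0.53"))
```

Because only CoreDNS appears in the pod's resolv.conf, every lookup follows the same path regardless of the resolver library (glibc or musl) or options such as `rotate`.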

It is not a fully Kubernetes-native solution (CoreDNS does not support referencing the Consul DNS Service by name, only by its static Service IP), but it avoids many issues caused by differences in behavior between Linux resolvers.