Issue with cross-DC Consul routing between VM and Kubernetes

Trying to use mesh gateways to connect a VM datacenter (dc1) and a Kubernetes datacenter (dc2), with the VM cluster as the primary.
Each cluster has a pair of services (web → api) that connect seamlessly within their own datacenter. Mesh gateway mode is set to local.
Cross-DC connectivity also works from Kubernetes → VM via the mesh gateways: web[dc2] → api[dc1] connects fine.
The problem is connectivity from VM → Kubernetes, i.e. web[dc1] → api[dc2].

Below are the debug logs from the web[dc1] Envoy:

[2021-11-11 20:13:47.466][7328][debug][conn_handler] [source/server/] [C137] new connection
[2021-11-11 20:13:47.467][7328][debug][http] [source/common/http/] [C137] new stream
[2021-11-11 20:13:47.467][7328][debug][http] [source/common/http/] [C137][S1871046600756104503] request headers complete (end_stream=true):
':authority', ''
':path', '/api'
':method', 'GET'
'user-agent', 'python-requests/2.18.4'
'accept-encoding', 'gzip, deflate'
'accept', '*/*'
'connection', 'keep-alive'

[2021-11-11 20:13:47.467][7328][debug][http] [source/common/http/] [C137][S1871046600756104503] request end stream
[2021-11-11 20:13:47.467][7328][debug][router] [source/common/router/] [C137][S1871046600756104503] cluster 'api.default.dc2.internal.8ad5cfa0-1476-b078-1401-0e593a059539.consul' match for URL '/api'
[2021-11-11 20:13:47.467][7328][debug][upstream] [source/common/upstream/] no healthy host for HTTP connection pool
[2021-11-11 20:13:47.467][7328][debug][http] [source/common/http/] [C137][S1871046600756104503] Sending local reply with details no_healthy_upstream
[2021-11-11 20:13:47.467][7328][debug][http] [source/common/http/] [C137][S1871046600756104503] encoding headers via codec (end_stream=false):
':status', '503'
'content-length', '19'
'content-type', 'text/plain'
'date', 'Thu, 11 Nov 2021 20:13:47 GMT'
'server', 'envoy'

[2021-11-11 20:13:47.467][7328][debug][connection] [source/common/network/] [C137] remote close

Both the api service and its associated sidecar proxy service are discoverable in each DC from the other DC. The Consul UI also works fine for both DCs. The api service in dc2 is healthy and passing all its health checks.

Added ACLs as part of the PoC, but now the same error appears for Kubernetes → VM traffic as well (web[dc2] → api[dc1]).

[2021-11-13 07:36:22.653][23][debug][http] [source/common/http/] [C833][S14467388687616257858] request end stream
[2021-11-13 07:36:22.654][23][debug][router] [source/common/router/] [C833][S14467388687616257858] cluster 'api.default.dc1.internal.a0d28293-0f1f-fa49-9977-4dd2e615aa42.consul' match for URL '/api'
[2021-11-13 07:36:22.654][23][debug][upstream] [source/common/upstream/] no healthy host for HTTP connection pool
[2021-11-13 07:36:22.654][23][debug][http] [source/common/http/] [C833][S14467388687616257858] Sending local reply with details no_healthy_upstream
[2021-11-13 07:36:22.654][23][debug][http] [source/common/http/] [C833][S14467388687616257858] encoding headers via codec (end_stream=false):
':status', '503'
'content-length', '19'
'content-type', 'text/plain'
'date', 'Sat, 13 Nov 2021 07:36:22 GMT'
'server', 'envoy'

Have configured service-defaults for both the web and api services with mesh gateway mode set to local. No other errors appear apart from this.
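For reference, a service-defaults configuration entry with a local mesh gateway mode looks roughly like the sketch below on the VM side (the Kubernetes side would use the equivalent ServiceDefaults CRD). This is not the actual file from the setup, just an illustration of what was described; the service name and protocol are assumptions:

```hcl
Kind     = "service-defaults"
Name     = "api"
Protocol = "http"

MeshGateway {
  Mode = "local"
}
```

Applied with `consul config write` against the local agent in each datacenter.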

Hi, it looks like the local Envoys think there are no healthy upstream instances of the service.

Hi @lkysow

Yes, but not sure why, as the services are all healthy and the same is visible via:

  • cross-DC HTTP API discovery calls
  • discoverable in the mesh by other local services
  • visible as healthy in the Consul UI

A couple of questions:

  • When you make that cross-DC call, what shows up in the logs of Envoy?
  • Can we see the output of localhost:19000/clusters?format=json for the Envoy of web[dc2]?

@lkysow by cross-DC calls I meant HTTP service discovery calls, i.e. http://local-dc2-ui-ip:8500/v1/health/service/api?token=token-for-web-service&dc=dc1&passing=true. Both api and api-sidecar-proxy are discoverable.

However, the output of localhost:19000/clusters?format=json for the web[dc2] Envoy shows api[dc1] as unhealthy. Not sure why. The logs are attached.
cluster_dc2.txt (6.7 KB)

Also, the address for api[dc1] in localhost:19000/clusters?format=json appears to be the local mesh gateway pod IP, and there are no logs there. I tried changing the mesh gateway mode to remote, but the address still points at the local gateway.
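To make the unhealthy entries in that output easier to spot, here is a small sketch that filters Envoy's /clusters?format=json admin output for hosts Envoy does not consider healthy. The field names (cluster_statuses, host_statuses, health_status, eds_health_status) follow the Envoy admin Clusters JSON shape; the sample data here is made up for illustration, not taken from the attached cluster_dc2.txt:

```python
import json

def unhealthy_hosts(clusters_json: str):
    """Return (cluster_name, address, reasons) for every host that
    Envoy's /clusters?format=json output does not mark as healthy."""
    out = []
    for cluster in json.loads(clusters_json).get("cluster_statuses", []):
        for host in cluster.get("host_statuses", []):
            hs = host.get("health_status", {})
            # Any truthy failure flag (e.g. failed_active_health_check)
            # counts as a reason the host is not usable.
            reasons = [k for k, v in hs.items()
                       if k != "eds_health_status" and v]
            if hs.get("eds_health_status") != "HEALTHY":
                reasons.append("eds=%s" % hs.get("eds_health_status"))
            if reasons:
                sa = host.get("address", {}).get("socket_address", {})
                addr = "%s:%s" % (sa.get("address"), sa.get("port_value"))
                out.append((cluster.get("name"), addr, reasons))
    return out

# Made-up sample mimicking the structure of the admin endpoint output.
sample = json.dumps({
    "cluster_statuses": [{
        "name": "api.default.dc1.internal.trustdomain.consul",
        "host_statuses": [{
            "address": {"socket_address": {"address": "10.0.0.5",
                                           "port_value": 8443}},
            "health_status": {"eds_health_status": "UNHEALTHY",
                              "failed_active_health_check": True},
        }]
    }]
})
print(unhealthy_hosts(sample))
```

In practice you would feed it the real output, e.g. `curl -s localhost:19000/clusters?format=json` piped into the script.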

P.S.: Running this with the VM DC (dc1) on Vagrant and the Kubernetes DC (dc2) on Minikube. Before enabling ACLs I was able to connect from dc2 → dc1, whereas dc1 → dc2 gave the unhealthy error. After enabling ACLs (dc1 as primary), traffic in both directions fails with the unhealthy error (status code 503). Local traffic within each DC's mesh works fine.

Hey, I think it will be easiest to set up a call. Please DM me: