Consul service mesh connectivity issues

Hi All,

We are experiencing issues trying to get two services (prometheus -> alertmanager) to communicate over the Consul service mesh.

Our setup:

Using the official Consul Helm chart, I have Consul WAN federation set up across two Consul clusters running on two separate Kubernetes clusters, each in a different datacenter. WAN federation works. I also enabled Consul Connect / service mesh; this, however, does not work. I have a prometheus pod (with a consul-connect-envoy-sidecar) in one datacenter and an alertmanager pod in the other. I used annotations to enable and configure Connect for those pods. In the Consul UI I can see that the prometheus pod has the alertmanager pod as an upstream, and all health checks are green: for the client and backend services, for the sidecar proxies of those services, and for the mesh gateways.
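For context, the Connect annotations on the prometheus pod look roughly like this. A minimal sketch; the pod name, namespace and local upstream port here are assumptions, not copied from our manifests (the upstream format is service:port:datacenter):

kubectl -n monitoring get pod prometheus-0 -o jsonpath='{.metadata.annotations}'
# among others:
#   consul.hashicorp.com/connect-inject: "true"
#   consul.hashicorp.com/connect-service-upstreams: "alertmanager:9093:datacenter-B-gcp-europe-west6"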

In the logs I can see that the connection request from prometheus towards alertmanager reaches the mesh gateway in the datacenter on the alertmanager side, and from there reaches the consul-connect-envoy-sidecar inside the alertmanager pod.

meshgateway

[2019-11-08 17:19:48.962][29][debug][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:72] tls inspector: new connection accepted
[2019-11-08 17:19:48.962][29][debug][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:118] tls:onServerName(), requestedServerName: alertmanager.default.datacenter-B-gcp-europe-west6.internal.72300fbd-d88c-f97d-cd40-16502cb362ea.consul
[2019-11-08 17:19:48.962][29][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:200] [C70322] new tcp proxy session
[2019-11-08 17:19:48.962][29][debug][connection] [source/common/network/connection_impl.cc:101] [C70322] closing data_to_write=0 type=1
[2019-11-08 17:19:48.962][29][debug][connection] [source/common/network/connection_impl.cc:183] [C70322] closing socket: 1

consul-connect-envoy-sidecar

[2019-11-08 17:30:56.163][000001][debug][main] [source/server/server.cc:143] flushing stats
[2019-11-08 17:30:56.259][000012][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:200] [C263736] new tcp proxy session
[2019-11-08 17:30:56.259][000012][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:342] [C263736] Creating connection to cluster local_app
[2019-11-08 17:30:56.259][000012][debug][pool] [source/common/tcp/conn_pool.cc:80] creating a new connection
[2019-11-08 17:30:56.259][000012][debug][pool] [source/common/tcp/conn_pool.cc:371] [C263737] connecting
[2019-11-08 17:30:56.259][000012][debug][connection] [source/common/network/connection_impl.cc:634] [C263737] connecting to 127.0.0.1:9093
[2019-11-08 17:30:56.260][000012][debug][connection] [source/common/network/connection_impl.cc:643] [C263737] connection in progress
[2019-11-08 17:30:56.260][000012][debug][pool] [source/common/tcp/conn_pool.cc:106] queueing request due to no available connections
[2019-11-08 17:30:56.260][000012][debug][main] [source/server/connection_handler_impl.cc:257] [C263736] new connection
[2019-11-08 17:30:56.260][000012][debug][connection] [source/common/ssl/ssl_socket.cc:135] [C263736] handshake error: 2
[2019-11-08 17:30:56.261][000012][debug][connection] [source/common/network/connection_impl.cc:516] [C263737] connected
[2019-11-08 17:30:56.261][000012][debug][pool] [source/common/tcp/conn_pool.cc:292] [C263737] assigning connection
[2019-11-08 17:30:56.261][000012][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:540] TCP:onUpstreamEvent(), requestedServerName:
[2019-11-08 17:30:56.261][000012][debug][connection] [source/common/ssl/ssl_socket.cc:135] [C263736] handshake error: 5
[2019-11-08 17:30:56.261][000012][debug][connection] [source/common/network/connection_impl.cc:183] [C263736] closing socket: 0
[2019-11-08 17:30:56.261][000012][debug][connection] [source/common/network/connection_impl.cc:101] [C263737] closing data_to_write=0 type=0
[2019-11-08 17:30:56.261][000012][debug][connection] [source/common/network/connection_impl.cc:183] [C263737] closing socket: 1
[2019-11-08 17:30:56.261][000012][debug][pool] [source/common/tcp/conn_pool.cc:121] [C263737] client disconnected
[2019-11-08 17:30:56.261][000012][debug][main] [source/server/connection_handler_impl.cc:68] [C263736] adding to cleanup list
[2019-11-08 17:30:56.262][000012][debug][pool] [source/common/tcp/conn_pool.cc:245] [C263737] connection destroyed
[2019-11-08 17:31:01.163][000001][debug][main] [source/server/server.cc:143] flushing stats
[2019-11-08 17:31:06.164][000001][debug][main] [source/server/server.cc:143] flushing stats
[2019-11-08 17:31:06.262][000012][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:200] [C263738] new tcp proxy session
[2019-11-08 17:31:06.262][000012][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:342] [C263738] Creating connection to cluster local_app
[2019-11-08 17:31:06.262][000012][debug][pool] [source/common/tcp/conn_pool.cc:80] creating a new connection
[2019-11-08 17:31:06.262][000012][debug][pool] [source/common/tcp/conn_pool.cc:371] [C263739] connecting
[2019-11-08 17:31:06.262][000012][debug][connection] [source/common/network/connection_impl.cc:634] [C263739] connecting to 127.0.0.1:9093
[2019-11-08 17:31:06.262][000012][debug][connection] [source/common/network/connection_impl.cc:643] [C263739] connection in progress
[2019-11-08 17:31:06.262][000012][debug][pool] [source/common/tcp/conn_pool.cc:106] queueing request due to no available connections
[2019-11-08 17:31:06.262][000012][debug][main] [source/server/connection_handler_impl.cc:257] [C263738] new connection
[2019-11-08 17:31:06.262][000012][debug][connection] [source/common/ssl/ssl_socket.cc:135] [C263738] handshake error: 2
[2019-11-08 17:31:06.262][000012][debug][connection] [source/common/network/connection_impl.cc:516] [C263739] connected
[2019-11-08 17:31:06.262][000012][debug][pool] [source/common/tcp/conn_pool.cc:292] [C263739] assigning connection
[2019-11-08 17:31:06.262][000012][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:540] TCP:onUpstreamEvent(), requestedServerName:
[2019-11-08 17:31:06.262][000012][debug][connection] [source/common/ssl/ssl_socket.cc:135] [C263738] handshake error: 2
[2019-11-08 17:31:06.263][000012][debug][connection] [source/common/ssl/ssl_socket.cc:135] [C263738] handshake error: 5
[2019-11-08 17:31:06.263][000012][debug][connection] [source/common/network/connection_impl.cc:183] [C263738] closing socket: 0
[2019-11-08 17:31:06.263][000012][debug][connection] [source/common/network/connection_impl.cc:101] [C263739] closing data_to_write=0 type=0
[2019-11-08 17:31:06.263][000012][debug][connection] [source/common/network/connection_impl.cc:183] [C263739] closing socket: 1
[2019-11-08 17:31:06.263][000012][debug][pool] [source/common/tcp/conn_pool.cc:121] [C263739] client disconnected
[2019-11-08 17:31:06.263][000012][debug][main] [source/server/connection_handler_impl.cc:68] [C263738] adding to cleanup list
[2019-11-08 17:31:06.263][000012][debug][pool] [source/common/tcp/conn_pool.cc:245] [C263739] connection destroyed
[2019-11-08 17:31:11.165][000001][debug][main] [source/server/server.cc:143] flushing stats
[2019-11-08 17:31:16.165][000001][debug][main] [source/server/server.cc:143] flushing stats
[2019-11-08 17:31:16.264][000013][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:200] [C263740] new tcp proxy session
[2019-11-08 17:31:16.264][000013][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:342] [C263740] Creating connection to cluster local_app
[2019-11-08 17:31:16.264][000013][debug][pool] [source/common/tcp/conn_pool.cc:80] creating a new connection
[2019-11-08 17:31:16.264][000013][debug][pool] [source/common/tcp/conn_pool.cc:371] [C263741] connecting
[2019-11-08 17:31:16.264][000013][debug][connection] [source/common/network/connection_impl.cc:634] [C263741] connecting to 127.0.0.1:9093
[2019-11-08 17:31:16.265][000013][debug][connection] [source/common/network/connection_impl.cc:643] [C263741] connection in progress
[2019-11-08 17:31:16.265][000013][debug][pool] [source/common/tcp/conn_pool.cc:106] queueing request due to no available connections
[2019-11-08 17:31:16.265][000013][debug][main] [source/server/connection_handler_impl.cc:257] [C263740] new connection
[2019-11-08 17:31:16.265][000013][debug][connection] [source/common/ssl/ssl_socket.cc:135] [C263740] handshake error: 2
[2019-11-08 17:31:16.265][000013][debug][connection] [source/common/network/connection_impl.cc:516] [C263741] connected
[2019-11-08 17:31:16.265][000013][debug][pool] [source/common/tcp/conn_pool.cc:292] [C263741] assigning connection
[2019-11-08 17:31:16.265][000013][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:540] TCP:onUpstreamEvent(), requestedServerName:
[2019-11-08 17:31:16.265][000013][debug][connection] [source/common/ssl/ssl_socket.cc:135] [C263740] handshake error: 2
[2019-11-08 17:31:16.265][000013][debug][connection] [source/common/ssl/ssl_socket.cc:135] [C263740] handshake error: 5
[2019-11-08 17:31:16.265][000013][debug][connection] [source/common/network/connection_impl.cc:183] [C263740] closing socket: 0
[2019-11-08 17:31:16.266][000013][debug][connection] [source/common/network/connection_impl.cc:101] [C263741] closing data_to_write=0 type=0
[2019-11-08 17:31:16.266][000013][debug][connection] [source/common/network/connection_impl.cc:183] [C263741] closing socket: 1
[2019-11-08 17:31:16.266][000013][debug][pool] [source/common/tcp/conn_pool.cc:121] [C263741] client disconnected
[2019-11-08 17:31:16.266][000013][debug][main] [source/server/connection_handler_impl.cc:68] [C263740] adding to cleanup list
[2019-11-08 17:31:16.266][000013][debug][pool] [source/common/tcp/conn_pool.cc:245] [C263741] connection destroyed
[2019-11-08 17:31:21.167][000001][debug][main] [source/server/server.cc:143] flushing stats
[2019-11-08 17:31:26.168][000001][debug][main] [source/server/server.cc:143] flushing stats
[2019-11-08 17:31:26.267][000013][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:200] [C263742] new tcp proxy session
[2019-11-08 17:31:26.267][000013][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:342] [C263742] Creating connection to cluster local_app
[2019-11-08 17:31:26.267][000013][debug][pool] [source/common/tcp/conn_pool.cc:80] creating a new connection
[2019-11-08 17:31:26.267][000013][debug][pool] [source/common/tcp/conn_pool.cc:371] [C263743] connecting
[2019-11-08 17:31:26.267][000013][debug][connection] [source/common/network/connection_impl.cc:634] [C263743] connecting to 127.0.0.1:9093
[2019-11-08 17:31:26.268][000013][debug][connection] [source/common/network/connection_impl.cc:643] [C263743] connection in progress
[2019-11-08 17:31:26.268][000013][debug][pool] [source/common/tcp/conn_pool.cc:106] queueing request due to no available connections
[2019-11-08 17:31:26.268][000013][debug][main] [source/server/connection_handler_impl.cc:257] [C263742] new connection
[2019-11-08 17:31:26.268][000013][debug][connection] [source/common/ssl/ssl_socket.cc:135] [C263742] handshake error: 5
[2019-11-08 17:31:26.268][000013][debug][connection] [source/common/network/connection_impl.cc:183] [C263742] closing socket: 0
[2019-11-08 17:31:26.269][000013][debug][pool] [source/common/tcp/conn_pool.cc:213] canceling pending request
[2019-11-08 17:31:26.269][000013][debug][pool] [source/common/tcp/conn_pool.cc:221] canceling pending connection
[2019-11-08 17:31:26.269][000013][debug][connection] [source/common/network/connection_impl.cc:101] [C263743] closing data_to_write=0 type=1
[2019-11-08 17:31:26.269][000013][debug][connection] [source/common/network/connection_impl.cc:183] [C263743] closing socket: 1
[2019-11-08 17:31:26.269][000013][debug][pool] [source/common/tcp/conn_pool.cc:121] [C263743] client disconnected
[2019-11-08 17:31:26.269][000013][debug][main] [source/server/connection_handler_impl.cc:68] [C263742] adding to cleanup list
[2019-11-08 17:31:26.269][000013][debug][pool] [source/common/tcp/conn_pool.cc:245] [C263743] connection destroyed
[2019-11-08 17:31:31.169][000001][debug][main] [source/server/server.cc:143] flushing stats

There are no Consul intentions blocking this.
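For completeness, this is the kind of check we ran to verify that (a minimal sketch; run from any container that has the consul binary and can reach the local agent):

# source and destination service names as registered in consul
consul intention check prometheus alertmanager
# -> Allowed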

What are we missing?

It looks like it’s getting a handshake error connecting to alertmanager running on 127.0.0.1:9093, which is odd because at that point the request should be plain HTTP, not SSL.

If you kubectl exec into the alertmanager pod and run curl 127.0.0.1:9093/yourpath, does it succeed?

Not behind a computer right now but yes that works.

A wget from the consul-connect-envoy-sidecar container in the alertmanager pod also works:

/ $ wget -qO- 127.0.0.1:9093/api/v1/alerts
{"status":"success","data":[{"labels":{"alertname":"CPUThrottlingHigh","container_name":"prometheus-to-sd","namespace":"kube-system","pod_name":"prometheus-to-sd-6jbcd","prometheus":"monitoring/rootlease-prometheus-opera-prometheus","severity":"warning"},"annotations":{"message":"75% throttling of CPU in namespace kube-system for container prometheus-to-sd in pod prometheus-to-sd-6jbcd.","runbook_url":"https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-cputhrottlinghigh"}

It looks like the handshake errors are not directly related (perhaps indirectly), since they continue to happen after stopping the prometheus pod:

[2019-11-13 13:29:00.932][000014][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:200] [C72] new tcp proxy session
[2019-11-13 13:29:00.932][000014][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:342] [C72] Creating connection to cluster local_app
[2019-11-13 13:29:00.932][000014][debug][pool] [source/common/tcp/conn_pool.cc:80] creating a new connection
[2019-11-13 13:29:00.932][000014][debug][pool] [source/common/tcp/conn_pool.cc:371] [C73] connecting
[2019-11-13 13:29:00.932][000014][debug][connection] [source/common/network/connection_impl.cc:634] [C73] connecting to 127.0.0.1:9093
[2019-11-13 13:29:00.933][000014][debug][connection] [source/common/network/connection_impl.cc:643] [C73] connection in progress
[2019-11-13 13:29:00.933][000014][debug][pool] [source/common/tcp/conn_pool.cc:106] queueing request due to no available connections
[2019-11-13 13:29:00.933][000014][debug][main] [source/server/connection_handler_impl.cc:257] [C72] new connection
[2019-11-13 13:29:00.933][000014][debug][connection] [source/common/ssl/ssl_socket.cc:135] [C72] handshake error: 2
[2019-11-13 13:29:00.933][000014][debug][connection] [source/common/network/connection_impl.cc:516] [C73] connected
[2019-11-13 13:29:00.933][000014][debug][pool] [source/common/tcp/conn_pool.cc:292] [C73] assigning connection
[2019-11-13 13:29:00.933][000014][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:540] TCP:onUpstreamEvent(), requestedServerName:
[2019-11-13 13:29:00.933][000014][debug][connection] [source/common/ssl/ssl_socket.cc:135] [C72] handshake error: 2
[2019-11-13 13:29:00.935][000014][debug][connection] [source/common/ssl/ssl_socket.cc:135] [C72] handshake error: 5
[2019-11-13 13:29:00.935][000014][debug][connection] [source/common/network/connection_impl.cc:183] [C72] closing socket: 0
[2019-11-13 13:29:00.935][000014][debug][connection] [source/common/network/connection_impl.cc:101] [C73] closing data_to_write=0 type=0
[2019-11-13 13:29:00.935][000014][debug][connection] [source/common/network/connection_impl.cc:183] [C73] closing socket: 1
[2019-11-13 13:29:00.935][000014][debug][pool] [source/common/tcp/conn_pool.cc:121] [C73] client disconnected
[2019-11-13 13:29:00.935][000014][debug][main] [source/server/connection_handler_impl.cc:68] [C72] adding to cleanup list
[2019-11-13 13:29:00.935][000014][debug][pool] [source/common/tcp/conn_pool.cc:245] [C73] connection destroyed

It looks like we get those handshake errors for every pod where we enable Connect. Any idea where these are coming from?

Could it be the result of some health check?
In any case, all checks for alertmanager are green in the Consul UI. (See screenshot.)

So, based on the log output, the last thing I know for sure is that our request reaches the mesh gateway on the alertmanager side. I have no idea what happens after that, nor whether an Envoy proxy session is actually started; the logs don’t show anything about it:

[2019-11-13 13:55:03.703][37][debug][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:72] tls inspector: new connection accepted
[2019-11-13 13:55:03.703][37][trace][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:141] tls inspector: recv: 240
[2019-11-13 13:55:03.703][37][debug][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:118] tls:onServerName(), requestedServerName: alertmanager.default.rootlease-gcp-europe-west6.internal.72300fbd-d88c-f97d-cd40-16502cb362ea.consul
[2019-11-13 13:55:03.703][37][trace][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:162] tls inspector: done: true
[2019-11-13 13:55:03.703][37][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:200] [C13862] new tcp proxy session
[2019-11-13 13:55:03.703][37][trace][connection] [source/common/network/connection_impl.cc:282] [C13862] readDisable: enabled=true disable=true
[2019-11-13 13:55:03.703][37][trace][filter] [source/extensions/filters/network/sni_cluster/sni_cluster.cc:16] [C13862] sni_cluster: new connection with server name alertmanager.default.rootlease-gcp-europe-west6.internal.72300fbd-d88c-f97d-cd40-16502cb362ea.consul
[2019-11-13 13:55:03.703][37][debug][connection] [source/common/network/connection_impl.cc:101] [C13862] closing data_to_write=0 type=1
[2019-11-13 13:55:03.703][37][debug][connection] [source/common/network/connection_impl.cc:183] [C13862] closing socket: 1
[2019-11-13 13:55:03.703][37][debug][main] [source/server/connection_handler_impl.cc:257] [C13862] new connection
[2019-11-13 13:55:03.703][37][trace][main] [source/common/event/dispatcher_impl.cc:133] item added to deferred deletion list (size=1)

What would be the next step in the flow?
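Side note on the trace lines above: to get this level of detail from a gateway, Envoy's runtime log level can be raised through its admin endpoint. A sketch, assuming the default admin port 19000 and a placeholder pod name:

kubectl -n consul port-forward <mesh-gateway-pod> 19000:19000 &
curl -s -X POST 'http://localhost:19000/logging?level=trace'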

I think the handshake errors are generated by proxy health checks not using valid client certificates, as stated here:


As such, they are not related to our issue.
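For anyone chasing the same noise: the errors line up with the agent's TCP check against the sidecar's public listener (port 20000), which connects without a client certificate. A quick way to see the checks the local agent has registered (a sketch; assumes the client agent's HTTP API is reachable on 127.0.0.1:8500 and jq is available):

curl -s http://127.0.0.1:8500/v1/agent/checks | \
  jq 'to_entries[] | select(.key | test("proxy")) | {check: .key, status: .value.Status}'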

To rule out anything prometheus/alertmanager-specific, I’ve set up the static-server/static-client hello-world example described here:

The only adjustment I made was appending the datacenter name to the upstream, as described in the documentation:

consul.hashicorp.com/connect-service-upstreams: static-server:1234:rootlease-gcp-europe-west6

Unfortunately this shows similar behaviour: static-client also receives a connection reset by peer. The request reaches the mesh gateway in the other datacenter, but the connection then gets closed somehow. The connect-envoy-sidecar logs from the static-server pod show nothing related.
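For reference, this is how we trigger it from the static-client side (a sketch; the pod and container names follow the demo and are assumptions, port 1234 matches the upstream annotation above):

kubectl exec -it <static-client-pod> -c static-client -- curl -sv http://127.0.0.1:1234
# curl: (56) Recv failure: Connection reset by peer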

static-client connect-envoy-sidecar logs

[2019-11-14 13:34:41.764][000022][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:342] [C932] Creating connection to cluster static-server.default.rootlease-gcp-europe-west6.internal.72300fbd-d88c-f97d-cd40-16502cb362ea.consul
[2019-11-14 13:34:41.764][000022][debug][connection] [source/common/network/connection_impl.cc:101] [C932] closing data_to_write=0 type=1
[2019-11-14 13:34:41.764][000022][debug][connection] [source/common/network/connection_impl.cc:183] [C932] closing socket: 1
[2019-11-14 13:34:41.764][000022][debug][main] [source/server/connection_handler_impl.cc:68] [C932] adding to cleanup list
[2019-11-14 13:34:41.764][000022][debug][pool] [source/common/tcp/conn_pool.cc:245] [C933] connection destroyed

static-server meshgateway logs

[2019-11-14 13:04:47.042][31][debug][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:72] tls inspector: new connection accepted
[2019-11-14 13:04:47.046][31][trace][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:141] tls inspector: recv: 241
[2019-11-14 13:04:47.046][31][debug][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:118] tls:onServerName(), requestedServerName: static-server.default.rootlease-gcp-europe-west6.internal.72300fbd-d88c-f97d-cd40-16502cb362ea.consul
[2019-11-14 13:04:47.046][31][trace][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:162] tls inspector: done: true
[2019-11-14 13:04:47.046][31][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:200] [C17199] new tcp proxy session
[2019-11-14 13:04:47.046][31][trace][connection] [source/common/network/connection_impl.cc:282] [C17199] readDisable: enabled=true disable=true
[2019-11-14 13:04:47.046][31][trace][filter] [source/extensions/filters/network/sni_cluster/sni_cluster.cc:16] [C17199] sni_cluster: new connection with server name static-server.default.rootlease-gcp-europe-west6.internal.72300fbd-d88c-f97d-cd40-16502cb362ea.consul
[2019-11-14 13:04:47.046][31][debug][connection] [source/common/network/connection_impl.cc:101] [C17199] closing data_to_write=0 type=1
[2019-11-14 13:04:47.046][31][debug][connection] [source/common/network/connection_impl.cc:183] [C17199] closing socket: 1
[2019-11-14 13:04:47.046][31][debug][main] [source/server/connection_handler_impl.cc:257] [C17199] new connection
[2019-11-14 13:04:47.046][31][trace][main] [source/common/event/dispatcher_impl.cc:133] item added to deferred deletion list (size=1)
[2019-11-14 13:04:47.046][31][trace][main] [source/common/event/dispatcher_impl.cc:53] clearing deferred deletion list (size=1)

Thanks for finding those issues about the SSL handshake. So it sounds like we’re not making it from the local mesh gateway to the static-server Pod.

  • Can you port-forward to port 19000 on the mesh gateway to access the Envoy admin UI and go to http://localhost:19000/clusters? From there, look for the lines with static-server.default.rootlease-gcp-europe-west6.internal.72300fbd-d88c-f97d-cd40-16502cb362ea.consul. (There is a sketch of this after the list.)
  • Each line should have an IP:port, e.g. static-server.default.rootlease-gcp-europe-west6.internal.72300fbd-d88c-f97d-cd40-16502cb362ea.consul::10.244.6.10:20000::cx_active::0 => 10.244.6.10:20000.
  • Can you confirm that that is the IP of the static-server pod, and that you can connect to that port from the mesh gateway pod?
  • Each line has a different stat, e.g. cx_connect_fail. Do you see any with a count > 0?
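A minimal sketch of that flow (the gateway pod name is an assumption; substitute your own):

kubectl -n consul port-forward <mesh-gateway-pod> 19000:19000 &
curl -s http://localhost:19000/clusters | grep 'static-server.default'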

More generally, can you confirm that:

  1. You’re running Consul >= 1.6.0
  2. connect { enabled = true } is set in your config
  3. primary_datacenter is set to the same datacenter in all your datacenters
  4. ports { grpc = 8502 } is set
  5. enable_central_service_config = true is set
  6. You’ve set the following config entry:
    Kind = "proxy-defaults"
    Name = "global"
    MeshGateway {
       Mode = "local"
    }
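For anyone following along, here is how items 2–5 would look in a server config file, plus item 6 applied via the CLI (a sketch; the file layout and the choice of primary datacenter name are assumptions — the value just has to be the same in both DCs):

# agent/server config (HCL)
connect {
  enabled = true
}
primary_datacenter = "datacenter-B-gcp-europe-west6"
ports {
  grpc = 8502
}
enable_central_service_config = true

# item 6 can also be written at runtime instead of bootstrapped:
cat <<'EOF' > proxy-defaults.hcl
Kind = "proxy-defaults"
Name = "global"
MeshGateway {
  Mode = "local"
}
EOF
consul config write proxy-defaults.hcl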
    

Envoy clusters output

static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::outlier::success_rate_average::-1
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::outlier::success_rate_ejection_threshold::-1
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::default_priority::max_connections::1024
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::default_priority::max_pending_requests::1024
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::default_priority::max_requests::1024
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::default_priority::max_retries::3
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::high_priority::max_connections::1024
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::high_priority::max_pending_requests::1024
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::high_priority::max_requests::1024
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::high_priority::max_retries::3
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::added_via_api::true
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::cx_active::0
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::cx_connect_fail::0
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::cx_total::0
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::rq_active::0
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::rq_error::0
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::rq_success::0
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::rq_timeout::0
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::rq_total::0
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::health_flags::healthy
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::weight::1
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::region::
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::zone::
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::sub_zone::
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::canary::false
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::success_rate::-1

IP of the static-server pod confirmed

static-server 2/2 Running 0 6h44m 10.8.1.14

Connection OK

kubectl -n consul exec rootlease-consul-consul-mesh-gateway-6c5c5dfbcb-8mmxf -- telnet 10.8.1.14 20000
Trying 10.8.1.14...
Connected to 10.8.1.14.
Escape character is '^]'.

Version

Running 1.6.0

Connect

Connect is enabled

cat consul/config/proxy-defaults-config.json

{
  "config_entries": {
    "bootstrap": [
      {
        "kind": "proxy-defaults",
        "name": "global",
        "mesh_gateway": {
          "mode": "local"
        }
      }
    ]
  }
}

So yes, all of the above is confirmed.

For others: this ended up being due to the secondary DC needing a rolling restart of its servers. The primary_datacenter config had been added after the secondary DC was brought up, and that change requires a full restart of the servers. More details here: https://github.com/hashicorp/consul-helm/issues/295
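In case it saves someone else time, the restart itself was along these lines (a rough sketch; the namespace and the component=server label follow the consul-helm chart defaults, and restarting one server at a time preserves quorum):

# roll the secondary DC's consul servers one at a time
for pod in $(kubectl -n consul get pods -l component=server -o name); do
  kubectl -n consul delete "$pod"   # blocks until the old pod is gone
  sleep 10                          # give the StatefulSet time to recreate it
  kubectl -n consul wait --for=condition=Ready "$pod" --timeout=300s
done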