Consul service mesh connectivity issues

brandomeesdom · November 8, 2019, 5:33pm

Hi All,

We are experiencing issues getting two services (prometheus -> alertmanager) to communicate over the Consul service mesh.

Our setup:

Using the official Consul Helm chart, I have set up Consul WAN federation between two Consul clusters running on two separate Kubernetes clusters, each in a different datacenter. WAN federation works. I also enabled Consul Connect (service mesh), but that does not work. I have a prometheus pod (with a consul-connect-envoy-sidecar) in one datacenter and an alertmanager pod in the other. I used annotations to enable and configure Connect for those pods. In the Consul UI I can see that the prometheus pod has the backend (alertmanager) as an upstream, and all health checks are green: for the client and backend services, for the sidecar proxies of those services, and for the mesh gateways.

In the logs I can see that the connection request from prometheus to alertmanager reaches the mesh gateway in the alertmanager's datacenter, and from there reaches the consul-connect-envoy-sidecar inside the alertmanager pod.

meshgateway

[2019-11-08 17:19:48.962][29][debug][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:72] tls inspector: new connection accepted
[2019-11-08 17:19:48.962][29][debug][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:118] tls:onServerName(), requestedServerName: alertmanager.default.datacenter-B-gcp-europe-west6.internal.72300fbd-d88c-f97d-cd40-16502cb362ea.consul
[2019-11-08 17:19:48.962][29][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:200] [C70322] new tcp proxy session
[2019-11-08 17:19:48.962][29][debug][connection] [source/common/network/connection_impl.cc:101] [C70322] closing data_to_write=0 type=1
[2019-11-08 17:19:48.962][29][debug][connection] [source/common/network/connection_impl.cc:183] [C70322] closing socket: 1

consul-connect-envoy-sidecar

[2019-11-08 17:30:56.163][000001][debug][main] [source/server/server.cc:143] flushing stats
[2019-11-08 17:30:56.259][000012][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:200] [C263736] new tcp proxy session
[2019-11-08 17:30:56.259][000012][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:342] [C263736] Creating connection to cluster local_app
[2019-11-08 17:30:56.259][000012][debug][pool] [source/common/tcp/conn_pool.cc:80] creating a new connection
[2019-11-08 17:30:56.259][000012][debug][pool] [source/common/tcp/conn_pool.cc:371] [C263737] connecting
[2019-11-08 17:30:56.259][000012][debug][connection] [source/common/network/connection_impl.cc:634] [C263737] connecting to 127.0.0.1:9093
[2019-11-08 17:30:56.260][000012][debug][connection] [source/common/network/connection_impl.cc:643] [C263737] connection in progress
[2019-11-08 17:30:56.260][000012][debug][pool] [source/common/tcp/conn_pool.cc:106] queueing request due to no available connections
[2019-11-08 17:30:56.260][000012][debug][main] [source/server/connection_handler_impl.cc:257] [C263736] new connection
[2019-11-08 17:30:56.260][000012][debug][connection] [source/common/ssl/ssl_socket.cc:135] [C263736] handshake error: 2
[2019-11-08 17:30:56.261][000012][debug][connection] [source/common/network/connection_impl.cc:516] [C263737] connected
[2019-11-08 17:30:56.261][000012][debug][pool] [source/common/tcp/conn_pool.cc:292] [C263737] assigning connection
[2019-11-08 17:30:56.261][000012][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:540] TCP:onUpstreamEvent(), requestedServerName:
[2019-11-08 17:30:56.261][000012][debug][connection] [source/common/ssl/ssl_socket.cc:135] [C263736] handshake error: 5
[2019-11-08 17:30:56.261][000012][debug][connection] [source/common/network/connection_impl.cc:183] [C263736] closing socket: 0
[2019-11-08 17:30:56.261][000012][debug][connection] [source/common/network/connection_impl.cc:101] [C263737] closing data_to_write=0 type=0
[2019-11-08 17:30:56.261][000012][debug][connection] [source/common/network/connection_impl.cc:183] [C263737] closing socket: 1
[2019-11-08 17:30:56.261][000012][debug][pool] [source/common/tcp/conn_pool.cc:121] [C263737] client disconnected
[2019-11-08 17:30:56.261][000012][debug][main] [source/server/connection_handler_impl.cc:68] [C263736] adding to cleanup list
[2019-11-08 17:30:56.262][000012][debug][pool] [source/common/tcp/conn_pool.cc:245] [C263737] connection destroyed
[2019-11-08 17:31:01.163][000001][debug][main] [source/server/server.cc:143] flushing stats
[2019-11-08 17:31:06.164][000001][debug][main] [source/server/server.cc:143] flushing stats
[2019-11-08 17:31:06.262][000012][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:200] [C263738] new tcp proxy session
[2019-11-08 17:31:06.262][000012][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:342] [C263738] Creating connection to cluster local_app
[2019-11-08 17:31:06.262][000012][debug][pool] [source/common/tcp/conn_pool.cc:80] creating a new connection
[2019-11-08 17:31:06.262][000012][debug][pool] [source/common/tcp/conn_pool.cc:371] [C263739] connecting
[2019-11-08 17:31:06.262][000012][debug][connection] [source/common/network/connection_impl.cc:634] [C263739] connecting to 127.0.0.1:9093
[2019-11-08 17:31:06.262][000012][debug][connection] [source/common/network/connection_impl.cc:643] [C263739] connection in progress
[2019-11-08 17:31:06.262][000012][debug][pool] [source/common/tcp/conn_pool.cc:106] queueing request due to no available connections
[2019-11-08 17:31:06.262][000012][debug][main] [source/server/connection_handler_impl.cc:257] [C263738] new connection
[2019-11-08 17:31:06.262][000012][debug][connection] [source/common/ssl/ssl_socket.cc:135] [C263738] handshake error: 2
[2019-11-08 17:31:06.262][000012][debug][connection] [source/common/network/connection_impl.cc:516] [C263739] connected
[2019-11-08 17:31:06.262][000012][debug][pool] [source/common/tcp/conn_pool.cc:292] [C263739] assigning connection
[2019-11-08 17:31:06.262][000012][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:540] TCP:onUpstreamEvent(), requestedServerName:
[2019-11-08 17:31:06.262][000012][debug][connection] [source/common/ssl/ssl_socket.cc:135] [C263738] handshake error: 2
[2019-11-08 17:31:06.263][000012][debug][connection] [source/common/ssl/ssl_socket.cc:135] [C263738] handshake error: 5
[2019-11-08 17:31:06.263][000012][debug][connection] [source/common/network/connection_impl.cc:183] [C263738] closing socket: 0
[2019-11-08 17:31:06.263][000012][debug][connection] [source/common/network/connection_impl.cc:101] [C263739] closing data_to_write=0 type=0
[2019-11-08 17:31:06.263][000012][debug][connection] [source/common/network/connection_impl.cc:183] [C263739] closing socket: 1
[2019-11-08 17:31:06.263][000012][debug][pool] [source/common/tcp/conn_pool.cc:121] [C263739] client disconnected
[2019-11-08 17:31:06.263][000012][debug][main] [source/server/connection_handler_impl.cc:68] [C263738] adding to cleanup list
[2019-11-08 17:31:06.263][000012][debug][pool] [source/common/tcp/conn_pool.cc:245] [C263739] connection destroyed
[2019-11-08 17:31:11.165][000001][debug][main] [source/server/server.cc:143] flushing stats
[2019-11-08 17:31:16.165][000001][debug][main] [source/server/server.cc:143] flushing stats
[2019-11-08 17:31:16.264][000013][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:200] [C263740] new tcp proxy session
[2019-11-08 17:31:16.264][000013][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:342] [C263740] Creating connection to cluster local_app
[2019-11-08 17:31:16.264][000013][debug][pool] [source/common/tcp/conn_pool.cc:80] creating a new connection
[2019-11-08 17:31:16.264][000013][debug][pool] [source/common/tcp/conn_pool.cc:371] [C263741] connecting
[2019-11-08 17:31:16.264][000013][debug][connection] [source/common/network/connection_impl.cc:634] [C263741] connecting to 127.0.0.1:9093
[2019-11-08 17:31:16.265][000013][debug][connection] [source/common/network/connection_impl.cc:643] [C263741] connection in progress
[2019-11-08 17:31:16.265][000013][debug][pool] [source/common/tcp/conn_pool.cc:106] queueing request due to no available connections
[2019-11-08 17:31:16.265][000013][debug][main] [source/server/connection_handler_impl.cc:257] [C263740] new connection
[2019-11-08 17:31:16.265][000013][debug][connection] [source/common/ssl/ssl_socket.cc:135] [C263740] handshake error: 2
[2019-11-08 17:31:16.265][000013][debug][connection] [source/common/network/connection_impl.cc:516] [C263741] connected
[2019-11-08 17:31:16.265][000013][debug][pool] [source/common/tcp/conn_pool.cc:292] [C263741] assigning connection
[2019-11-08 17:31:16.265][000013][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:540] TCP:onUpstreamEvent(), requestedServerName:
[2019-11-08 17:31:16.265][000013][debug][connection] [source/common/ssl/ssl_socket.cc:135] [C263740] handshake error: 2
[2019-11-08 17:31:16.265][000013][debug][connection] [source/common/ssl/ssl_socket.cc:135] [C263740] handshake error: 5
[2019-11-08 17:31:16.265][000013][debug][connection] [source/common/network/connection_impl.cc:183] [C263740] closing socket: 0
[2019-11-08 17:31:16.266][000013][debug][connection] [source/common/network/connection_impl.cc:101] [C263741] closing data_to_write=0 type=0
[2019-11-08 17:31:16.266][000013][debug][connection] [source/common/network/connection_impl.cc:183] [C263741] closing socket: 1
[2019-11-08 17:31:16.266][000013][debug][pool] [source/common/tcp/conn_pool.cc:121] [C263741] client disconnected
[2019-11-08 17:31:16.266][000013][debug][main] [source/server/connection_handler_impl.cc:68] [C263740] adding to cleanup list
[2019-11-08 17:31:16.266][000013][debug][pool] [source/common/tcp/conn_pool.cc:245] [C263741] connection destroyed
[2019-11-08 17:31:21.167][000001][debug][main] [source/server/server.cc:143] flushing stats
[2019-11-08 17:31:26.168][000001][debug][main] [source/server/server.cc:143] flushing stats
[2019-11-08 17:31:26.267][000013][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:200] [C263742] new tcp proxy session
[2019-11-08 17:31:26.267][000013][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:342] [C263742] Creating connection to cluster local_app
[2019-11-08 17:31:26.267][000013][debug][pool] [source/common/tcp/conn_pool.cc:80] creating a new connection
[2019-11-08 17:31:26.267][000013][debug][pool] [source/common/tcp/conn_pool.cc:371] [C263743] connecting
[2019-11-08 17:31:26.267][000013][debug][connection] [source/common/network/connection_impl.cc:634] [C263743] connecting to 127.0.0.1:9093
[2019-11-08 17:31:26.268][000013][debug][connection] [source/common/network/connection_impl.cc:643] [C263743] connection in progress
[2019-11-08 17:31:26.268][000013][debug][pool] [source/common/tcp/conn_pool.cc:106] queueing request due to no available connections
[2019-11-08 17:31:26.268][000013][debug][main] [source/server/connection_handler_impl.cc:257] [C263742] new connection
[2019-11-08 17:31:26.268][000013][debug][connection] [source/common/ssl/ssl_socket.cc:135] [C263742] handshake error: 5
[2019-11-08 17:31:26.268][000013][debug][connection] [source/common/network/connection_impl.cc:183] [C263742] closing socket: 0
[2019-11-08 17:31:26.269][000013][debug][pool] [source/common/tcp/conn_pool.cc:213] canceling pending request
[2019-11-08 17:31:26.269][000013][debug][pool] [source/common/tcp/conn_pool.cc:221] canceling pending connection
[2019-11-08 17:31:26.269][000013][debug][connection] [source/common/network/connection_impl.cc:101] [C263743] closing data_to_write=0 type=1
[2019-11-08 17:31:26.269][000013][debug][connection] [source/common/network/connection_impl.cc:183] [C263743] closing socket: 1
[2019-11-08 17:31:26.269][000013][debug][pool] [source/common/tcp/conn_pool.cc:121] [C263743] client disconnected
[2019-11-08 17:31:26.269][000013][debug][main] [source/server/connection_handler_impl.cc:68] [C263742] adding to cleanup list
[2019-11-08 17:31:26.269][000013][debug][pool] [source/common/tcp/conn_pool.cc:245] [C263743] connection destroyed
[2019-11-08 17:31:31.169][000001][debug][main] [source/server/server.cc:143] flushing stats

There are no consul intentions blocking this.

What are we missing?

lkysow · November 8, 2019, 5:56pm

It looks like it’s getting a handshake error connecting to alertmanager running on 127.0.0.1:9093, which is weird because at this point the request should be plain HTTP, not SSL.
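The numbers in the "handshake error: N" lines look like the return codes of BoringSSL's SSL_get_error() (Envoy is built on BoringSSL; that this is exactly what gets logged is an assumption). A small decoder sketch, assuming that mapping and the log format above:

```python
import re

# SSL_get_error() return codes as defined in OpenSSL/BoringSSL ssl.h.
SSL_ERRORS = {
    0: "SSL_ERROR_NONE",
    1: "SSL_ERROR_SSL",
    2: "SSL_ERROR_WANT_READ",      # handshake still waiting for bytes from the peer
    3: "SSL_ERROR_WANT_WRITE",
    4: "SSL_ERROR_WANT_X509_LOOKUP",
    5: "SSL_ERROR_SYSCALL",        # I/O error, typically the peer closed the socket
    6: "SSL_ERROR_ZERO_RETURN",
    7: "SSL_ERROR_WANT_CONNECT",
    8: "SSL_ERROR_WANT_ACCEPT",
}

HANDSHAKE_RE = re.compile(r"\[(C\d+)\] handshake error: (\d+)")


def decode_handshake_errors(lines):
    """Return (connection id, symbolic error name) for each handshake error line."""
    out = []
    for line in lines:
        m = HANDSHAKE_RE.search(line)
        if m:
            code = int(m.group(2))
            out.append((m.group(1), SSL_ERRORS.get(code, f"UNKNOWN({code})")))
    return out
```

Applied to the sidecar excerpts in this thread, every failing connection ends with code 5, i.e. the peer went away before a TLS handshake completed. Note also that the errors are logged on the inbound connections (C263736, C263738, ...), while 127.0.0.1:9093 is reached on the separate upstream connections (C263737, C263739, ...).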

If you kubectl exec into the alertmanager pod and run curl 127.0.0.1:9093/yourpath, does it succeed?

brandomeesdom · November 8, 2019, 6:34pm

Not at a computer right now, but yes, that works.

brandomeesdom · November 13, 2019, 9:07am

A wget from the consul-connect-envoy-sidecar container in the alertmanager pod also works:

/ $ wget -qO- 127.0.0.1:9093/api/v1/alerts
{"status":"success","data":[{"labels":{"alertname":"CPUThrottlingHigh","container_name":"prometheus-to-sd","namespace":"kube-system","pod_name":"prometheus-to-sd-6jbcd","prometheus":"monitoring/rootlease-prometheus-opera-prometheus","severity":"warning"},"annotations":{"message":"75% throttling of CPU in namespace kube-system for container prometheus-to-sd in pod prometheus-to-sd-6jbcd.","runbook_url":"https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-cputhrottlinghigh"}

brandomeesdom · November 13, 2019, 2:11pm

It looks like the handshake errors are not directly (perhaps indirectly) related, since they continue to happen after stopping the prometheus pod:

[2019-11-13 13:29:00.932][000014][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:200] [C72] new tcp proxy session
[2019-11-13 13:29:00.932][000014][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:342] [C72] Creating connection to cluster local_app
[2019-11-13 13:29:00.932][000014][debug][pool] [source/common/tcp/conn_pool.cc:80] creating a new connection
[2019-11-13 13:29:00.932][000014][debug][pool] [source/common/tcp/conn_pool.cc:371] [C73] connecting
[2019-11-13 13:29:00.932][000014][debug][connection] [source/common/network/connection_impl.cc:634] [C73] connecting to 127.0.0.1:9093
[2019-11-13 13:29:00.933][000014][debug][connection] [source/common/network/connection_impl.cc:643] [C73] connection in progress
[2019-11-13 13:29:00.933][000014][debug][pool] [source/common/tcp/conn_pool.cc:106] queueing request due to no available connections
[2019-11-13 13:29:00.933][000014][debug][main] [source/server/connection_handler_impl.cc:257] [C72] new connection
[2019-11-13 13:29:00.933][000014][debug][connection] [source/common/ssl/ssl_socket.cc:135] [C72] handshake error: 2
[2019-11-13 13:29:00.933][000014][debug][connection] [source/common/network/connection_impl.cc:516] [C73] connected
[2019-11-13 13:29:00.933][000014][debug][pool] [source/common/tcp/conn_pool.cc:292] [C73] assigning connection
[2019-11-13 13:29:00.933][000014][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:540] TCP:onUpstreamEvent(), requestedServerName:
[2019-11-13 13:29:00.933][000014][debug][connection] [source/common/ssl/ssl_socket.cc:135] [C72] handshake error: 2
[2019-11-13 13:29:00.935][000014][debug][connection] [source/common/ssl/ssl_socket.cc:135] [C72] handshake error: 5
[2019-11-13 13:29:00.935][000014][debug][connection] [source/common/network/connection_impl.cc:183] [C72] closing socket: 0
[2019-11-13 13:29:00.935][000014][debug][connection] [source/common/network/connection_impl.cc:101] [C73] closing data_to_write=0 type=0
[2019-11-13 13:29:00.935][000014][debug][connection] [source/common/network/connection_impl.cc:183] [C73] closing socket: 1
[2019-11-13 13:29:00.935][000014][debug][pool] [source/common/tcp/conn_pool.cc:121] [C73] client disconnected
[2019-11-13 13:29:00.935][000014][debug][main] [source/server/connection_handler_impl.cc:68] [C72] adding to cleanup list
[2019-11-13 13:29:00.935][000014][debug][pool] [source/common/tcp/conn_pool.cc:245] [C73] connection destroyed

It looks like we get these handshake errors for every pod where we enable Connect. Any idea where they are coming from?

Could they be the result of some health check?
In any case, all checks for alertmanager are green in the Consul UI (see screenshot).
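One way to test the health-check idea is to look at how the unexplained sessions are spaced in time. A sketch, assuming the Envoy log line format shown above and that lines are passed in chronological order:

```python
import re
from datetime import datetime

# Envoy debug lines start with a timestamp like [2019-11-08 17:30:56.259].
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]")


def session_intervals(lines, marker="new tcp proxy session"):
    """Seconds between consecutive log lines that contain `marker`."""
    times = []
    for line in lines:
        m = TS_RE.match(line)
        if m and marker in line:
            times.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    # Pair each timestamp with the next one and report the gap.
    return [round((b - a).total_seconds(), 3) for a, b in zip(times, times[1:])]
```

On the 2019-11-08 sidecar excerpt this yields gaps of about 10 seconds, so something periodic is opening these connections. The interval the checks are actually configured with is not shown in the thread.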

So, based on the log output, the last thing I know for sure is that our request reaches the mesh gateway on the alertmanager side. I have no idea what happens after that, or whether an Envoy proxy session is actually started; the logs don’t show anything about it:

[2019-11-13 13:55:03.703][37][debug][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:72] tls inspector: new connection accepted
[2019-11-13 13:55:03.703][37][trace][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:141] tls inspector: recv: 240
[2019-11-13 13:55:03.703][37][debug][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:118] tls:onServerName(), requestedServerName: alertmanager.default.rootlease-gcp-europe-west6.internal.72300fbd-d88c-f97d-cd40-16502cb362ea.consul
[2019-11-13 13:55:03.703][37][trace][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:162] tls inspector: done: true
[2019-11-13 13:55:03.703][37][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:200] [C13862] new tcp proxy session
[2019-11-13 13:55:03.703][37][trace][connection] [source/common/network/connection_impl.cc:282] [C13862] readDisable: enabled=true disable=true
[2019-11-13 13:55:03.703][37][trace][filter] [source/extensions/filters/network/sni_cluster/sni_cluster.cc:16] [C13862] sni_cluster: new connection with server name alertmanager.default.rootlease-gcp-europe-west6.internal.72300fbd-d88c-f97d-cd40-16502cb362ea.consul
[2019-11-13 13:55:03.703][37][debug][connection] [source/common/network/connection_impl.cc:101] [C13862] closing data_to_write=0 type=1
[2019-11-13 13:55:03.703][37][debug][connection] [source/common/network/connection_impl.cc:183] [C13862] closing socket: 1
[2019-11-13 13:55:03.703][37][debug][main] [source/server/connection_handler_impl.cc:257] [C13862] new connection
[2019-11-13 13:55:03.703][37][trace][main] [source/common/event/dispatcher_impl.cc:133] item added to deferred deletion list (size=1)

What would be the next step in the flow?
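In the gateway excerpt, the connection is closed right after the sni_cluster line and no "Creating connection to cluster" line follows, so a natural next check is whether the requested server name matches a cluster the gateway actually knows about. A small parser for these names, assuming the layout visible in the logs (<service>.<namespace>.<datacenter>.internal.<trust domain>, with no service subset in front); SniParts is a hypothetical type for illustration:

```python
from typing import NamedTuple


class SniParts(NamedTuple):
    service: str
    namespace: str
    datacenter: str
    trust_domain: str


def parse_connect_sni(sni: str) -> SniParts:
    """Split <service>.<namespace>.<dc>.internal.<trust-domain> into its parts."""
    head, sep, trust_domain = sni.partition(".internal.")
    if not sep:
        raise ValueError(f"no '.internal.' label in {sni!r}")
    parts = head.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected <service>.<namespace>.<dc>, got {head!r}")
    return SniParts(parts[0], parts[1], parts[2], trust_domain)
```

Each part (service, datacenter, and the trust domain in particular) can then be compared against the cluster names on the gateway's Envoy admin /clusters page.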

brandomeesdom · November 14, 2019, 1:46pm

I think the handshake errors are generated by proxy health checks that don't use valid client certificates, as stated here:

As such, they are not related to our issue.

To rule out anything prometheus/alertmanager-specific, I’ve set up the static-server/static-client hello world example described here:

The only adjustment I made is appending the datacenter name to the upstream as described in the documentation:

consul.hashicorp.com/connect-service-upstreams: static-server:1234:rootlease-gcp-europe-west6
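As I read that annotation (name:port[:datacenter], comma-separated when there are several upstreams), it asks the static-client sidecar to listen on local port 1234 and forward that traffic to static-server in the rootlease-gcp-europe-west6 datacenter. A parsing sketch under that assumption about the format; Upstream and parse_upstreams are hypothetical names for illustration:

```python
from typing import List, NamedTuple, Optional


class Upstream(NamedTuple):
    service: str
    local_port: int
    datacenter: Optional[str]  # None means the local datacenter


def parse_upstreams(annotation: str) -> List[Upstream]:
    """Parse a comma-separated list of name:port[:datacenter] entries."""
    result = []
    for raw in annotation.split(","):
        fields = raw.strip().split(":")
        if len(fields) not in (2, 3):
            raise ValueError(f"unexpected upstream entry {raw!r}")
        dc = fields[2] if len(fields) == 3 else None
        result.append(Upstream(fields[0], int(fields[1]), dc))
    return result
```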

Unfortunately this shows similar behaviour: static-client also receives a "connection reset by peer". The request reaches the mesh gateway in the other datacenter but then gets closed somehow. The connect-envoy-sidecar logs from the static-server pod show nothing related.

static-client connect-envoy-sidecar logs

[2019-11-14 13:34:41.764][000022][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:342] [C932] Creating connection to cluster static-server.default.rootlease-gcp-europe-west6.internal.72300fbd-d88c-f97d-cd40-16502cb362ea.consul
[2019-11-14 13:34:41.764][000022][debug][connection] [source/common/network/connection_impl.cc:101] [C932] closing data_to_write=0 type=1
[2019-11-14 13:34:41.764][000022][debug][connection] [source/common/network/connection_impl.cc:183] [C932] closing socket: 1
[2019-11-14 13:34:41.764][000022][debug][main] [source/server/connection_handler_impl.cc:68] [C932] adding to cleanup list
[2019-11-14 13:34:41.764][000022][debug][pool] [source/common/tcp/conn_pool.cc:245] [C933] connection destroyed

static-server meshgateway logs

[2019-11-14 13:04:47.042][31][debug][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:72] tls inspector: new connection accepted
[2019-11-14 13:04:47.046][31][trace][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:141] tls inspector: recv: 241
[2019-11-14 13:04:47.046][31][debug][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:118] tls:onServerName(), requestedServerName: static-server.default.rootlease-gcp-europe-west6.internal.72300fbd-d88c-f97d-cd40-16502cb362ea.consul
[2019-11-14 13:04:47.046][31][trace][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:162] tls inspector: done: true
[2019-11-14 13:04:47.046][31][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:200] [C17199] new tcp proxy session
[2019-11-14 13:04:47.046][31][trace][connection] [source/common/network/connection_impl.cc:282] [C17199] readDisable: enabled=true disable=true
[2019-11-14 13:04:47.046][31][trace][filter] [source/extensions/filters/network/sni_cluster/sni_cluster.cc:16] [C17199] sni_cluster: new connection with server name static-server.default.rootlease-gcp-europe-west6.internal.72300fbd-d88c-f97d-cd40-16502cb362ea.consul
[2019-11-14 13:04:47.046][31][debug][connection] [source/common/network/connection_impl.cc:101] [C17199] closing data_to_write=0 type=1
[2019-11-14 13:04:47.046][31][debug][connection] [source/common/network/connection_impl.cc:183] [C17199] closing socket: 1
[2019-11-14 13:04:47.046][31][debug][main] [source/server/connection_handler_impl.cc:257] [C17199] new connection
[2019-11-14 13:04:47.046][31][trace][main] [source/common/event/dispatcher_impl.cc:133] item added to deferred deletion list (size=1)
[2019-11-14 13:04:47.046][31][trace][main] [source/common/event/dispatcher_impl.cc:53] clearing deferred deletion list (size=1)

lkysow · November 14, 2019, 5:53pm

Thanks for finding those issues about the SSL handshake. So it sounds like we’re not making it from the local mesh gateway to the static-server Pod.

Can you port-forward to port 19000 on the mesh gateway to access the Envoy admin UI, and go to http://localhost:19000/clusters? From there, look for the lines starting with static-server.default.rootlease-gcp-europe-west6.internal.72300fbd-d88c-f97d-cd40-16502cb362ea.consul.
Each line should contain an IP:port, e.g. static-server.default.rootlease-gcp-europe-west6.internal.72300fbd-d88c-f97d-cd40-16502cb362ea.consul::10.244.6.10:20000::cx_active::0 => 10.244.6.10:20000.
Can you confirm that this is the IP of the static-server pod, and that you can connect to that port from the mesh gateway pod?
Each line has a different stat, e.g. cx_connect_fail. Do you see any that have a count > 0?
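The /clusters output is plain text with one cluster::host::stat::value entry per line, so pulling out a single endpoint's counters is easy to script. A sketch, assuming IPv4 endpoints and the line format shown in this thread; fetch_clusters, host_stats and nonzero_counters are hypothetical helpers, and fetch_clusters expects the kubectl port-forward to 19000 to be running:

```python
import urllib.request
from collections import defaultdict


def fetch_clusters(url="http://localhost:19000/clusters", timeout=5.0):
    """Read the Envoy admin /clusters page as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def host_stats(clusters_text, cluster_name):
    """Group <cluster>::<ip:port>::<stat>::<value> lines for one cluster by host."""
    hosts = defaultdict(dict)
    for line in clusters_text.splitlines():
        parts = line.strip().split("::")
        # Host entries have four fields and an ip:port in the second one;
        # cluster-level entries (outlier::..., default_priority::...) do not.
        if len(parts) == 4 and parts[0] == cluster_name and ":" in parts[1]:
            hosts[parts[1]][parts[2]] = parts[3]
    return dict(hosts)


def nonzero_counters(stats):
    """Keep only the non-negative integer stats that are greater than zero."""
    return {k: int(v) for k, v in stats.items() if v.isdigit() and int(v) > 0}
```

For example, nonzero_counters(host_stats(fetch_clusters(), name)["10.244.6.10:20000"]) would list counters such as cx_connect_fail that are above zero for that endpoint.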

More generally can you confirm that:

You’re running Consul >= 1.6.0
connect { enabled = true } is set in your config
primary_datacenter is set to the same datacenter in all your datacenters
ports { grpc = 8502 } is set
enable_central_service_config = true is set

You’ve set the config

Kind = "proxy-defaults"
Name = "global"
MeshGateway {
   Mode = "local"
}
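The checklist above can also be run as a quick sanity check over each server's configuration, assuming it has been loaded from the JSON form of the agent config into a dict. The key names mirror the settings listed above; check_consul_config, same_primary and version_ok are hypothetical helpers, not Consul tooling, and only look at the keys they name:

```python
def version_ok(version, minimum=(1, 6, 0)):
    """True if a version string such as '1.6.0' is at least `minimum`."""
    core = version.split("-")[0]  # drop suffixes such as '-beta1'
    parts = [int(x) for x in core.split(".")[:3]]
    parts += [0] * (3 - len(parts))  # treat '1.6' as '1.6.0'
    return tuple(parts) >= minimum


def check_consul_config(cfg):
    """Return a list of checklist items that the config dict does not satisfy."""
    problems = []
    if not cfg.get("connect", {}).get("enabled"):
        problems.append("connect.enabled is not true")
    if not cfg.get("primary_datacenter"):
        problems.append("primary_datacenter is not set")
    if cfg.get("ports", {}).get("grpc") != 8502:
        problems.append("ports.grpc is not 8502")
    if cfg.get("enable_central_service_config") is not True:
        problems.append("enable_central_service_config is not true")
    entries = cfg.get("config_entries", {}).get("bootstrap", [])
    if not any(
        e.get("kind") == "proxy-defaults"
        and e.get("name") == "global"
        and e.get("mesh_gateway", {}).get("mode") == "local"
        for e in entries
    ):
        problems.append('no proxy-defaults "global" entry with mesh_gateway mode "local"')
    return problems


def same_primary(configs):
    """True if every datacenter's config names the same primary_datacenter."""
    return len({c.get("primary_datacenter") for c in configs}) == 1
```

same_primary is meant to be called with the configs from all datacenters at once, since primary_datacenter has to agree everywhere.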

brandomeesdom · November 27, 2019, 2:18pm

Envoy clusters output

static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::outlier::success_rate_average::-1
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::outlier::success_rate_ejection_threshold::-1
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::default_priority::max_connections::1024
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::default_priority::max_pending_requests::1024
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::default_priority::max_requests::1024
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::default_priority::max_retries::3
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::high_priority::max_connections::1024
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::high_priority::max_pending_requests::1024
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::high_priority::max_requests::1024
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::high_priority::max_retries::3
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::added_via_api::true
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::cx_active::0
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::cx_connect_fail::0
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::cx_total::0
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::rq_active::0
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::rq_error::0
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::rq_success::0
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::rq_timeout::0
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::rq_total::0
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::health_flags::healthy
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::weight::1
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::region::
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::zone::
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::sub_zone::
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::canary::false
static-server.default.rootlease-gcp-europe-west6.internal.570052d0-3fb3-f0d8-b227-bcd51678d07a.consul::10.8.1.14:20000::success_rate::-1

IP of the static-server pod confirmed:

static-server 2/2 Running 0 6h44m 10.8.1.14

Connection OK:

kubectl -n consul exec rootlease-consul-consul-mesh-gateway-6c5c5dfbcb-8mmxf -- telnet 10.8.1.14 20000
Trying 10.8.1.14...
Connected to 10.8.1.14.
Escape character is '^]'.
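telnet only proves that a TCP connection to the sidecar's port can be opened. The same reachability check can be scripted with the Python standard library (assuming Python is available wherever you run it, which may not be true of the mesh gateway image). Keep in mind that port 20000 appears to be the sidecar's mTLS listener, so a successful TCP connect says nothing about whether the TLS handshake would succeed:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the address and performs the TCP handshake;
        # the socket is closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False
```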

Version

Running 1.6.0

Connect

Connect is enabled

cat consul/config/proxy-defaults-config.json

{
  "config_entries": {
    "bootstrap": [
      {
        "kind": "proxy-defaults",
        "name": "global",
        "mesh_gateway": {
          "mode": "local"
        }
      }
    ]
  }
}

So yes all the above confirmed.

lkysow · November 28, 2019, 9:36pm

For others: this ended up being due to the secondary DC needing a rolling restart of its servers. The primary_datacenter config had been added after the secondary DC was brought up, and that setting requires a full restart of the servers to take effect. More details here: https://github.com/hashicorp/consul-helm/issues/295
