WAN federation error between VMs and Kubernetes

Hello!

I was following this guide
https://www.consul.io/docs/k8s/installation/multi-cluster/vms-and-kubernetes

But the federation failed.
Log from the consul server in Kubernetes:

2020-07-23T13:39:00.306Z [INFO] agent.server.serf.wan: serf: EventMemberJoin: yao-dc1-server-0.dc1 10.1.168.149
2020-07-23T13:39:00.306Z [INFO] agent.server: Handled event for server in area: event=member-join server=yao-dc1-server-0.dc1 area=wan
2020-07-23T13:39:00.401Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 10.1.168.149:8302: EOF
2020-07-23T13:39:00.799Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.1.168.149:8300 datacenter=dc1 method=Internal.ServiceDump error=“rpc error getting client: failed to get conn: EOF”
2020-07-23T13:39:00.902Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 10.1.168.149:8302: EOF
2020-07-23T13:39:01.743Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.1.168.149:8300 datacenter=dc1 method=Internal.ServiceDump error=“rpc error getting client: failed to get conn: EOF”
2020-07-23T13:39:01.901Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 10.1.168.149:8302: EOF
2020-07-23T13:39:02.691Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.1.168.149:8300 datacenter=dc1 method=Internal.ServiceDump error=“rpc error getting client: failed to get conn: EOF”
2020-07-23T13:39:04.900Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send ack: EOF from=10.42.1.112:55954
2020-07-23T13:39:05.133Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.1.168.149:8300 datacenter=dc1 method=Internal.ServiceDump error=“rpc error getting client: failed to get conn: EOF”
2020-07-23T13:39:09.369Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.1.168.149:8300 datacenter=dc1 method=Internal.ServiceDump error=“rpc error getting client: failed to get conn: EOF”
2020-07-23T13:39:11.669Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.1.168.149:8300 datacenter=dc1 method=Internal.ServiceDump error=“rpc error getting client: failed to get conn: EOF”
2020-07-23T13:39:12.998Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.1.168.149:8300 datacenter=dc1 method=Internal.ServiceDump error=“rpc error getting client: failed to get conn: EOF”
2020-07-23T13:39:13.399Z [INFO] agent.server.memberlist.wan: memberlist: Suspect yao-dc1-server-0.dc1 has failed, no acks received
2020-07-23T13:39:13.501Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send ping: EOF
2020-07-23T13:39:44.703Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.1.168.149:8300 datacenter=dc1 method=Internal.ServiceDump error=“rpc error getting client: failed to get conn: EOF”
2020-07-23T13:39:44.808Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send ack: EOF from=10.42.1.112:55954
2020-07-23T13:39:44.808Z [WARN] agent.server.memberlist.wan: memberlist: Refuting a suspect message (from: yao-dc1-server-0.dc1)
2020-07-23T13:39:44.902Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 10.1.168.149:8302: EOF
2020-07-23T13:39:58.401Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send ping: EOF
2020-07-23T13:39:59.013Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.1.168.149:8300 datacenter=dc1 method=Internal.ServiceDump error=“rpc error getting client: failed to get conn: EOF”
2020-07-23T13:40:03.401Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send ping: EOF
2020-07-23T13:40:34.808Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send ack: EOF from=10.42.1.112:55954
2020-07-23T13:40:35.638Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.1.168.149:8300 datacenter=dc1 method=Internal.ServiceDump error=“rpc error getting client: failed to get conn: EOF”
2020-07-23T13:40:35.999Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.1.168.149:8300 datacenter=dc1 method=Internal.ServiceDump error=“rpc error getting client: failed to get conn: EOF”
2020-07-23T13:40:39.614Z [ERROR] agent.server.memberlist.wan: memberlist: Push/Pull with yao-dc1-server-0.dc1 failed: EOF
2020-07-23T13:40:40.630Z [WARN] agent.server.rpc: RPC request to DC is currently failing as no server can be reached: datacenter=dc1
2020-07-23T13:40:43.401Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send ping: EOF
2020-07-23T13:40:47.425Z [WARN] agent.server.rpc: RPC request to DC is currently failing as no server can be reached: datacenter=dc1
2020-07-23T13:40:47.750Z [WARN] agent.server.rpc: RPC request to DC is currently failing as no server can be reached: datacenter=dc1
2020-07-23T13:40:48.499Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send ping: EOF
2020-07-23T13:40:49.726Z [WARN] agent.server.rpc: RPC request to DC is currently failing as no server can be reached: datacenter=dc1
2020-07-23T13:40:50.297Z [WARN] agent.server.rpc: RPC request to DC is currently failing as no server can be reached: datacenter=dc1
2020-07-23T13:40:51.602Z [WARN] agent.server.rpc: RPC request to DC is currently failing as no server can be reached: datacenter=dc1

Netstat of vm consul server:

Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      PID/Program name
tcp        0      0 127.0.0.1:8600          0.0.0.0:*               LISTEN      999        37945      3462/consul
tcp        0      0 10.1.168.149:8443       0.0.0.0:*               LISTEN      1000       37288      3497/envoy
tcp        0      0 127.0.0.1:8443          0.0.0.0:*               LISTEN      1000       37281      3497/envoy
tcp        0      0 127.0.0.1:19005         0.0.0.0:*               LISTEN      1000       37269      3497/envoy
tcp        0      0 127.0.0.1:8500          0.0.0.0:*               LISTEN      999        37947      3462/consul
tcp        0      0 127.0.0.1:8501          0.0.0.0:*               LISTEN      999        37948      3462/consul
tcp        0      0 127.0.0.1:8502          0.0.0.0:*               LISTEN      999        37951      3462/consul
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      0          16412      1236/sshd
tcp6       0      0 :::8300                 :::*                    LISTEN      999        37186      3462/consul
tcp6       0      0 :::8301                 :::*                    LISTEN      999        37189      3462/consul
tcp6       0      0 :::8302                 :::*                    LISTEN      999        37187      3462/consul
tcp6       0      0 :::22                   :::*                    LISTEN      0          16421      1236/sshd
udp        0      0 0.0.0.0:68              0.0.0.0:*                           0          12064      850/dhclient
udp        0      0 127.0.0.1:8600          0.0.0.0:*                           999        37944      3462/consul
udp6       0      0 :::8301                 :::*                                999        37190      3462/consul
udp6       0      0 :::8302                 :::*                                999        37188      3462/consul

I can nc to port 8300 and port 8302 on the vm consul server from Kubernetes consul server.
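
The nc check can also be repeated from a script. This is a minimal sketch, assuming Python 3 is available on the host or pod doing the check; `tcp_port_open` and `check_ports` are hypothetical helper names. Note that a completed TCP handshake only shows the port is reachable. It says nothing about whether the TLS handshake or the Consul RPC on top of it succeeds, so it cannot by itself rule out the EOF errors in the logs.

```python
import socket
from typing import Dict, List


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake with host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host: str, ports: List[int], timeout: float = 3.0) -> Dict[int, bool]:
    """Check several TCP ports on one host and report which ones accept connections."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

For the setup described here, `check_ports("10.1.168.149", [8300, 8302])` would be the rough equivalent of the nc test.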

Log from vm consul server:

2020-07-23T13:39:01.381Z [INFO] agent: Synced service: service=gateway-secondary
2020-07-23T13:39:01.394Z [INFO] agent: Synced service: service=gateway-secondary
2020-07-23T13:39:01.464Z [INFO] agent.server: federation state anti-entropy synced
2020-07-23T13:39:03.864Z [INFO] agent.server.memberlist.wan: memberlist: Suspect consul-server-0.dc2 has failed, no acks received
2020-07-23T13:39:04.161Z [INFO] agent.server.serf.wan: serf: attempting reconnect to consul-server-0.dc2 10.42.1.114:8302
2020-07-23T13:39:04.262Z [WARN] agent.server.memberlist.wan: memberlist: Refuting a suspect message (from: yao-dc1-server-0.dc1)
2020-07-23T13:39:04.262Z [INFO] agent.server.serf.wan: serf: EventMemberJoin: consul-server-0.dc2 10.42.1.114
2020-07-23T13:39:04.262Z [INFO] agent.server: Handled event for server in area: event=member-join server=consul-server-0.dc2 area=wan
2020-07-23T13:39:06.008Z [INFO] agent: Synced check: check=service:gateway-secondary
2020-07-23T13:39:06.051Z [INFO] agent.server: federation state anti-entropy synced
2020-07-23T13:39:48.864Z [INFO] agent.server.memberlist.wan: memberlist: Suspect consul-server-0.dc2 has failed, no acks received
2020-07-23T13:40:18.865Z [INFO] agent.server.memberlist.wan: memberlist: Marking consul-server-0.dc2 as failed, suspect timeout reached (0 peer confirmations)
2020-07-23T13:40:18.865Z [INFO] agent.server.serf.wan: serf: EventMemberFailed: consul-server-0.dc2 10.42.1.114
2020-07-23T13:40:18.865Z [INFO] agent.server: Handled event for server in area: event=member-failed server=consul-server-0.dc2 area=wan
2020-07-23T13:40:28.865Z [INFO] agent.server.memberlist.wan: memberlist: Suspect consul-server-0.dc2 has failed, no acks received
2020-07-23T13:40:34.263Z [INFO] agent.server.serf.wan: serf: attempting reconnect to consul-server-0.dc2 10.42.1.114:8302
2020-07-23T13:40:34.359Z [INFO] agent.server.serf.wan: serf: EventMemberJoin: consul-server-0.dc2 10.42.1.114
2020-07-23T13:40:34.359Z [INFO] agent.server: Handled event for server in area: event=member-join server=consul-server-0.dc2 area=wan
2020-07-23T13:41:18.865Z [INFO] agent.server.memberlist.wan: memberlist: Suspect consul-server-0.dc2 has failed, no acks received
2020-07-23T13:41:48.865Z [INFO] agent.server.memberlist.wan: memberlist: Marking consul-server-0.dc2 as failed, suspect timeout reached (0 peer confirmations)
2020-07-23T13:41:48.865Z [INFO] agent.server.serf.wan: serf: EventMemberFailed: consul-server-0.dc2 10.42.1.114
2020-07-23T13:41:48.865Z [INFO] agent.server: Handled event for server in area: event=member-failed server=consul-server-0.dc2 area=wan
2020-07-23T13:42:03.864Z [INFO] agent.server.memberlist.wan: memberlist: Suspect consul-server-0.dc2 has failed, no acks received
2020-07-23T13:42:04.359Z [INFO] agent.server.serf.wan: serf: attempting reconnect to consul-server-0.dc2 10.42.1.114:8302
2020-07-23T13:42:04.459Z [INFO] agent.server.serf.wan: serf: EventMemberJoin: consul-server-0.dc2 10.42.1.114
2020-07-23T13:42:04.460Z [INFO] agent.server: Handled event for server in area: event=member-join server=consul-server-0.dc2 area=wan
2020-07-23T13:42:53.864Z [INFO] agent.server.memberlist.wan: memberlist: Suspect consul-server-0.dc2 has failed, no acks received
2020-07-23T13:43:23.864Z [INFO] agent.server.memberlist.wan: memberlist: Marking consul-server-0.dc2 as failed, suspect timeout reached (0 peer confirmations)
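
The log above shows consul-server-0.dc2 repeatedly joining and then being marked failed. To make that cycle easier to see in a longer log, here is a minimal sketch that extracts only the serf membership events. It assumes the `TIMESTAMP [LEVEL] logger: message` line layout seen in these logs; the regular expressions and the `member_events` helper are illustrative, not part of Consul.

```python
import re
from typing import Iterable, List, Tuple

# Matches lines such as:
# 2020-07-23T13:39:04.262Z [INFO] agent.server.serf.wan: serf: EventMemberJoin: ...
LINE_RE = re.compile(r"^(?P<ts>\S+) \[(?P<level>\w+)\]\s+(?P<logger>[\w.]+): (?P<msg>.*)$")
EVENT_RE = re.compile(r"serf: (?P<event>EventMember\w+): (?P<member>\S+)")


def member_events(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, event, member) for every serf membership event."""
    events = []
    for line in lines:
        parsed = LINE_RE.match(line.strip())
        if not parsed:
            continue  # skip lines that do not follow the expected layout
        event = EVENT_RE.search(parsed.group("msg"))
        if event:
            events.append((parsed.group("ts"), event.group("event"), event.group("member")))
    return events


# Three lines copied from the VM server log above.
SAMPLE = [
    "2020-07-23T13:39:04.262Z [INFO] agent.server.serf.wan: serf: EventMemberJoin: consul-server-0.dc2 10.42.1.114",
    "2020-07-23T13:40:18.865Z [INFO] agent.server.serf.wan: serf: EventMemberFailed: consul-server-0.dc2 10.42.1.114",
    "2020-07-23T13:40:34.359Z [INFO] agent.server.serf.wan: serf: EventMemberJoin: consul-server-0.dc2 10.42.1.114",
]

for ts, event, member in member_events(SAMPLE):
    print(ts, event, member)
```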

Log from vm envoy mesh gateway:

==> Registered service: gateway-secondary
[2020-07-23 13:39:01.423][3497][info][main] [external/envoy/source/server/server.cc:251] initializing epoch 0 (hot restart version=disabled)
[2020-07-23 13:39:01.423][3497][info][main] [external/envoy/source/server/server.cc:253] statically linked extensions:
[2020-07-23 13:39:01.423][3497][info][main] [external/envoy/source/server/server.cc:255] envoy.thrift_proxy.transports: auto, framed, header, unframed
[2020-07-23 13:39:01.423][3497][info][main] [external/envoy/source/server/server.cc:255] envoy.stats_sinks: envoy.dog_statsd, envoy.metrics_service, envoy.stat_sinks.hystrix, envoy.statsd
[2020-07-23 13:39:01.423][3497][info][main] [external/envoy/source/server/server.cc:255] envoy.tracers: envoy.dynamic.ot, envoy.lightstep, envoy.tracers.datadog, envoy.tracers.opencensus, envoy.tracers.xray, envoy.zipkin
[2020-07-23 13:39:01.424][3497][info][main] [external/envoy/source/server/server.cc:255] envoy.udp_listeners: raw_udp_listener
[2020-07-23 13:39:01.424][3497][info][main] [external/envoy/source/server/server.cc:255] envoy.access_loggers: envoy.file_access_log, envoy.http_grpc_access_log, envoy.tcp_grpc_access_log
[2020-07-23 13:39:01.424][3497][info][main] [external/envoy/source/server/server.cc:255] envoy.thrift_proxy.filters: envoy.filters.thrift.rate_limit, envoy.filters.thrift.router
[2020-07-23 13:39:01.424][3497][info][main] [external/envoy/source/server/server.cc:255] envoy.retry_priorities: envoy.retry_priorities.previous_priorities
[2020-07-23 13:39:01.424][3497][info][main] [external/envoy/source/server/server.cc:255] envoy.retry_host_predicates: envoy.retry_host_predicates.omit_canary_hosts, envoy.retry_host_predicates.previous_hosts
[2020-07-23 13:39:01.424][3497][info][main] [external/envoy/source/server/server.cc:255] envoy.dubbo_proxy.serializers: dubbo.hessian2
[2020-07-23 13:39:01.424][3497][info][main] [external/envoy/source/server/server.cc:255] envoy.resolvers: envoy.ip
[2020-07-23 13:39:01.424][3497][info][main] [external/envoy/source/server/server.cc:255] envoy.filters.udp_listener: envoy.filters.udp_listener.udp_proxy
[2020-07-23 13:39:01.424][3497][info][main] [external/envoy/source/server/server.cc:255] envoy.filters.network: envoy.client_ssl_auth, envoy.echo, envoy.ext_authz, envoy.filters.network.dubbo_proxy, envoy.filters.network.kafka_broker, envoy.filters.network.local_ratelimit, envoy.filters.network.mysql_proxy, envoy.filters.network.rbac, envoy.filters.network.sni_cluster, envoy.filters.network.thrift_proxy, envoy.filters.network.zookeeper_proxy, envoy.http_connection_manager, envoy.mongo_proxy, envoy.ratelimit, envoy.redis_proxy, envoy.tcp_proxy
[2020-07-23 13:39:01.424][3497][info][main] [external/envoy/source/server/server.cc:255] envoy.resource_monitors: envoy.resource_monitors.fixed_heap, envoy.resource_monitors.injected_resource
[2020-07-23 13:39:01.424][3497][info][main] [external/envoy/source/server/server.cc:255] envoy.filters.http: envoy.buffer, envoy.cors, envoy.csrf, envoy.ext_authz, envoy.fault, envoy.filters.http.adaptive_concurrency, envoy.filters.http.dynamic_forward_proxy, envoy.filters.http.grpc_http1_reverse_bridge, envoy.filters.http.grpc_stats, envoy.filters.http.header_to_metadata, envoy.filters.http.jwt_authn, envoy.filters.http.on_demand, envoy.filters.http.original_src, envoy.filters.http.rbac, envoy.filters.http.tap, envoy.grpc_http1_bridge, envoy.grpc_json_transcoder, envoy.grpc_web, envoy.gzip, envoy.health_check, envoy.http_dynamo_filter, envoy.ip_tagging, envoy.lua, envoy.rate_limit, envoy.router, envoy.squash
[2020-07-23 13:39:01.424][3497][info][main] [external/envoy/source/server/server.cc:255] envoy.transport_sockets.upstream: envoy.transport_sockets.alts, envoy.transport_sockets.raw_buffer, envoy.transport_sockets.tap, envoy.transport_sockets.tls, raw_buffer, tls
[2020-07-23 13:39:01.424][3497][info][main] [external/envoy/source/server/server.cc:255] envoy.transport_sockets.downstream: envoy.transport_sockets.alts, envoy.transport_sockets.raw_buffer, envoy.transport_sockets.tap, envoy.transport_sockets.tls, raw_buffer, tls
[2020-07-23 13:39:01.424][3497][info][main] [external/envoy/source/server/server.cc:255] envoy.clusters: envoy.cluster.eds, envoy.cluster.logical_dns, envoy.cluster.original_dst, envoy.cluster.static, envoy.cluster.strict_dns, envoy.clusters.aggregate, envoy.clusters.dynamic_forward_proxy, envoy.clusters.redis
[2020-07-23 13:39:01.424][3497][info][main] [external/envoy/source/server/server.cc:255] envoy.filters.listener: envoy.listener.http_inspector, envoy.listener.original_dst, envoy.listener.original_src, envoy.listener.proxy_protocol, envoy.listener.tls_inspector
[2020-07-23 13:39:01.424][3497][info][main] [external/envoy/source/server/server.cc:255] envoy.health_checkers: envoy.health_checkers.redis
[2020-07-23 13:39:01.424][3497][info][main] [external/envoy/source/server/server.cc:255] envoy.dubbo_proxy.protocols: dubbo
[2020-07-23 13:39:01.424][3497][info][main] [external/envoy/source/server/server.cc:255] envoy.grpc_credentials: envoy.grpc_credentials.aws_iam, envoy.grpc_credentials.default, envoy.grpc_credentials.file_based_metadata
[2020-07-23 13:39:01.424][3497][info][main] [external/envoy/source/server/server.cc:255] envoy.dubbo_proxy.filters: envoy.filters.dubbo.router
[2020-07-23 13:39:01.424][3497][info][main] [external/envoy/source/server/server.cc:255] envoy.thrift_proxy.protocols: auto, binary, binary/non-strict, compact, twitter
[2020-07-23 13:39:01.424][3497][info][main] [external/envoy/source/server/server.cc:255] envoy.dubbo_proxy.route_matchers: default
[2020-07-23 13:39:01.482][3497][warning][misc] [external/envoy/source/common/protobuf/utility.cc:441] Using deprecated option ‘envoy.api.v2.Cluster.hosts’ from file cluster.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
[2020-07-23 13:39:01.482][3497][warning][misc] [external/envoy/source/common/protobuf/utility.cc:441] Using deprecated option ‘envoy.api.v2.Cluster.tls_context’ from file cluster.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
[2020-07-23 13:39:01.482][3497][info][main] [external/envoy/source/server/server.cc:336] admin address: 127.0.0.1:19005
[2020-07-23 13:39:01.485][3497][info][main] [external/envoy/source/server/server.cc:455] runtime: layers: - name: static_layer
static_layer:
envoy.deprecated_features:envoy.config.trace.v2.ZipkinConfig.HTTP_JSON_V1: true
envoy.deprecated_features:envoy.config.filter.network.http_connection_manager.v2.HttpConnectionManager.Tracing.operation_name: true
envoy.deprecated_features:envoy.api.v2.Cluster.tls_context: true
[2020-07-23 13:39:01.485][3497][info][config] [external/envoy/source/server/configuration_impl.cc:62] loading 0 static secret(s)
[2020-07-23 13:39:01.485][3497][info][config] [external/envoy/source/server/configuration_impl.cc:68] loading 1 cluster(s)
[2020-07-23 13:39:01.498][3497][info][upstream] [external/envoy/source/common/upstream/cluster_manager_impl.cc:167] cm init: initializing cds
[2020-07-23 13:39:01.501][3497][info][config] [external/envoy/source/server/configuration_impl.cc:72] loading 0 listener(s)
[2020-07-23 13:39:01.501][3497][info][config] [external/envoy/source/server/configuration_impl.cc:97] loading tracing configuration
[2020-07-23 13:39:01.502][3497][info][config] [external/envoy/source/server/configuration_impl.cc:116] loading stats sink configuration
[2020-07-23 13:39:01.502][3497][info][main] [external/envoy/source/server/server.cc:550] starting main dispatch loop
[2020-07-23 13:39:01.506][3497][info][upstream] [external/envoy/source/common/upstream/cds_api_impl.cc:74] cds: add 1 cluster(s), remove 1 cluster(s)
[2020-07-23 13:39:01.532][3497][info][upstream] [external/envoy/source/common/upstream/cds_api_impl.cc:90] cds: add/update cluster ‘dc2.internal.9560fc88-3471-6a9c-c64c-9174e676b207.consul’
[2020-07-23 13:39:01.532][3497][info][upstream] [external/envoy/source/common/upstream/cluster_manager_impl.cc:145] cm init: initializing secondary clusters
[2020-07-23 13:39:01.533][3497][info][upstream] [external/envoy/source/common/upstream/cluster_manager_impl.cc:171] cm init: all clusters initialized
[2020-07-23 13:39:01.533][3497][info][main] [external/envoy/source/server/server.cc:529] all clusters initialized. initializing init manager
[2020-07-23 13:39:01.536][3497][info][upstream] [external/envoy/source/server/lds_api.cc:73] lds: add/update listener ‘lan:127.0.0.1:8443’
[2020-07-23 13:39:01.536][3497][warning][misc] [external/envoy/source/common/protobuf/utility.cc:441] Using deprecated option ‘envoy.api.v2.listener.Filter.config’ from file listener_components.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
[2020-07-23 13:39:01.537][3497][info][upstream] [external/envoy/source/server/lds_api.cc:73] lds: add/update listener ‘wan:10.1.168.149:8443’
[2020-07-23 13:39:01.537][3497][info][config] [external/envoy/source/server/listener_manager_impl.cc:707] all dependencies initialized. starting workers
[2020-07-23 13:39:01.561][3497][info][upstream] [external/envoy/source/common/upstream/cds_api_impl.cc:74] cds: add 1 cluster(s), remove 1 cluster(s)
[2020-07-23 13:39:01.562][3497][warning][misc] [external/envoy/source/common/protobuf/utility.cc:441] Using deprecated option ‘envoy.api.v2.listener.Filter.config’ from file listener_components.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
[external/envoy/source/server/drain_manager_impl.cc:68] shutting down parent after drain

Mesh gateway log in Kubernetes:

[2020-07-23 04:05:55.200][1][info][upstream] [source/common/upstream/cds_api_impl.cc:93] cds: add/update cluster ‘dc1.internal.9560fc88-3471-6a9c-c64c-9174e676b207.consul’
[2020-07-23 04:05:55.307][1][warning][misc] [bazel-out/k8-opt/bin/source/extensions/common/_virtual_includes/utility_lib/extensions/common/utility.h:65] Using deprecated extension name ‘envoy.listener.tls_inspector’ for ‘envoy.filters.listener.tls_inspector’. This name will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
[2020-07-23 04:05:55.400][1][info][upstream] [source/server/lds_api.cc:76] lds: add/update listener ‘default:10.42.1.112:8443’
[2020-07-23 04:09:17.760][1][info][upstream] [source/common/upstream/cds_api_impl.cc:77] cds: add 4 cluster(s), remove 1 cluster(s)
[2020-07-23 04:09:26.111][1][info][upstream] [source/common/upstream/cds_api_impl.cc:77] cds: add 4 cluster(s), remove 1 cluster(s)

The Helm config for Kubernetes cluster:

I bootstrapped only one server and disabled ACLs and gossip encryption.
Instead of a load balancer, I used a NodePort service to expose the mesh gateway.

My vm server config:

cert_file = "/home/ubuntu/dc1-server-consul-0.pem"
key_file = "/home/ubuntu/dc1-server-consul-0-key.pem"
ca_file = "/home/ubuntu/consul-agent-ca.pem"
primary_gateways = ["10.1.168.145:32001"]
server = true
bootstrap_expect = 1
datacenter = "dc1"
data_dir = "/opt/consul"
enable_central_service_config = true
primary_datacenter = "dc2"
connect {
  enabled = true
  enable_mesh_gateway_wan_federation = true
}
verify_incoming_rpc = true
verify_outgoing = true
verify_server_hostname = true
ports {
  https = 8501
  http = 8500
  grpc = 8502
}
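
As a sanity check on the federation-related settings, here is a minimal sketch that works out whether a server is configured as the primary or a secondary datacenter and flags a few obvious gaps. It assumes the config has already been loaded into a Python dict (parsing HCL is out of scope), and the rules encode a simplified reading of the mesh gateway federation docs rather than an official validator. `federation_role` and `check_federation` are hypothetical names.

```python
from typing import Any, Dict, List


def federation_role(cfg: Dict[str, Any]) -> str:
    """A datacenter is the primary when datacenter equals primary_datacenter."""
    return "primary" if cfg.get("datacenter") == cfg.get("primary_datacenter") else "secondary"


def check_federation(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    connect = cfg.get("connect", {})
    if not connect.get("enabled"):
        problems.append("connect.enabled is not true")
    if not connect.get("enable_mesh_gateway_wan_federation"):
        problems.append("connect.enable_mesh_gateway_wan_federation is not true")
    # Assumption: only secondaries need primary_gateways to find the primary.
    if federation_role(cfg) == "secondary" and not cfg.get("primary_gateways"):
        problems.append("secondary datacenter has no primary_gateways")
    return problems


# Federation-related values taken from the VM server config above.
VM_SERVER = {
    "datacenter": "dc1",
    "primary_datacenter": "dc2",
    "primary_gateways": ["10.1.168.145:32001"],
    "connect": {"enabled": True, "enable_mesh_gateway_wan_federation": True},
}

print(federation_role(VM_SERVER), check_federation(VM_SERVER))
```

With the values above, the VM server is treated as a secondary and none of these simple checks fire, so the settings covered here look consistent and the cause is more likely elsewhere.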

The command I used to launch mesh gateway on vm server:

consul connect envoy -mesh-gateway -register -service "gateway-secondary" -address "127.0.0.1:8443" -wan-address "10.1.168.149:8443" -admin-bind 127.0.0.1:19005 -grpc-addr=https://127.0.0.1:8502 -ca-file=/home/ubuntu/consul-agent-ca.pem

From the VM server I was able to list the services in the Kubernetes cluster. However, when I query the services in the VM cluster from the Kubernetes Consul server, I get:

Error listing services: Unexpected response code: 500 (Remote DC has no server currently reachable)
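
The same cross-datacenter query can be made against the HTTP API, which makes the raw status code and body easy to capture. This is a minimal sketch, assuming the local agent's HTTP API is reachable at an address such as http://127.0.0.1:8500 (the address on the Kubernetes side is an assumption) and that ACLs are disabled so no token is needed. It uses the `/v1/catalog/services` endpoint with the `dc` query parameter; the helper names are hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Any, Tuple


def catalog_services_url(base_url: str, datacenter: str) -> str:
    """Build the catalog URL that lists services in the given datacenter."""
    query = urllib.parse.urlencode({"dc": datacenter})
    return f"{base_url.rstrip('/')}/v1/catalog/services?{query}"


def list_services(base_url: str, datacenter: str, timeout: float = 5.0) -> Tuple[int, Any]:
    """Return (HTTP status, parsed JSON on success or raw error text on failure)."""
    url = catalog_services_url(base_url, datacenter)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Non-2xx responses land here; keep the body so the message is visible.
        return err.code, err.read().decode("utf-8", errors="replace")
```

Calling `list_services("http://127.0.0.1:8500", "dc1")` on the Kubernetes Consul server would presumably return status 500 with the same "Remote DC has no server currently reachable" text while federation is broken.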

I followed this guide to set up consul on vm

Can someone please help me see what’s wrong here?


I am having similar issues. The documentation page for WAN federation is no longer available. Is there a guide or other documentation on how to set up WAN federation with Consul 1.8 between VMs and Kubernetes clusters?

@WannaBeGeekster There are these docs: https://www.consul.io/docs/k8s/installation/multi-cluster/overview, https://www.consul.io/docs/k8s/installation/multi-cluster/vms-and-kubernetes

@YaoTu sorry you’re having these issues. We’re going to try and reproduce them and get back to you.

Hi @lkysow, thank you for your response.

Interestingly, I can also reproduce this issue with WAN federation between VM-only clusters. I thought it might be a network issue in my environment, but I was able to federate Kubernetes clusters following this guide.

Here is how I re-created the issue with VM clusters.

I used two virtual machines whose security group allows all IPv4 and IPv6 traffic to and from each other. They’re called dc3-server and dc4-server.

Set up for dc3-server:

I followed the deployment guide to set up consul. I didn’t generate the gossip encryption key.

Here is the configuration for the server:

datacenter = "dc3"
data_dir = "/opt/consul"
ca_file = "/etc/consul.d/consul-agent-ca.pem"
cert_file = "/etc/consul.d/dc3-server-consul-0.pem"
key_file = "/etc/consul.d/dc3-server-consul-0-key.pem"
verify_incoming = true
verify_outgoing = true
verify_server_hostname = true
server = true
primary_datacenter = "dc3"
bootstrap_expect = 1
connect {
  enabled = true
  enable_mesh_gateway_wan_federation = true
}
enable_central_service_config = true
ports {
  grpc = 8502
  https = 8501
}

For the systemd configuration I changed Type to exec as instructed by the guide.

Command to run the mesh gateway on dc3-server:

sudo consul connect envoy -mesh-gateway -register -service gateway-primary -address 10.1.169.25:443 -wan-address 10.1.169.25:8443 -admin-bind 127.0.0.1:19005 -grpc-addr=https://127.0.0.1:8502 -ca-file=/etc/consul.d/consul-agent-ca.pem -expose-servers

Log from consul in dc3-server:

2020-07-28T01:25:50.368Z [ERROR] agent.proxycfg: watch error: service_id=gateway-primary id=federation-state-list-mesh-gateways error=“error filling agent cache: No cluster leader”
2020-07-28T01:25:50.425Z [ERROR] agent.anti_entropy: failed to sync remote state: error=“No cluster leader”
2020-07-28T01:25:50.425Z [ERROR] agent.proxycfg: watch error: service_id=gateway-primary id=consul-server-list error=“error filling agent cache: No cluster leader”
2020-07-28T01:25:50.432Z [ERROR] agent: error handling service update: error=“error watching service config: No cluster leader”
2020-07-28T01:25:50.449Z [ERROR] agent.proxycfg: watch error: service_id=gateway-primary id=roots error=“error filling agent cache: No cluster leader”
2020-07-28T01:25:50.537Z [ERROR] agent.proxycfg: watch error: service_id=gateway-primary id=service-list error=“error filling agent cache: No cluster leader”
2020-07-28T01:25:50.726Z [ERROR] agent.proxycfg: watch error: service_id=gateway-primary id=service-resolvers error=“error filling agent cache: No cluster leader”
2020-07-28T01:25:51.841Z [WARN] agent.server.raft: heartbeat timeout reached, starting election: last-leader=
2020-07-28T01:25:51.841Z [INFO] agent.server.raft: entering candidate state: node=“Node at 10.1.169.25:8300 [Candidate]” term=9
2020-07-28T01:25:51.855Z [INFO] agent.server.raft: election won: tally=1
2020-07-28T01:25:51.855Z [INFO] agent.server.raft: entering leader state: leader=“Node at 10.1.169.25:8300 [Leader]”
2020-07-28T01:25:51.856Z [INFO] agent.server: cluster leadership acquired
2020-07-28T01:25:51.856Z [INFO] agent.server: New leader elected: payload=yao-dc3-server
2020-07-28T01:25:52.204Z [INFO] agent.leader: started routine: routine=“federation state anti-entropy”
2020-07-28T01:25:52.204Z [INFO] agent.leader: started routine: routine=“federation state pruning”
2020-07-28T01:25:52.205Z [INFO] agent.leader: started routine: routine=“CA root pruning”
2020-07-28T01:25:53.061Z [INFO] agent: Synced node info
2020-07-28T01:25:57.360Z [INFO] agent.server.gateway_locator: new cached locations of mesh gateways: primary=[10.1.169.25:8443] local=[10.1.169.25:443]
2020-07-28T01:26:28.850Z [INFO] agent.server.serf.wan: serf: EventMemberJoin: yao-dc4-server.dc4 10.1.168.162
2020-07-28T01:26:28.851Z [INFO] agent.server: Handled event for server in area: event=member-join server=yao-dc4-server.dc4 area=wan
2020-07-28T01:26:28.861Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 10.1.168.162:8302: EOF
2020-07-28T01:26:29.361Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 10.1.168.162:8302: EOF
2020-07-28T01:26:29.862Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 10.1.168.162:8302: EOF
2020-07-28T01:26:30.362Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 10.1.168.162:8302: EOF
2020-07-28T01:26:30.861Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 10.1.168.162:8302: EOF
2020-07-28T01:26:33.842Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send ack: EOF from=10.1.169.25:43494
2020-07-28T01:26:38.362Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send ping: EOF
2020-07-28T01:26:39.340Z [WARN] agent.server.memberlist.wan: memberlist: Refuting a suspect message (from: yao-dc4-server.dc4)
2020-07-28T01:26:39.361Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 10.1.168.162:8302: EOF
2020-07-28T01:26:39.862Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 10.1.168.162:8302: EOF
2020-07-28T01:26:40.361Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 10.1.168.162:8302: EOF
2020-07-28T01:26:40.862Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to 10.1.168.162:8302: EOF
2020-07-28T01:26:43.843Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send ack: EOF from=10.1.169.25:43494
2020-07-28T01:26:45.715Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.1.168.162:8300 datacenter=dc4 method=Internal.ServiceDump error=“rpc error getting client: failed to get conn: EOF”
2020-07-28T01:26:45.715Z [ERROR] agent.proxycfg: watch error: service_id=gateway-primary id=mesh-gateway:dc4 error=“error filling agent cache: rpc error getting client: failed to get conn: EOF”
2020-07-28T01:26:45.717Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.1.168.162:8300 datacenter=dc4 method=Internal.ServiceDump error=“rpc error getting client: failed to get conn: EOF”
2020-07-28T01:26:45.717Z [ERROR] agent.proxycfg: watch error: service_id=gateway-primary id=mesh-gateway:dc4 error=“error filling agent cache: rpc error getting client: failed to get conn: EOF”
2020-07-28T01:26:45.718Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.1.168.162:8300 datacenter=dc4 method=Internal.ServiceDump error=“rpc error getting client: failed to get conn: EOF”
2020-07-28T01:26:45.719Z [ERROR] agent.proxycfg: watch error: service_id=gateway-primary id=mesh-gateway:dc4 error=“error filling agent cache: rpc error getting client: failed to get conn: EOF”
2020-07-28T01:26:45.719Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.1.168.162:8300 datacenter=dc4 method=Internal.ServiceDump error=“rpc error getting client: failed to get conn: EOF”
2020-07-28T01:26:45.720Z [ERROR] agent.proxycfg: watch error: service_id=gateway-primary id=mesh-gateway:dc4 error=“error filling agent cache: rpc error getting client: failed to get conn: EOF”
2020-07-28T01:26:47.967Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.1.168.162:8300 datacenter=dc4 method=Internal.ServiceDump error=“rpc error getting client: failed to get conn: EOF”
2020-07-28T01:26:47.967Z [ERROR] agent.proxycfg: watch error: service_id=gateway-primary id=mesh-gateway:dc4 error=“error filling agent cache: rpc error getting client: failed to get conn: EOF”
2020-07-28T01:26:47.968Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.1.168.162:8300 datacenter=dc4 method=Internal.ServiceDump error=“rpc error getting client: failed to get conn: EOF”
2020-07-28T01:26:47.970Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.1.168.162:8300 datacenter=dc4 method=Internal.ServiceDump error=“rpc error getting client: failed to get conn: EOF”
2020-07-28T01:26:47.971Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.1.168.162:8300 datacenter=dc4 method=Internal.ServiceDump error=“rpc error getting client: failed to get conn: EOF”
2020-07-28T01:26:48.362Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send ping: EOF
2020-07-28T01:26:49.150Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.1.168.162:8300 datacenter=dc4 method=Internal.ServiceDump error=“rpc error getting client: failed to get conn: EOF”
2020-07-28T01:26:50.796Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.1.168.162:8300 datacenter=dc4 method=Internal.ServiceDump error=“rpc error getting client: failed to get conn: EOF”
2020-07-28T01:26:54.685Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.1.168.162:8300 datacenter=dc4 method=Internal.ServiceDump error=“rpc error getting client: failed to get conn: EOF”
2020-07-28T01:26:54.686Z [ERROR] agent.proxycfg: watch error: service_id=gateway-primary id=mesh-gateway:dc4 error=“error filling agent cache: rpc error getting client: failed to get conn: EOF”
2020-07-28T01:26:54.687Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.1.168.162:8300 datacenter=dc4 method=Internal.ServiceDump error=“rpc error getting client: failed to get conn: EOF”
2020-07-28T01:26:54.688Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.1.168.162:8300 datacenter=dc4 method=Internal.ServiceDump error=“rpc error getting client: failed to get conn: EOF”
2020-07-28T01:26:54.690Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.1.168.162:8300 datacenter=dc4 method=Internal.ServiceDump error=“rpc error getting client: failed to get conn: EOF”
2020-07-28T01:26:55.472Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.1.168.162:8300 datacenter=dc4 method=Internal.ServiceDump error=“rpc error getting client: failed to get conn: EOF”
2020-07-28T01:26:57.092Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.1.168.162:8300 datacenter=dc4 method=Internal.ServiceDump error=“rpc error getting client: failed to get conn: EOF”
2020-07-28T01:26:57.270Z [ERROR] agent.server.rpc: RPC failed to server in DC: server=10.1.168.162:8300 datacenter=dc4 method=Internal.ServiceDump error=“rpc error getting client: failed to get conn: EOF”
2020-07-28T01:26:57.631Z [ERROR] agent.server.memberlist.wan: memberlist: Push/Pull with yao-dc4-server.dc4 failed: EOF
2020-07-28T01:26:58.361Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send ping: EOF
2020-07-28T01:26:58.842Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send ack: EOF from=10.1.169.25:43494

netstat from dc3-server:

Set up for dc4-server:

Configuration:

datacenter = "dc4"
data_dir = "/opt/consul"
ca_file = "/etc/consul.d/consul-agent-ca.pem"
cert_file = "/etc/consul.d/dc4-server-consul-0.pem"
key_file = "/etc/consul.d/dc4-server-consul-0-key.pem"
verify_incoming = true
verify_outgoing = true
verify_server_hostname = true
server = true
bootstrap_expect = 1
primary_gateways = [ "10.1.169.25:8443" ]
primary_datacenter = "dc3"
enable_central_service_config = true
connect {
  enabled = true
  enable_mesh_gateway_wan_federation = true
}
ports {
  grpc = 8502
  https = 8501
}

Command to run mesh gateway on dc4-server:

sudo consul connect envoy -mesh-gateway -register -service gateway-secondary -address 10.1.168.162:443 -wan-address 10.1.168.162:8443 -admin-bind 127.0.0.1:19005 -grpc-addr=https://127.0.0.1:8502 -ca-file=/etc/consul.d/consul-agent-ca.pem

Log from consul on dc4-server:

2020-07-28T01:29:59.922Z [ERROR] agent.anti_entropy: failed to sync remote state: error=“No cluster leader”
2020-07-28T01:30:00.030Z [ERROR] agent.proxycfg: watch error: service_id=gateway-secondary id=service-list error=“error filling agent cache: No cluster leader”
2020-07-28T01:30:00.063Z [ERROR] agent: error handling service update: error=“error watching service config: No cluster leader”
2020-07-28T01:30:00.084Z [ERROR] agent.proxycfg: watch error: service_id=gateway-secondary id=roots error=“error filling agent cache: No cluster leader”
2020-07-28T01:30:00.129Z [ERROR] agent.proxycfg: watch error: service_id=gateway-secondary id=service-resolvers error=“error filling agent cache: No cluster leader”
2020-07-28T01:30:01.865Z [WARN] agent.server.raft: heartbeat timeout reached, starting election: last-leader=
2020-07-28T01:30:01.865Z [INFO] agent.server.raft: entering candidate state: node=“Node at 10.1.168.162:8300 [Candidate]” term=11
2020-07-28T01:30:01.881Z [INFO] agent.server.raft: election won: tally=1
2020-07-28T01:30:01.881Z [INFO] agent.server.raft: entering leader state: leader=“Node at 10.1.168.162:8300 [Leader]”
2020-07-28T01:30:01.882Z [INFO] agent.server: cluster leadership acquired
2020-07-28T01:30:01.882Z [INFO] agent.server: New leader elected: payload=yao-dc4-server
2020-07-28T01:30:02.098Z [INFO] agent.server.connect: initialized secondary datacenter CA with provider: provider=consul
2020-07-28T01:30:02.098Z [INFO] agent.leader: started routine: routine=“config entry replication”
2020-07-28T01:30:02.099Z [INFO] agent.leader: started routine: routine=“federation state replication”
2020-07-28T01:30:02.099Z [INFO] agent.leader: started routine: routine=“federation state anti-entropy”
2020-07-28T01:30:02.099Z [INFO] agent.leader: started routine: routine=“secondary CA roots watch”
2020-07-28T01:30:02.099Z [INFO] agent.leader: started routine: routine=“intention replication”
2020-07-28T01:30:02.099Z [INFO] agent.leader: started routine: routine=“secondary cert renew watch”
2020-07-28T01:30:02.099Z [INFO] agent.leader: started routine: routine=“CA root pruning”
2020-07-28T01:30:02.101Z [INFO] agent.server.gateway_locator: will dial the primary datacenter using our local mesh gateways if possible
2020-07-28T01:30:02.232Z [INFO] agent: Synced node info
2020-07-28T01:30:02.810Z [INFO] agent.server.memberlist.wan: memberlist: Suspect yao-dc3-server.dc3 has failed, no acks received
2020-07-28T01:30:06.809Z [INFO] agent.server.gateway_locator: new cached locations of mesh gateways: primary=[10.1.169.25:8443] local=
2020-07-28T01:30:17.810Z [INFO] agent.server.memberlist.wan: memberlist: Suspect yao-dc3-server.dc3 has failed, no acks received
2020-07-28T01:30:32.810Z [INFO] agent.server.memberlist.wan: memberlist: Marking yao-dc3-server.dc3 as failed, suspect timeout reached (0 peer confirmations)
2020-07-28T01:30:32.810Z [INFO] agent.server.serf.wan: serf: EventMemberFailed: yao-dc3-server.dc3 10.1.169.25
2020-07-28T01:30:32.810Z [INFO] agent.server: Handled event for server in area: event=member-failed server=yao-dc3-server.dc3 area=wan
2020-07-28T01:30:37.811Z [INFO] agent.server.memberlist.wan: memberlist: Suspect yao-dc3-server.dc3 has failed, no acks received
2020-07-28T01:30:52.811Z [INFO] agent.server.serf.wan: serf: attempting reconnect to yao-dc3-server.dc3 10.1.169.25:8302
2020-07-28T01:30:52.816Z [INFO] agent.server.serf.wan: serf: EventMemberJoin: yao-dc3-server.dc3 10.1.169.25
2020-07-28T01:30:52.816Z [INFO] agent.server: Handled event for server in area: event=member-join server=yao-dc3-server.dc3 area=wan
2020-07-28T01:31:17.810Z [INFO] agent.server.memberlist.wan: memberlist: Suspect yao-dc3-server.dc3 has failed, no acks received
2020-07-28T01:31:47.809Z [INFO] agent.server.memberlist.wan: memberlist: Suspect yao-dc3-server.dc3 has failed, no acks received
2020-07-28T01:31:47.811Z [INFO] agent.server.memberlist.wan: memberlist: Marking yao-dc3-server.dc3 as failed, suspect timeout reached (0 peer confirmations)

Netstat from dc4-server:

On both servers, the Envoy mesh gateways did not report any errors.

I started the components in this order: primary dc server -> primary mesh gateway -> secondary dc server -> secondary gateway.
I hope this information helps.

For the VM-only scenario you described, can you double-check that the TLS SAN fields are correct? There is an additional SAN field required for servers when they are WAN federated via mesh gateways: https://www.consul.io/docs/connect/gateways/wan-federation-via-mesh-gateways#tls

Hi @rboyer, I checked the SAN fields in both servers’ TLS.

Looks like they have the SAN fields required:

dc4-server:

X509v3 Subject Alternative Name:
DNS:yao-dc4-server.server.dc4.consul, DNS:server.dc4.consul, DNS:localhost, IP Address:127.0.0.1

dc3-server:

X509v3 Subject Alternative Name:
DNS:yao-dc3-server.server.dc3.consul, DNS:server.dc3.consul, DNS:localhost, IP Address:127.0.0.1
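
For anyone who wants to repeat this check, here is a rough sketch of how the SAN listings above can be inspected and tested for the `server.<datacenter>.<domain>` entry. It assumes the default `consul` domain and the certificate path from the dc4 configuration in this thread; `has_server_san` is a hypothetical helper name, not part of Consul or OpenSSL.

```shell
# has_server_san DC
# Hypothetical helper: reads `openssl x509 -noout -text` output on stdin and
# succeeds only if the SAN list contains the entry DNS:server.<DC>.consul.
# Assumes the default Consul domain "consul".
has_server_san() {
  grep -Eq "DNS:server\.$1\.consul(,|[[:space:]]|\$)"
}

# Certificate path taken from the dc4 server configuration in this thread.
cert=/etc/consul.d/dc4-server-consul-0.pem

# Only run the real check when the certificate is readable on this machine.
if [ -r "$cert" ]; then
  if openssl x509 -in "$cert" -noout -text | has_server_san dc4; then
    echo "server.dc4.consul SAN present"
  else
    echo "server.dc4.consul SAN missing"
  fi
fi
```

The helper deliberately does not match `yao-dc4-server.server.dc4.consul`, because that node-specific name alone is not the additional SAN the documentation asks for.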

OK, I figured out what was wrong. I didn't add the -expose-servers flag when I ran the VM mesh gateway, as noted in this document:
https://www.consul.io/docs/connect/gateways/wan-federation-via-mesh-gateways
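
The corrected command was not posted, so the following is only a reconstruction: the dc4 gateway command from earlier in this thread with `-expose-servers` added. The addresses and paths are the ones from the dc4 setup above; the `run_gateway` wrapper is a hypothetical convenience so the command can be printed before it is actually run. The primary (dc3) gateway presumably needs the same flag, but its command does not appear in this thread.

```shell
# run_gateway PREFIX...
# Hypothetical wrapper: runs the mesh gateway command behind the given prefix.
#   run_gateway echo   prints the command (dry run)
#   run_gateway sudo   starts the gateway as root, as in the original post
run_gateway() {
  "$@" consul connect envoy \
    -mesh-gateway \
    -register \
    -expose-servers \
    -service gateway-secondary \
    -address 10.1.168.162:443 \
    -wan-address 10.1.168.162:8443 \
    -admin-bind 127.0.0.1:19005 \
    -grpc-addr=https://127.0.0.1:8502 \
    -ca-file=/etc/consul.d/consul-agent-ca.pem
}

# Dry run: show what would be executed.
run_gateway echo
```

Once both gateways are running with the flag, `consul members -wan` on either server should list the servers of both datacenters as alive.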
