Error: WAN federation between GKE cluster and VMs on GCP

So I followed the docs and was able to set up WAN federation between the clusters, but when I tried to get the VM to federate with them I wasn't able to. I think it's because the Envoy proxy is not starting properly.

I used the official Helm chart with the following values file for the primary datacenter:

global:
  name: consul
  image: consul:1.8.0
  imageK8S: hashicorp/consul-k8s:0.16.0
  datacenter: dc1
  federation:
    enabled: true
    createFederationSecret: true
  tls:
    enabled: true
meshGateway:
  enabled: true
connectInject:
  enabled: true

My server config file:

cert_file = "/<location>/consul/config/vm-dc-server-consul-0.pem"
key_file  = "/<location>/consul/config/vm-dc-server-consul-0-key.pem"
ca_file   = "/<location>/consul/config/consul-agent-ca.pem"

primary_gateways = ["<IP of mesh service>:443"]

# Other server settings
server = true
datacenter = "vm"
data_dir = "/<location>/consul/data"
enable_central_service_config = true
primary_datacenter = "dc1"

connect {
  enabled = true
  enable_mesh_gateway_wan_federation = true
}

verify_incoming_rpc = true
verify_outgoing = true
verify_server_hostname = true

ports {
  https = 8501
  http  = 8500
  grpc  = 8502
}
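(As an aside, the config directory can be sanity-checked with consul validate; this is just a sketch, assuming the files above live under /<location>/consul/config:)

consul validate /<location>/consul/config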

The logs from the VM:

2020-07-09T07:25:58.304Z [ERROR] agent.server: failed to establish leadership: error="Failed to set the intermediate certificate with the CA provider: could not verify intermediate cert against root: x509: certificate has expired or is not yet valid: current time 2020-07-09T07:25:58Z is before 2020-07-09T07:27:05Z"
    2020-07-09T07:25:58.304Z [ERROR] agent.server: failed to transfer leadership attempt, will retry: attempt=0 retry_limit=3 error="cannot find peer"
    2020-07-09T07:25:58.304Z [ERROR] agent.server: failed to transfer leadership attempt, will retry: attempt=1 retry_limit=3 error="cannot find peer"
    2020-07-09T07:25:58.304Z [ERROR] agent.server: failed to transfer leadership attempt, will retry: attempt=2 retry_limit=3 error="cannot find peer"
    2020-07-09T07:25:58.304Z [ERROR] agent.server: failed to transfer leadership: error="failed to transfer leadership in 3 attempts"
    2020-07-09T07:25:58.449Z [WARN]  agent: Check socket connection failed: check=service:vm-gateway error="dial tcp 10.154.0.17:7051: connect: connection refused"
    2020-07-09T07:25:58.449Z [WARN]  agent: Check is now critical: check=service:vm-gateway
    2020-07-09T07:26:01.820Z [WARN]  agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dc2 method=Internal.ServiceDump
    2020-07-09T07:26:03.343Z [ERROR] agent.server: failed to establish leadership: error="Failed to set the intermediate certificate with the CA provider: could not verify intermediate cert against root: x509: certificate has expired or is not yet valid: current time 2020-07-09T07:26:03Z is before 2020-07-09T07:27:10Z"
    2020-07-09T07:26:03.343Z [ERROR] agent.server: failed to transfer leadership attempt, will retry: attempt=0 retry_limit=3 error="cannot find peer"
    2020-07-09T07:26:03.343Z [ERROR] agent.server: failed to transfer leadership attempt, will retry: attempt=1 retry_limit=3 error="cannot find peer"
    2020-07-09T07:26:03.343Z [ERROR] agent.server: failed to transfer leadership attempt, will retry: attempt=2 retry_limit=3 error="cannot find peer"
    2020-07-09T07:26:03.343Z [ERROR] agent.server: failed to transfer leadership: error="failed to transfer leadership in 3 attempts"
    2020-07-09T07:26:08.370Z [ERROR] agent.server: failed to establish leadership: error="Failed to set the intermediate certificate with the CA provider: could not verify intermediate cert against root: x509: certificate has expired or is not yet valid: current time 2020-07-09T07:26:08Z is before 2020-07-09T07:27:15Z"
    2020-07-09T07:26:08.370Z [ERROR] agent.server: failed to transfer leadership attempt, will retry: attempt=0 retry_limit=3 error="cannot find peer"
    2020-07-09T07:26:08.370Z [ERROR] agent.server: failed to transfer leadership attempt, will retry: attempt=1 retry_limit=3 error="cannot find peer"
    2020-07-09T07:26:08.370Z [ERROR] agent.server: failed to transfer leadership attempt, will retry: attempt=2 retry_limit=3 error="cannot find peer"
    2020-07-09T07:26:08.370Z [ERROR] agent.server: failed to transfer leadership: error="failed to transfer leadership in 3 attempts"
    2020-07-09T07:26:08.450Z [WARN]  agent: Check socket connection failed: check=service:vm-gateway error="dial tcp 10.154.0.17:7051: connect: connection refused"
    2020-07-09T07:26:08.450Z [WARN]  agent: Check is now critical: check=service:vm-gateway
    2020-07-09T07:26:13.397Z [ERROR] agent.server: failed to establish leadership: error="Failed to set the intermediate certificate with the CA provider: could not verify intermediate cert against root: x509: certificate has expired or is not yet valid: current time 2020-07-09T07:26:13Z is before 2020-07-09T07:27:20Z"
    2020-07-09T07:26:13.397Z [ERROR] agent.server: failed to transfer leadership attempt, will retry: attempt=0 retry_limit=3 error="cannot find peer"
    2020-07-09T07:26:13.397Z [ERROR] agent.server: failed to transfer leadership attempt, will retry: attempt=1 retry_limit=3 error="cannot find peer"
    2020-07-09T07:26:13.397Z [ERROR] agent.server: failed to transfer leadership attempt, will retry: attempt=2 retry_limit=3 error="cannot find peer"
    2020-07-09T07:26:13.397Z [ERROR] agent.server: failed to transfer leadership: error="failed to transfer leadership in 3 attempts"
    2020-07-09T07:26:16.740Z [INFO]  agent.server.memberlist.wan: memberlist: Suspect consul-server-1.dc1 has failed, no acks received
    2020-07-09T07:26:17.731Z [WARN]  agent: grpc: Server.Serve failed to complete security handshake from "127.0.0.1:51388": tls: first record does not look like a TLS handshake
    2020-07-09T07:26:18.428Z [ERROR] agent.server: failed to establish leadership: error="Failed to set the intermediate certificate with the CA provider: could not verify intermediate cert against root: x509: certificate has expired or is not yet valid: current time 2020-07-09T07:26:18Z is before 2020-07-09T07:27:25Z"
    2020-07-09T07:26:18.428Z [ERROR] agent.server: failed to transfer leadership attempt, will retry: attempt=0 retry_limit=3 error="cannot find peer"
    2020-07-09T07:26:18.428Z [ERROR] agent.server: failed to transfer leadership attempt, will retry: attempt=1 retry_limit=3 error="cannot find peer"
    2020-07-09T07:26:18.428Z [ERROR] agent.server: failed to transfer leadership attempt, will retry: attempt=2 retry_limit=3 error="cannot find peer"
    2020-07-09T07:26:18.428Z [ERROR] agent.server: failed to transfer leadership: error="failed to transfer leadership in 3 attempts"
    2020-07-09T07:26:18.450Z [WARN]  agent: Check socket connection failed: check=service:vm-gateway error="dial tcp 10.154.0.17:7051: connect: connection refused"
    2020-07-09T07:26:18.450Z [WARN]  agent: Check is now critical: check=service:vm-gateway
    2020-07-09T07:26:18.976Z [WARN]  agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dc2 method=Internal.ServiceDump
    2020-07-09T07:26:19.395Z [WARN]  agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dc2 method=Internal.ServiceDump
    2020-07-09T07:26:21.925Z [WARN]  agent: grpc: Server.Serve failed to complete security handshake from "127.0.0.1:51392": tls: first record does not look like a TLS handshake
    2020-07-09T07:26:22.052Z [WARN]  agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dc2 method=Internal.ServiceDump
    2020-07-09T07:26:23.455Z [ERROR] agent.server: failed to establish leadership: error="Failed to set the intermediate certificate with the CA provider: could not verify intermediate cert against root: x509: certificate has expired or is not yet valid: current time 2020-07-09T07:26:23Z is before 2020-07-09T07:27:30Z"
    2020-07-09T07:26:23.455Z [ERROR] agent.server: failed to transfer leadership attempt, will retry: attempt=0 retry_limit=3 error="cannot find peer"
    2020-07-09T07:26:23.455Z [ERROR] agent.server: failed to transfer leadership attempt, will retry: attempt=1 retry_limit=3 error="cannot find peer"
    2020-07-09T07:26:23.455Z [ERROR] agent.server: failed to transfer leadership attempt, will retry: attempt=2 retry_limit=3 error="cannot find peer"
    2020-07-09T07:26:23.455Z [ERROR] agent.server: failed to transfer leadership: error="failed to transfer leadership in 3 attempts"

I have checked the ports with netstat and they are all available, and the firewall configuration allows all the Consul ports.
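For reference, the checks were roughly along these lines (the gcloud filter is just an example, not my exact rule names):

sudo netstat -tlnup | grep consul                          # 8300-8302, 8500-8502 and 8600 all listening
gcloud compute firewall-rules list --filter="name~consul"  # rules covering the Consul ports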

Envoy logs:

==> Registered service: vm-gateway
[2020-07-09 07:28:25.174][4737][info][main] [external/envoy/source/server/server.cc:255] initializing epoch 0 (hot restart version=disabled)
[2020-07-09 07:28:25.174][4737][info][main] [external/envoy/source/server/server.cc:257] statically linked extensions:
[2020-07-09 07:28:25.174][4737][info][main] [external/envoy/source/server/server.cc:259]   envoy.filters.http: envoy.buffer, envoy.cors, envoy.csrf, envoy.ext_authz, envoy.fault, envoy.filters.http.adaptive_concurrency, envoy.filters.http.aws_lambda, envoy.filters.http.aws_request_signing, envoy.filters.http.buffer, envoy.filters.http.cache, envoy.filters.http.cors, envoy.filters.http.csrf, envoy.filters.http.dynamic_forward_proxy, envoy.filters.http.dynamo, envoy.filters.http.ext_authz, envoy.filters.http.fault, envoy.filters.http.grpc_http1_bridge, envoy.filters.http.grpc_http1_reverse_bridge, envoy.filters.http.grpc_json_transcoder, envoy.filters.http.grpc_stats, envoy.filters.http.grpc_web, envoy.filters.http.gzip, envoy.filters.http.header_to_metadata, envoy.filters.http.health_check, envoy.filters.http.ip_tagging, envoy.filters.http.jwt_authn, envoy.filters.http.lua, envoy.filters.http.on_demand, envoy.filters.http.original_src, envoy.filters.http.ratelimit, envoy.filters.http.rbac, envoy.filters.http.router, envoy.filters.http.squash, envoy.filters.http.tap, envoy.grpc_http1_bridge, envoy.grpc_json_transcoder, envoy.grpc_web, envoy.gzip, envoy.health_check, envoy.http_dynamo_filter, envoy.ip_tagging, envoy.lua, envoy.rate_limit, envoy.router, envoy.squash
[2020-07-09 07:28:25.174][4737][info][main] [external/envoy/source/server/server.cc:259]   envoy.clusters: envoy.cluster.eds, envoy.cluster.logical_dns, envoy.cluster.original_dst, envoy.cluster.static, envoy.cluster.strict_dns, envoy.clusters.aggregate, envoy.clusters.dynamic_forward_proxy, envoy.clusters.redis
[2020-07-09 07:28:25.174][4737][info][main] [external/envoy/source/server/server.cc:259]   envoy.dubbo_proxy.protocols: dubbo
[2020-07-09 07:28:25.174][4737][info][main] [external/envoy/source/server/server.cc:259]   envoy.filters.network: envoy.client_ssl_auth, envoy.echo, envoy.ext_authz, envoy.filters.network.client_ssl_auth, envoy.filters.network.direct_response, envoy.filters.network.dubbo_proxy, envoy.filters.network.echo, envoy.filters.network.ext_authz, envoy.filters.network.http_connection_manager, envoy.filters.network.kafka_broker, envoy.filters.network.local_ratelimit, envoy.filters.network.mongo_proxy, envoy.filters.network.mysql_proxy, envoy.filters.network.ratelimit, envoy.filters.network.rbac, envoy.filters.network.redis_proxy, envoy.filters.network.sni_cluster, envoy.filters.network.tcp_proxy, envoy.filters.network.thrift_proxy, envoy.filters.network.zookeeper_proxy, envoy.http_connection_manager, envoy.mongo_proxy, envoy.ratelimit, envoy.redis_proxy, envoy.tcp_proxy
[2020-07-09 07:28:25.174][4737][info][main] [external/envoy/source/server/server.cc:259]   envoy.thrift_proxy.transports: auto, framed, header, unframed
[2020-07-09 07:28:25.174][4737][info][main] [external/envoy/source/server/server.cc:259]   envoy.thrift_proxy.filters: envoy.filters.thrift.rate_limit, envoy.filters.thrift.router
[2020-07-09 07:28:25.174][4737][info][main] [external/envoy/source/server/server.cc:259]   envoy.dubbo_proxy.route_matchers: default
[2020-07-09 07:28:25.175][4737][info][main] [external/envoy/source/server/server.cc:259]   envoy.retry_host_predicates: envoy.retry_host_predicates.omit_canary_hosts, envoy.retry_host_predicates.omit_host_metadata, envoy.retry_host_predicates.previous_hosts
[2020-07-09 07:28:25.175][4737][info][main] [external/envoy/source/server/server.cc:259]   envoy.dubbo_proxy.filters: envoy.filters.dubbo.router
[2020-07-09 07:28:25.175][4737][info][main] [external/envoy/source/server/server.cc:259]   envoy.udp_listeners: raw_udp_listener
[2020-07-09 07:28:25.175][4737][info][main] [external/envoy/source/server/server.cc:259]   envoy.dubbo_proxy.serializers: dubbo.hessian2
[2020-07-09 07:28:25.175][4737][info][main] [external/envoy/source/server/server.cc:259]   envoy.health_checkers: envoy.health_checkers.redis
[2020-07-09 07:28:25.175][4737][info][main] [external/envoy/source/server/server.cc:259]   envoy.stats_sinks: envoy.dog_statsd, envoy.metrics_service, envoy.stat_sinks.dog_statsd, envoy.stat_sinks.hystrix, envoy.stat_sinks.metrics_service, envoy.stat_sinks.statsd, envoy.statsd
[2020-07-09 07:28:25.175][4737][info][main] [external/envoy/source/server/server.cc:259]   envoy.resolvers: envoy.ip
[2020-07-09 07:28:25.175][4737][info][main] [external/envoy/source/server/server.cc:259]   envoy.resource_monitors: envoy.resource_monitors.fixed_heap, envoy.resource_monitors.injected_resource
[2020-07-09 07:28:25.175][4737][info][main] [external/envoy/source/server/server.cc:259]   http_cache_factory: envoy.extensions.http.cache.simple
[2020-07-09 07:28:25.175][4737][info][main] [external/envoy/source/server/server.cc:259]   envoy.filters.udp_listener: envoy.filters.udp.dns_filter, envoy.filters.udp_listener.udp_proxy
[2020-07-09 07:28:25.175][4737][info][main] [external/envoy/source/server/server.cc:259]   envoy.thrift_proxy.protocols: auto, binary, binary/non-strict, compact, twitter
[2020-07-09 07:28:25.175][4737][info][main] [external/envoy/source/server/server.cc:259]   envoy.grpc_credentials: envoy.grpc_credentials.aws_iam, envoy.grpc_credentials.default, envoy.grpc_credentials.file_based_metadata
[2020-07-09 07:28:25.175][4737][info][main] [external/envoy/source/server/server.cc:259]   envoy.tracers: envoy.dynamic.ot, envoy.lightstep, envoy.tracers.datadog, envoy.tracers.dynamic_ot, envoy.tracers.lightstep, envoy.tracers.opencensus, envoy.tracers.xray, envoy.tracers.zipkin, envoy.zipkin
[2020-07-09 07:28:25.175][4737][info][main] [external/envoy/source/server/server.cc:259]   envoy.transport_sockets.downstream: envoy.transport_sockets.alts, envoy.transport_sockets.raw_buffer, envoy.transport_sockets.tap, envoy.transport_sockets.tls, raw_buffer, tls
[2020-07-09 07:28:25.175][4737][info][main] [external/envoy/source/server/server.cc:259]   envoy.retry_priorities: envoy.retry_priorities.previous_priorities
[2020-07-09 07:28:25.175][4737][info][main] [external/envoy/source/server/server.cc:259]   envoy.access_loggers: envoy.access_loggers.file, envoy.access_loggers.http_grpc, envoy.access_loggers.tcp_grpc, envoy.file_access_log, envoy.http_grpc_access_log, envoy.tcp_grpc_access_log
[2020-07-09 07:28:25.175][4737][info][main] [external/envoy/source/server/server.cc:259]   envoy.transport_sockets.upstream: envoy.transport_sockets.alts, envoy.transport_sockets.raw_buffer, envoy.transport_sockets.tap, envoy.transport_sockets.tls, raw_buffer, tls
[2020-07-09 07:28:25.175][4737][info][main] [external/envoy/source/server/server.cc:259]   envoy.filters.listener: envoy.filters.listener.http_inspector, envoy.filters.listener.original_dst, envoy.filters.listener.original_src, envoy.filters.listener.proxy_protocol, envoy.filters.listener.tls_inspector, envoy.listener.http_inspector, envoy.listener.original_dst, envoy.listener.original_src, envoy.listener.proxy_protocol, envoy.listener.tls_inspector
[2020-07-09 07:28:25.196][4737][warning][misc] [external/envoy/source/common/protobuf/utility.cc:198] Using deprecated option 'envoy.api.v2.Cluster.hosts' from file cluster.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
[2020-07-09 07:28:25.197][4737][info][main] [external/envoy/source/server/server.cc:340] admin address: 127.0.0.1:19005
[2020-07-09 07:28:25.198][4737][info][main] [external/envoy/source/server/server.cc:459] runtime: layers:
  - name: static_layer
    static_layer:
      envoy.deprecated_features:envoy.config.trace.v2.ZipkinConfig.HTTP_JSON_V1: true
      envoy.deprecated_features:envoy.config.filter.network.http_connection_manager.v2.HttpConnectionManager.Tracing.operation_name: true
      envoy.deprecated_features:envoy.api.v2.Cluster.tls_context: true
[2020-07-09 07:28:25.198][4737][info][config] [external/envoy/source/server/configuration_impl.cc:103] loading tracing configuration
[2020-07-09 07:28:25.198][4737][info][config] [external/envoy/source/server/configuration_impl.cc:69] loading 0 static secret(s)
[2020-07-09 07:28:25.198][4737][info][config] [external/envoy/source/server/configuration_impl.cc:75] loading 1 cluster(s)
[2020-07-09 07:28:25.206][4737][info][upstream] [external/envoy/source/common/upstream/cluster_manager_impl.cc:167] cm init: initializing cds
[2020-07-09 07:28:25.208][4737][info][config] [external/envoy/source/server/configuration_impl.cc:79] loading 0 listener(s)
[2020-07-09 07:28:25.208][4737][info][config] [external/envoy/source/server/configuration_impl.cc:129] loading stats sink configuration
[2020-07-09 07:28:25.209][4737][info][main] [external/envoy/source/server/server.cc:554] starting main dispatch loop
[2020-07-09 07:28:25.210][4737][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:92] StreamAggregatedResources gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection termination
[2020-07-09 07:28:25.514][4737][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:92] StreamAggregatedResources gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection termination
[2020-07-09 07:28:26.474][4737][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:92] StreamAggregatedResources gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection termination

I used the following command:

consul connect envoy -mesh-gateway -register \
                     -service "secondary-primary" \
                     -address "<your private address>:<port>" \
                     -wan-address "<your externally accessible address>:<port>" \
                     -admin-bind 127.0.0.1:19005 

With the above command only port 19005 shows up in netstat; the other ports are not listening. Probing the admin URL's /ready endpoint returns LIVE.
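Concretely, what I checked was something like this (19005 is the admin port from the command above):

sudo netstat -tlnp | grep envoy           # only 127.0.0.1:19005 is listening
curl -s http://127.0.0.1:19005/ready      # returns LIVE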

Consul server logs on Kubernetes:

2020-07-09T07:31:59.081Z [WARN]  agent.server.rpc: RPC request to DC is currently failing as no server can be reached: datacenter=vm
    2020-07-09T07:32:01.050Z [WARN]  agent.server.rpc: RPC request to DC is currently failing as no server can be reached: datacenter=vm
    2020-07-09T07:32:03.018Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send gossip to <vm-external-ip>:8302: read tcp 10.32.1.8:38300->10.32.2.14:8443: read: connection reset by peer
    2020-07-09T07:32:03.514Z [INFO]  agent.server.memberlist.wan: memberlist: Suspect simba.vm has failed, no acks received
    2020-07-09T07:32:03.515Z [ERROR] agent.server.memberlist.wan: memberlist: Failed to send compound ping and suspect message to <vm-external-ip>:8302: read tcp 10.32.1.8:46742->10.32.1.7:8443: read: connection reset by peer

The firewall allows port 8302 TCP and UDP for the VM.
On Kubernetes, when I execute

kubectl exec consul-server-1 -- consul catalog services -datacenter vm

I get the following error:

Error listing services: Unexpected response code: 500 (Remote DC has no server currently reachable)
command terminated with exit code 1

But I'm able to set up a Consul Connect proxy to a service on Kubernetes and connect to it.
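(For context, the Connect proxy test that does work is roughly like this; the service name, upstream and port are placeholders, not my exact values:)

consul connect proxy -service test-client \
  -upstream "<k8s service>:<local port>"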

Can anyone please let me know what mistakes I have made?

Hi, I think we need to focus on this error:

Failed to set the intermediate certificate with the CA provider: could not verify intermediate cert against root: x509: certificate has expired or is not yet valid: current time 2020-07-09T07:25:58Z is before 2020-07-09T07:27:05Z

Is there a chance that the time on your VM is not correct?
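If NTP might not be running on the VM, something like this would confirm and fix it (a sketch, assuming a systemd-based image):

timedatectl status               # look for "System clock synchronized: yes"
sudo timedatectl set-ntp true    # enable NTP sync if it is off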

Hi,
To check whether they have different times, I ran

date && kubectl exec consul-server-0 -- date

The output after running it three separate times:

Mon Jul 13 05:32:16 UTC 2020
Mon Jul 13 05:32:38 UTC 2020

Mon Jul 13 05:32:39 UTC 2020
Mon Jul 13 05:33:02 UTC 2020

Mon Jul 13 05:33:21 UTC 2020
Mon Jul 13 05:33:43 UTC 2020

The commands returned instantly, but the gap between the two timestamps was a few tens of seconds each time.

And the ping from the Consul server pod to the VM:

kubectl exec consul-server-0 -- ping <external ip>
64 bytes from 34.105.243.87: icmp_seq=1 ttl=62 time=7.21 ms
64 bytes from 34.105.243.87: icmp_seq=2 ttl=62 time=7.38 ms
64 bytes from 34.105.243.87: icmp_seq=3 ttl=62 time=7.21 ms
64 bytes from 34.105.243.87: icmp_seq=4 ttl=62 time=7.27 ms
64 bytes from 34.105.243.87: icmp_seq=5 ttl=62 time=7.13 ms

When I tried to set bind_addr in my server config to the external IP of the VM, the Consul agent did not start, so I used -advertise-wan and set translate_wan_addrs to true. Could this have caused a problem?
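For reference, the agent invocation was roughly like this (the IPs are placeholders, and translate_wan_addrs = true sits in the config file shown earlier):

consul agent -config-dir $PWD \
  -advertise "<vm internal ip>" \
  -advertise-wan "<vm external ip>"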

I shifted to a VM with no clock drift:

Mon Jul 13 07:58:34 UTC 2020
Mon Jul 13 07:58:38 UTC 2020

Mon Jul 13 07:58:39 UTC 2020
Mon Jul 13 07:58:40 UTC 2020

Mon Jul 13 07:58:43 UTC 2020
Mon Jul 13 07:58:44 UTC 2020

It is still not working. Logs from that VM:

BootstrapExpect is set to 1; this is the same as Bootstrap mode.
bootstrap = true: do not enable unless necessary
==> Starting Consul agent...
           Version: 'v1.8.0'
           Node ID: 'c302e62e-56fb-cdd4-3123-1a4fc6b96f46'
         Node name: 'ansible-orderer'
        Datacenter: 'vm-dc' (Segment: '<all>')
            Server: true (Bootstrap: true)
       Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: 8501, gRPC: 8502, DNS: 8600)
      Cluster Addr: 10.128.0.42 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: false, TLS-Outgoing: true, TLS-Incoming: false, Auto-Encrypt-TLS: false

==> Log data will now stream in as it occurs:

    2020-07-13T07:49:15.537Z [INFO]  agent.server.gateway_locator: will dial the primary datacenter through its mesh gateways
    2020-07-13T07:49:15.604Z [INFO]  agent.server.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:c302e62e-56fb-cdd4-3123-1a4fc6b96f46 Address:10.128.0.42:8300}]"
    2020-07-13T07:49:15.605Z [WARN]  agent.server.memberlist.wan: memberlist: Binding to public address without encryption!
    2020-07-13T07:49:15.605Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: ansible-orderer.vm-dc 34.71.55.109
    2020-07-13T07:49:15.606Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: ansible-orderer 10.128.0.42
    2020-07-13T07:49:15.607Z [INFO]  agent: Started DNS server: address=127.0.0.1:8600 network=udp
    2020-07-13T07:49:15.608Z [INFO]  agent.server.raft: entering follower state: follower="Node at 10.128.0.42:8300 [Follower]" leader=
    2020-07-13T07:49:15.610Z [INFO]  agent.server: Adding LAN server: server="ansible-orderer (Addr: tcp/10.128.0.42:8300) (DC: vm-dc)"
    2020-07-13T07:49:15.610Z [INFO]  agent.server: Handled event for server in area: event=member-join server=ansible-orderer.vm-dc area=wan
    2020-07-13T07:49:15.611Z [INFO]  agent: Started DNS server: address=127.0.0.1:8600 network=tcp
    2020-07-13T07:49:15.612Z [INFO]  agent: Started HTTP server: address=127.0.0.1:8500 network=tcp
    2020-07-13T07:49:15.612Z [INFO]  agent: Started HTTPS server: address=127.0.0.1:8501 network=tcp
    2020-07-13T07:49:15.612Z [INFO]  agent: started state syncer
==> Consul agent running!
    2020-07-13T07:49:15.613Z [INFO]  agent: Started gRPC server: address=127.0.0.1:8502 network=tcp
    2020-07-13T07:49:15.613Z [INFO]  agent: Refreshing mesh gateways is supported for the following discovery methods: discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
    2020-07-13T07:49:15.613Z [INFO]  agent: Refreshing mesh gateways...
    2020-07-13T07:49:15.613Z [INFO]  agent.server.gateway_locator: updated fallback list of primary mesh gateways: mesh_gateways=[<gateway ip>:443]
    2020-07-13T07:49:15.613Z [INFO]  agent: Refreshing mesh gateways completed
    2020-07-13T07:49:15.613Z [INFO]  agent: Retry join is supported for the following discovery methods: cluster=WAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
    2020-07-13T07:49:15.613Z [INFO]  agent: Joining cluster...: cluster=WAN
    2020-07-13T07:49:15.613Z [INFO]  agent: (WAN) joining: wan_addresses=[*.dc1/192.0.2.2]
    2020-07-13T07:49:15.926Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: consul-server-0.dc1 10.32.1.8
    2020-07-13T07:49:15.927Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: consul-server-2.dc1 10.32.0.13
    2020-07-13T07:49:15.927Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: consul-server-1.dc1 10.32.2.17
    2020-07-13T07:49:15.927Z [INFO]  agent: (WAN) joined: number_of_nodes=1
    2020-07-13T07:49:15.927Z [INFO]  agent: Join cluster completed. Synced with initial agents: cluster=WAN num_agents=1
    2020-07-13T07:49:15.927Z [INFO]  agent.server: Handled event for server in area: event=member-join server=consul-server-0.dc1 area=wan
    2020-07-13T07:49:15.927Z [INFO]  agent.server: Handled event for server in area: event=member-join server=consul-server-2.dc1 area=wan
    2020-07-13T07:49:15.927Z [INFO]  agent.server: Handled event for server in area: event=member-join server=consul-server-1.dc1 area=wan
    2020-07-13T07:49:21.250Z [WARN]  agent.server.raft: heartbeat timeout reached, starting election: last-leader=
    2020-07-13T07:49:21.250Z [INFO]  agent.server.raft: entering candidate state: node="Node at 10.128.0.42:8300 [Candidate]" term=2
    2020-07-13T07:49:21.257Z [INFO]  agent.server.raft: election won: tally=1
    2020-07-13T07:49:21.257Z [INFO]  agent.server.raft: entering leader state: leader="Node at 10.128.0.42:8300 [Leader]"
    2020-07-13T07:49:21.257Z [INFO]  agent.server: cluster leadership acquired
    2020-07-13T07:49:21.257Z [INFO]  agent.server: New leader elected: payload=ansible-orderer
    2020-07-13T07:49:21.525Z [INFO]  agent: Synced node info
    2020-07-13T07:49:21.699Z [INFO]  agent.server.connect: received new intermediate certificate from primary datacenter
    2020-07-13T07:49:21.704Z [INFO]  agent.server.connect: updated root certificates from primary datacenter
    2020-07-13T07:49:21.704Z [INFO]  agent.server.connect: initialized secondary datacenter CA with provider: provider=consul
    2020-07-13T07:49:21.705Z [INFO]  agent.leader: started routine: routine="config entry replication"
    2020-07-13T07:49:21.705Z [INFO]  agent.leader: started routine: routine="federation state replication"
    2020-07-13T07:49:21.705Z [INFO]  agent.leader: started routine: routine="federation state anti-entropy"
    2020-07-13T07:49:21.705Z [INFO]  agent.leader: started routine: routine="secondary CA roots watch"
    2020-07-13T07:49:21.705Z [INFO]  agent.leader: started routine: routine="intention replication"
    2020-07-13T07:49:21.705Z [INFO]  agent.leader: started routine: routine="secondary cert renew watch"
    2020-07-13T07:49:21.705Z [INFO]  agent.leader: started routine: routine="CA root pruning"
    2020-07-13T07:49:21.705Z [INFO]  agent.server: member joined, marking health alive: member=ansible-orderer
    2020-07-13T07:49:21.807Z [INFO]  agent.server.gateway_locator: will dial the primary datacenter using our local mesh gateways if possible
    2020-07-13T07:49:21.815Z [INFO]  agent.server.gateway_locator: new cached locations of mesh gateways: primary=[35.205.133.203:443, 35.205.133.203:443] local=[]
    2020-07-13T07:49:21.912Z [INFO]  agent.server: federation state anti-entropy synced
    2020-07-13T07:49:22.018Z [INFO]  agent.server: federation state anti-entropy synced
    2020-07-13T07:49:25.605Z [INFO]  agent.server.memberlist.wan: memberlist: Suspect consul-server-0.dc1 has failed, no acks received
    2020-07-13T07:49:40.605Z [INFO]  agent.server.memberlist.wan: memberlist: Suspect consul-server-1.dc1 has failed, no acks received
    2020-07-13T07:49:48.962Z [INFO]  agent: Synced service: service=vm-gateway
    2020-07-13T07:49:48.975Z [INFO]  agent: Synced service: service=vm-gateway
    2020-07-13T07:49:49.072Z [INFO]  agent.server: federation state anti-entropy synced
    2020-07-13T07:49:49.089Z [WARN]  agent: grpc: Server.Serve failed to complete security handshake from "127.0.0.1:44620": tls: first record does not look like a TLS handshake
    2020-07-13T07:49:49.396Z [WARN]  agent: grpc: Server.Serve failed to complete security handshake from "127.0.0.1:44622": tls: first record does not look like a TLS handshake
    2020-07-13T07:49:50.227Z [WARN]  agent: grpc: Server.Serve failed to complete security handshake from "127.0.0.1:44624": tls: first record does not look like a TLS handshake
    2020-07-13T07:49:52.127Z [WARN]  agent: grpc: Server.Serve failed to complete security handshake from "127.0.0.1:44626": tls: first record does not look like a TLS handshake
    2020-07-13T07:49:54.773Z [WARN]  agent: Check socket connection failed: check=service:vm-gateway error="dial tcp 10.128.0.42:7051: connect: connection refused"
    2020-07-13T07:49:54.774Z [WARN]  agent: Check is now critical: check=service:vm-gateway
    2020-07-13T07:49:55.605Z [INFO]  agent.server.memberlist.wan: memberlist: Marking consul-server-0.dc1 as failed, suspect timeout reached (0 peer confirmations)
    2020-07-13T07:49:55.606Z [INFO]  agent.server.serf.wan: serf: EventMemberFailed: consul-server-0.dc1 10.32.1.8
    2020-07-13T07:49:55.606Z [INFO]  agent.server: Handled event for server in area: event=member-failed server=consul-server-0.dc1 area=wan
    2020-07-13T07:49:55.765Z [WARN]  agent: grpc: Server.Serve failed to complete security handshake from "127.0.0.1:44630": tls: first record does not look like a TLS handshake
    2020-07-13T07:50:00.606Z [INFO]  agent.server.memberlist.wan: memberlist: Suspect consul-server-2.dc1 has failed, no acks received
    2020-07-13T07:50:03.228Z [WARN]  agent: grpc: Server.Serve failed to complete security handshake from "127.0.0.1:44632": tls: first record does not look like a TLS handshake
    2020-07-13T07:50:04.774Z [WARN]  agent: Check socket connection failed: check=service:vm-gateway error="dial tcp 10.128.0.42:7051: connect: connection refused"
    2020-07-13T07:50:04.774Z [WARN]  agent: Check is now critical: check=service:vm-gateway
    2020-07-13T07:50:10.042Z [WARN]  agent: grpc: Server.Serve failed to complete security handshake from "127.0.0.1:44636": tls: first record does not look like a TLS handshake
    2020-07-13T07:50:10.606Z [INFO]  agent.server.memberlist.wan: memberlist: Marking consul-server-1.dc1 as failed, suspect timeout reached (0 peer confirmations)
    2020-07-13T07:50:10.606Z [INFO]  agent.server.serf.wan: serf: EventMemberFailed: consul-server-1.dc1 10.32.2.17
    2020-07-13T07:50:10.606Z [INFO]  agent.server: Handled event for server in area: event=member-failed server=consul-server-1.dc1 area=wan
    2020-07-13T07:50:14.774Z [WARN]  agent: Check socket connection failed: check=service:vm-gateway error="dial tcp 10.128.0.42:7051: connect: connection refused"
    2020-07-13T07:50:14.774Z [WARN]  agent: Check is now critical: check=service:vm-gateway
    2020-07-13T07:50:15.609Z [INFO]  agent.server.serf.wan: serf: attempting reconnect to consul-server-0.dc1 10.32.1.8:8302
    2020-07-13T07:50:15.913Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: consul-server-1.dc1 10.32.2.17
    2020-07-13T07:50:15.913Z [WARN]  agent.server.memberlist.wan: memberlist: Refuting a suspect message (from: ansible-orderer.vm-dc)
    2020-07-13T07:50:15.913Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: consul-server-0.dc1 10.32.1.8
    2020-07-13T07:50:15.913Z [INFO]  agent.server: Handled event for server in area: event=member-join server=consul-server-1.dc1 area=wan
    2020-07-13T07:50:15.913Z [INFO]  agent.server: Handled event for server in area: event=member-join server=consul-server-0.dc1 area=wan
    2020-07-13T07:50:21.804Z [WARN]  agent: grpc: Server.Serve failed to complete security handshake from "127.0.0.1:44642": tls: first record does not look like a TLS handshake
    2020-07-13T07:50:24.775Z [WARN]  agent: Check socket connection failed: check=service:vm-gateway error="dial tcp 10.128.0.42:7051: connect: connection refused"
    2020-07-13T07:50:24.775Z [WARN]  agent: Check is now critical: check=service:vm-gateway
    2020-07-13T07:50:25.606Z [INFO]  agent.server.memberlist.wan: memberlist: Suspect consul-server-1.dc1 has failed, no acks received
    2020-07-13T07:50:26.719Z [WARN]  agent.server.memberlist.wan: memberlist: Refuting a suspect message (from: ansible-orderer.vm-dc)
    2020-07-13T07:50:34.775Z [WARN]  agent: Check socket connection failed: check=service:vm-gateway error="dial tcp 10.128.0.42:7051: connect: connection refused"
    2020-07-13T07:50:34.775Z [WARN]  agent: Check is now critical: check=service:vm-gateway
    2020-07-13T07:50:44.776Z [WARN]  agent: Check socket connection failed: check=service:vm-gateway error="dial tcp 10.128.0.42:7051: connect: connection refused"
    2020-07-13T07:50:44.776Z [WARN]  agent: Check is now critical: check=service:vm-gateway

The problem seems to arise when I start the Envoy proxy.
Envoy version: 8fed4856a7cfe79cf60aa3682eff3ae55b231e49/1.14.3/clean-getenvoy-2aa564b-envoy/RELEASE/BoringSSL

The logs from starting the Envoy proxy are still the same:

==> Registered service: vm-gateway
[2020-07-13 07:49:49.047][5859][info][main] [external/envoy/source/server/server.cc:256] initializing epoch 0 (hot restart version=disabled)
[2020-07-13 07:49:49.047][5859][info][main] [external/envoy/source/server/server.cc:258] statically linked extensions:
[2020-07-13 07:49:49.047][5859][info][main] [external/envoy/source/server/server.cc:260]   envoy.health_checkers: envoy.health_checkers.redis
[2020-07-13 07:49:49.047][5859][info][main] [external/envoy/source/server/server.cc:260]   http_cache_factory: envoy.extensions.http.cache.simple
[2020-07-13 07:49:49.047][5859][info][main] [external/envoy/source/server/server.cc:260]   envoy.dubbo_proxy.serializers: dubbo.hessian2
[2020-07-13 07:49:49.047][5859][info][main] [external/envoy/source/server/server.cc:260]   envoy.thrift_proxy.protocols: auto, binary, binary/non-strict, compact, twitter
[2020-07-13 07:49:49.047][5859][info][main] [external/envoy/source/server/server.cc:260]   envoy.dubbo_proxy.protocols: dubbo
[2020-07-13 07:49:49.047][5859][info][main] [external/envoy/source/server/server.cc:260]   envoy.clusters: envoy.cluster.eds, envoy.cluster.logical_dns, envoy.cluster.original_dst, envoy.cluster.static, envoy.cluster.strict_dns, envoy.clusters.aggregate, envoy.clusters.dynamic_forward_proxy, envoy.clusters.redis
[2020-07-13 07:49:49.048][5859][info][main] [external/envoy/source/server/server.cc:260]   envoy.filters.http: envoy.buffer, envoy.cors, envoy.csrf, envoy.ext_authz, envoy.fault, envoy.filters.http.adaptive_concurrency, envoy.filters.http.aws_lambda, envoy.filters.http.aws_request_signing, envoy.filters.http.buffer, envoy.filters.http.cache, envoy.filters.http.cors, envoy.filters.http.csrf, envoy.filters.http.dynamic_forward_proxy, envoy.filters.http.dynamo, envoy.filters.http.ext_authz, envoy.filters.http.fault, envoy.filters.http.grpc_http1_bridge, envoy.filters.http.grpc_http1_reverse_bridge, envoy.filters.http.grpc_json_transcoder, envoy.filters.http.grpc_stats, envoy.filters.http.grpc_web, envoy.filters.http.gzip, envoy.filters.http.header_to_metadata, envoy.filters.http.health_check, envoy.filters.http.ip_tagging, envoy.filters.http.jwt_authn, envoy.filters.http.lua, envoy.filters.http.on_demand, envoy.filters.http.original_src, envoy.filters.http.ratelimit, envoy.filters.http.rbac, envoy.filters.http.router, envoy.filters.http.squash, envoy.filters.http.tap, envoy.grpc_http1_bridge, envoy.grpc_json_transcoder, envoy.grpc_web, envoy.gzip, envoy.health_check, envoy.http_dynamo_filter, envoy.ip_tagging, envoy.lua, envoy.rate_limit, envoy.router, envoy.squash
[2020-07-13 07:49:49.052][5859][info][main] [external/envoy/source/server/server.cc:260]   envoy.filters.listener: envoy.filters.listener.http_inspector, envoy.filters.listener.original_dst, envoy.filters.listener.original_src, envoy.filters.listener.proxy_protocol, envoy.filters.listener.tls_inspector, envoy.listener.http_inspector, envoy.listener.original_dst, envoy.listener.original_src, envoy.listener.proxy_protocol, envoy.listener.tls_inspector
[2020-07-13 07:49:49.052][5859][info][main] [external/envoy/source/server/server.cc:260]   envoy.transport_sockets.upstream: envoy.transport_sockets.alts, envoy.transport_sockets.raw_buffer, envoy.transport_sockets.tap, envoy.transport_sockets.tls, raw_buffer, tls
[2020-07-13 07:49:49.052][5859][info][main] [external/envoy/source/server/server.cc:260]   envoy.dubbo_proxy.filters: envoy.filters.dubbo.router
[2020-07-13 07:49:49.052][5859][info][main] [external/envoy/source/server/server.cc:260]   envoy.grpc_credentials: envoy.grpc_credentials.aws_iam, envoy.grpc_credentials.default, envoy.grpc_credentials.file_based_metadata
[2020-07-13 07:49:49.052][5859][info][main] [external/envoy/source/server/server.cc:260]   envoy.filters.udp_listener: envoy.filters.udp.dns_filter, envoy.filters.udp_listener.udp_proxy
[2020-07-13 07:49:49.052][5859][info][main] [external/envoy/source/server/server.cc:260]   envoy.access_loggers: envoy.access_loggers.file, envoy.access_loggers.http_grpc, envoy.access_loggers.tcp_grpc, envoy.file_access_log, envoy.http_grpc_access_log, envoy.tcp_grpc_access_log
[2020-07-13 07:49:49.052][5859][info][main] [external/envoy/source/server/server.cc:260]   envoy.stats_sinks: envoy.dog_statsd, envoy.metrics_service, envoy.stat_sinks.dog_statsd, envoy.stat_sinks.hystrix, envoy.stat_sinks.metrics_service, envoy.stat_sinks.statsd, envoy.statsd
[2020-07-13 07:49:49.052][5859][info][main] [external/envoy/source/server/server.cc:260]   envoy.thrift_proxy.transports: auto, framed, header, unframed
[2020-07-13 07:49:49.052][5859][info][main] [external/envoy/source/server/server.cc:260]   envoy.tracers: envoy.dynamic.ot, envoy.lightstep, envoy.tracers.datadog, envoy.tracers.dynamic_ot, envoy.tracers.lightstep, envoy.tracers.opencensus, envoy.tracers.xray, envoy.tracers.zipkin, envoy.zipkin
[2020-07-13 07:49:49.052][5859][info][main] [external/envoy/source/server/server.cc:260]   envoy.transport_sockets.downstream: envoy.transport_sockets.alts, envoy.transport_sockets.raw_buffer, envoy.transport_sockets.tap, envoy.transport_sockets.tls, raw_buffer, tls
[2020-07-13 07:49:49.052][5859][info][main] [external/envoy/source/server/server.cc:260]   envoy.dubbo_proxy.route_matchers: default
[2020-07-13 07:49:49.052][5859][info][main] [external/envoy/source/server/server.cc:260]   envoy.retry_priorities: envoy.retry_priorities.previous_priorities
[2020-07-13 07:49:49.052][5859][info][main] [external/envoy/source/server/server.cc:260]   envoy.udp_listeners: raw_udp_listener
[2020-07-13 07:49:49.052][5859][info][main] [external/envoy/source/server/server.cc:260]   envoy.resolvers: envoy.ip
[2020-07-13 07:49:49.052][5859][info][main] [external/envoy/source/server/server.cc:260]   envoy.resource_monitors: envoy.resource_monitors.fixed_heap, envoy.resource_monitors.injected_resource
[2020-07-13 07:49:49.053][5859][info][main] [external/envoy/source/server/server.cc:260]   envoy.filters.network: envoy.client_ssl_auth, envoy.echo, envoy.ext_authz, envoy.filters.network.client_ssl_auth, envoy.filters.network.direct_response, envoy.filters.network.dubbo_proxy, envoy.filters.network.echo, envoy.filters.network.ext_authz, envoy.filters.network.http_connection_manager, envoy.filters.network.kafka_broker, envoy.filters.network.local_ratelimit, envoy.filters.network.mongo_proxy, envoy.filters.network.mysql_proxy, envoy.filters.network.ratelimit, envoy.filters.network.rbac, envoy.filters.network.redis_proxy, envoy.filters.network.sni_cluster, envoy.filters.network.tcp_proxy, envoy.filters.network.thrift_proxy, envoy.filters.network.zookeeper_proxy, envoy.http_connection_manager, envoy.mongo_proxy, envoy.ratelimit, envoy.redis_proxy, envoy.tcp_proxy
[2020-07-13 07:49:49.053][5859][info][main] [external/envoy/source/server/server.cc:260]   envoy.retry_host_predicates: envoy.retry_host_predicates.omit_canary_hosts, envoy.retry_host_predicates.omit_host_metadata, envoy.retry_host_predicates.previous_hosts
[2020-07-13 07:49:49.053][5859][info][main] [external/envoy/source/server/server.cc:260]   envoy.thrift_proxy.filters: envoy.filters.thrift.rate_limit, envoy.filters.thrift.router
[2020-07-13 07:49:49.068][5859][warning][misc] [external/envoy/source/common/protobuf/utility.cc:198] Using deprecated option 'envoy.api.v2.Cluster.hosts' from file cluster.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
[2020-07-13 07:49:49.069][5859][info][main] [external/envoy/source/server/server.cc:341] admin address: 127.0.0.1:19005
[2020-07-13 07:49:49.070][5859][info][main] [external/envoy/source/server/server.cc:469] runtime: layers:
  - name: static_layer
    static_layer:
      envoy.deprecated_features:envoy.config.filter.network.http_connection_manager.v2.HttpConnectionManager.Tracing.operation_name: true
      envoy.deprecated_features:envoy.api.v2.Cluster.tls_context: true
      envoy.deprecated_features:envoy.config.trace.v2.ZipkinConfig.HTTP_JSON_V1: true
[2020-07-13 07:49:49.070][5859][info][config] [external/envoy/source/server/configuration_impl.cc:103] loading tracing configuration
[2020-07-13 07:49:49.070][5859][info][config] [external/envoy/source/server/configuration_impl.cc:69] loading 0 static secret(s)
[2020-07-13 07:49:49.070][5859][info][config] [external/envoy/source/server/configuration_impl.cc:75] loading 1 cluster(s)
[2020-07-13 07:49:49.083][5859][info][upstream] [external/envoy/source/common/upstream/cluster_manager_impl.cc:167] cm init: initializing cds
[2020-07-13 07:49:49.087][5859][info][config] [external/envoy/source/server/configuration_impl.cc:79] loading 0 listener(s)
[2020-07-13 07:49:49.088][5859][info][config] [external/envoy/source/server/configuration_impl.cc:129] loading stats sink configuration
[2020-07-13 07:49:49.088][5859][warning][main] [external/envoy/source/server/server.cc:451] there is no configured limit to the number of allowed active connections. Set a limit via the runtime key overload.global_downstream_max_connections
[2020-07-13 07:49:49.088][5859][info][main] [external/envoy/source/server/server.cc:564] starting main dispatch loop
[2020-07-13 07:49:49.090][5859][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:92] StreamAggregatedResources gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection termination
[2020-07-13 07:49:49.396][5859][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:92] StreamAggregatedResources gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection termination

Are ACLs required for federation to take place?

Hi Sachin,
ACLs are not required, although TLS is. It looks like the Envoy proxy can't connect to the local Consul agent (or server, in your case, I believe):

StreamAggregatedResources gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection termination

At the same time, it looks like the initial registration of the mesh gateway succeeded:

    2020-07-13T07:49:48.975Z [INFO]  agent: Synced service: service=vm-gateway

But then, when the Envoy proxy makes its gRPC connection to the Consul agent, that connection is rejected:

    2020-07-13T07:49:49.089Z [WARN]  agent: grpc: Server.Serve failed to complete security handshake from "127.0.0.1:44620": tls: first record does not look like a TLS handshake

I believe this is because, when Consul is running with TLS enabled, the Envoy proxies must use gRPC over TLS. This means you need to start the Envoy proxies with these environment variables set:

CONSUL_GRPC_ADDR="https://<agent IP>:8502"
CONSUL_CACERT=/path/to/your/consul/ca/cert.pem

(note the https)
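Putting that together with your earlier mesh gateway command, the start command would look roughly like this (the addresses are placeholders, and 8502 is the gRPC port from your agent config):

CONSUL_GRPC_ADDR="https://127.0.0.1:8502" \
CONSUL_CACERT=/<location>/consul/config/consul-agent-ca.pem \
consul connect envoy -mesh-gateway -register \
                     -service "secondary-primary" \
                     -address "<your private address>:<port>" \
                     -wan-address "<your externally accessible address>:<port>" \
                     -admin-bind 127.0.0.1:19005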

Hi lkysow,

I corrected my mistake: I created the certificates for the gateway and ran the following command:

CONSUL_HTTP_SSL=true CONSUL_HTTP_ADDR="127.0.0.1:8501" \
 consul connect envoy -ca-file=$PWD/consul-agent-ca.pem \
 -client-cert=$PWD/dc1-client-consul-0.pem -client-key=$PWD/dc1-client-consul-0-key.pem \
 -service "secondary-primary" -mesh-gateway -register -expose-servers  \
 -address "10.128.0.42:7051"  -wan-address "34.71.55.109:7051" -admin-bind 127.0.0.1:19005

I followed the Consul docs on mesh gateways for exposing servers and added -expose-servers, but the connection between the servers is still not working properly.

The problem still persists in one place. When I try to list the services of the other datacenter from Kubernetes:

kubectl exec consul-server-0 -- consul catalog services -datacenter vm-dc
Error listing services: Unexpected response code: 500 (Remote DC has no server currently reachable)
command terminated with exit code 1

but it works from the VM:

CONSUL_CACERT=$PWD/consul-agent-ca.pem CONSUL_HTTP_SSL=true \
 CONSUL_HTTP_ADDR="127.0.0.1:8501" consul catalog services -datacenter dc1
consul
mesh-gateway

VM logs:

 2020-07-21T06:35:24.928Z [INFO]  agent.server.gateway_locator: new cached locations of mesh gateways: primary=[<mesh-gateway-ip>:443, <mesh-gateway-ip>:443] local=[10.128.0.42:7051]
    2020-07-21T06:35:53.694Z [INFO]  agent.server.memberlist.wan: memberlist: Suspect consul-server-1.dc1 has failed, no acks received
    2020-07-21T06:36:19.544Z [ERROR] agent.server.memberlist.wan: memberlist: Push/Pull with consul-server-2.dc1 failed: read tcp 10.128.0.42:49690->10.128.0.42:7051: read: connection reset by peer
    2020-07-21T06:36:23.695Z [INFO]  agent.server.memberlist.wan: memberlist: Marking consul-server-1.dc1 as failed, suspect timeout reached (0 peer confirmations)
    2020-07-21T06:36:23.695Z [INFO]  agent.server.serf.wan: serf: EventMemberFailed: consul-server-1.dc1 10.4.4.17
    2020-07-21T06:36:23.695Z [INFO]  agent.server: Handled event for server in area: event=member-failed server=consul-server-1.dc1 area=wan
    2020-07-21T06:36:33.695Z [INFO]  agent.server.memberlist.wan: memberlist: Suspect consul-server-2.dc1 has failed, no acks received
    2020-07-21T06:36:49.954Z [INFO]  agent.server.serf.wan: serf: attempting reconnect to consul-server-1.dc1 10.4.4.17:8302
    2020-07-21T06:37:03.695Z [INFO]  agent.server.memberlist.wan: memberlist: Marking consul-server-2.dc1 as failed, suspect timeout reached (0 peer confirmations)
    2020-07-21T06:37:03.695Z [INFO]  agent.server.serf.wan: serf: EventMemberFailed: consul-server-2.dc1 10.4.1.115
    2020-07-21T06:37:03.695Z [INFO]  agent.server: Handled event for server in area: event=member-failed server=consul-server-2.dc1 area=wan
    2020-07-21T06:37:13.695Z [INFO]  agent.server.memberlist.wan: memberlist: Suspect consul-server-0.dc1 has failed, no acks received
    2020-07-21T06:37:19.955Z [INFO]  agent.server.serf.wan: serf: attempting reconnect to consul-server-2.dc1 10.4.1.115:8302
    2020-07-21T06:37:43.695Z [INFO]  agent.server.memberlist.wan: memberlist: Marking consul-server-0.dc1 as failed, suspect timeout reached (0 peer confirmations)
    2020-07-21T06:37:43.695Z [INFO]  agent.server.serf.wan: serf: EventMemberFailed: consul-server-0.dc1 10.4.3.14
    2020-07-21T06:37:43.695Z [INFO]  agent.server: Handled event for server in area: event=member-failed server=consul-server-0.dc1 area=wan
    2020-07-21T06:37:49.956Z [INFO]  agent.server.serf.wan: serf: attempting reconnect to consul-server-2.dc1 10.4.1.115:8302
    2020-07-21T06:37:53.695Z [INFO]  agent.server.memberlist.wan: memberlist: Suspect consul-server-0.dc1 has failed, no acks received
    2020-07-21T06:38:19.957Z [INFO]  agent.server.serf.wan: serf: attempting reconnect to consul-server-2.dc1 10.4.1.115:8302
    2020-07-21T06:38:49.958Z [INFO]  agent.server.serf.wan: serf: attempting reconnect to consul-server-1.dc1 10.4.4.17:8302
    2020-07-21T06:39:19.959Z [INFO]  agent.server.serf.wan: serf: attempting reconnect to consul-server-1.dc1 10.4.4.17:8302
    2020-07-21T06:39:49.960Z [INFO]  agent.server.serf.wan: serf: attempting reconnect to consul-server-0.dc1 10.4.3.14:8302
    2020-07-21T06:40:19.961Z [INFO]  agent.server.serf.wan: serf: attempting reconnect to consul-server-1.dc1 10.4.4.17:8302

The Consul server logs on Kubernetes:

2020-07-21T06:59:52.644Z [WARN]  agent.server.rpc: RPC request to DC is currently failing as no server can be reached: datacenter=vm-dc
    2020-07-21T06:59:55.662Z [WARN]  agent.server.rpc: RPC request to DC is currently failing as no server can be reached: datacenter=vm-dc
    2020-07-21T07:00:00.033Z [WARN]  agent.server.rpc: RPC request to DC is currently failing as no server can be reached: datacenter=vm-dc
    2020-07-21T07:00:06.254Z [WARN]  agent.server.rpc: RPC request to DC is currently failing as no server can be reached: datacenter=vm-dc
    2020-07-21T07:00:07.022Z [WARN]  agent.server.rpc: RPC request to DC is currently failing as no server can be reached: datacenter=vm-dc
    2020-07-21T07:00:08.752Z [WARN]  agent.server.rpc: RPC request to DC is currently failing as no server can be reached: datacenter=vm-dc
    2020-07-21T07:00:08.942Z [WARN]  agent.server.rpc: RPC request to DC is currently failing as no server can be reached: datacenter=vm-dc
    2020-07-21T07:00:09.539Z [ERROR] agent.server.rpc: TLS handshake failed: conn=from=10.4.3.13:56790 error="remote error: tls: bad certificate"
    2020-07-21T07:00:13.251Z [INFO]  agent.server.serf.wan: serf: attempting reconnect to ansible-orderer.vm-dc 10.128.0.42:8302
    2020-07-21T07:00:22.640Z [WARN]  agent.server.rpc: RPC request to DC is currently failing as no server can be reached: datacenter=vm-dc
    2020-07-21T07:00:28.017Z [WARN]  agent.server.rpc: RPC request to DC is currently failing as no server can be reached: datacenter=vm-dc
    2020-07-21T07:00:33.733Z [WARN]  agent.server.rpc: RPC request to DC is currently failing as no server can be reached: datacenter=vm-dc
    2020-07-21T07:00:36.444Z [WARN]  agent.server.rpc: RPC request to DC is currently failing as no server can be reached: datacenter=vm-dc

I started the Consul agent on the VM with:
consul agent -config-dir $PWD -bootstrap -advertise <internal-ip>

I think the local mesh gateway on the VMs may still not be configured properly.

It looks like the VM consul agent can talk to the kube cluster but it may not be talking through its local mesh gateway. It will go directly to the kube cluster’s mesh gateways (bypassing its local mesh gateway) until the local mesh gateway is working. This would explain why the kube cluster cannot talk to the VM datacenter, because it must go through the VM’s mesh gateway.

To diagnose whether the VM's mesh gateway is working, can you show the logs for that Envoy proxy? Can you also curl the Envoy proxy's admin port and show the output of /clusters?
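For example, from the VM (19005 being the admin port from your command):

curl -s http://127.0.0.1:19005/clusters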

Also if you want to DM me on Twitter (https://twitter.com/lkysow) we can set up a Zoom call to live debug.

We figured it out! We were getting this error on the servers running in k8s:

 [ERROR] agent.server.memberlist.wan: memberlist: Failed to send ping: x509: certificate is valid for server.vm-dc.consul, localhost, not ansible-orderer.server.vm-dc.consul

This was because the server certificate on the VM did not have the DNS SAN it needed. We recreated the cert with that SAN and it worked:

consul tls cert create -server -dc=vm-dc -additional-dnsname=ansible-orderer.server.vm-dc.consul

Just want to add that ansible-orderer is the hostname; the additional SAN has the form -additional-dnsname=<hostname>.server.vm-dc.consul.
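A quick way to double-check the SANs on the regenerated cert is with openssl (a sketch; the filename is whatever consul tls cert create produced, e.g. the vm-dc-server-consul-0.pem referenced in the config above):

openssl x509 -in vm-dc-server-consul-0.pem -noout -text | grep -A1 "Subject Alternative Name"
# expect DNS:ansible-orderer.server.vm-dc.consul alongside DNS:server.vm-dc.consul and DNS:localhost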