Consul Connect + Envoy - gRPC issues

Hello Nomad team and community.

I am having trouble configuring Consul Connect with the Envoy proxy on AWS, and I would appreciate some guidance on how to proceed or troubleshoot it. In short, the connect-proxy sidecar keeps logging warnings like this to stderr:

[2021-04-23 02:03:39.373][1][warning][config] [bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:87] gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection termination

I am running a cluster with Consul 1.9.5, Nomad 1.0.4, and Envoy 1.16.2.

Here is a test job that uses Consul Connect (I dropped the health check for the time being, otherwise the deployment gets stuck; see the sketch after the job spec):

job "http-connect" {
 datacenters = ["us-east-1c"]

 group "echo" {
   network {
     mode = "bridge"
   }

   service {
     name = "http-connect"
     port = "8080"

     connect {
       sidecar_service {}
     }
   }

   task "server" {
     driver = "docker"

     config {
       image = "hashicorp/http-echo:latest"

       args = [
         "-listen", ":8080",
         "-text", "Hello and welcome to http-echo running on port 8080",
       ]
     }
   }
 }
}
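About the health check I dropped: as far as I understand, with bridge networking the Consul agent cannot reach the check endpoint directly, so a check on a Connect-enabled group service needs to be exposed through the Envoy sidecar. A minimal sketch of what I think the service stanza would look like with the check back in (the path is just an example):

service {
  name = "http-connect"
  port = "8080"

  check {
    type     = "http"
    path     = "/"
    interval = "10s"
    timeout  = "2s"
    # assumption: expose asks Nomad to route this check through the Envoy sidecar
    expose   = true
  }

  connect {
    sidecar_service {}
  }
}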

This job can be deployed to Nomad and registered in Consul successfully via “nomad run”. I can also exec into the connect-proxy allocation and get inside the container.

Security groups in AWS are configured to accept traffic on ports 8300, 8301, 8302, 8400, 8500, 8502, 8600, and 21000-21255 (pretty much what’s listed in “Required Ports | Consul by HashiCorp”, plus an extra 8400). All outbound ports are open.

Nomad agents open ports 4646, 4647, and 4648 (both TCP and UDP) as well as the 20000-32000 dynamic port range.

From inside the connect-proxy container, I can query the Consul server:

$ curl 10.0.1.145:8500
<a href="/ui/">Moved Permanently</a>.

Another request, to Consul port 8502, returns something and then the connection is reset:

$ curl 10.0.1.145:8502 | wc -c
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    42    0    42    0     0  10388      0 --:--:-- --:--:-- --:--:-- 14000
curl: (56) Recv failure: Connection reset by peer
21
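For context, 8502 is Consul’s gRPC (xDS) port, which the Envoy sidecars use to fetch their configuration, so a plain HTTP curl being reset there is not necessarily a problem on its own. As far as I understand, this port has to be enabled explicitly on the Consul agents; a minimal sketch of the relevant agent config (not my exact file):

# consul agent configuration (HCL), in particular on the Nomad client nodes
ports {
  grpc = 8502
}

connect {
  enabled = true
}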

Consul intentions allow traffic from all services to all services.
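For reference, that was done with something along these lines (a sketch; the exact command may have differed):

$ consul intention create -allow "*" "*"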

Nonetheless, curl gets stuck connecting to the gRPC unix socket, and I suspect this is related to the “gRPC config stream closed: 14” error I mentioned above.

$ curl --unix-socket /alloc/tmp/consul_grpc.sock http:/v1/config -v
* Trying /alloc/tmp/consul_grpc.sock...
^C
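As far as I understand, this socket is created by Nomad inside the allocation and proxied to the local Consul client agent’s gRPC port, so I have also been sanity-checking the agent on the Nomad client host itself, roughly like this (a sketch):

# on the Nomad client host, not inside the allocation
$ ss -ltn | grep 8502    # is the local Consul agent listening for gRPC?
$ consul members         # is the client agent joined to the cluster?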

At this point I am a bit lost, and I would appreciate any ideas about what might be missing or misconfigured in my setup.

I have likely omitted some important detail from this post, but I am happy to share config files or anything else if it helps.


A bit more troubleshooting: I ran “consul monitor -log-level debug” on all instances. There are no warnings or errors on the Consul server instances, but the Consul agent on a Nomad client shows this:

2021-04-24T18:05:04.351Z [WARN] agent: Check socket connection failed: check=service:_nomad-task-1581dbec-92a2-1956-e313-39d8f7c2bc3d-group-echo-http-connect-8080-sidecar-proxy:1 error="dial tcp 10.0.1.238:25100: connect: connection refused"
2021-04-24T18:05:04.351Z [WARN] agent: Check is now critical: check=service:_nomad-task-1581dbec-92a2-1956-e313-39d8f7c2bc3d-group-echo-http-connect-8080-sidecar-proxy:1

This is expected, since the sidecar proxy is not listening on that port; I checked on the Nomad client instance where the application is deployed:

$ netstat -oan | grep 25100 | wc -l
0
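The sidecar-proxy service itself is registered with the local agent (its check is what is failing above); something like this lists it (a sketch, jq just for readability):

$ curl -s http://127.0.0.1:8500/v1/agent/services | jq 'keys'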

Nonetheless, the Nomad server shows that port 25100 has been assigned to the sidecar proxy.

Also, there are containers running on the instance:

$ docker ps --no-trunc
CONTAINER ID                                                       IMAGE                                                                                              COMMAND                                                                                                  CREATED              STATUS              PORTS     NAMES
367a1b94b3b8d7b6e6666dddf0b1bad8cd8491b69c653644227024ab89dff35f   hashicorp/http-echo:latest                                                                         "/http-echo -listen :8080 -text 'Hello and welcome to http-echo running on port 8080'"                   About a minute ago   Up About a minute             server-c86d38e8-c0ac-d001-142a-46408695cebf
53e56869810ff925a1d6063b49ae253dd2ee43d2774bd6e13674acdcada48d5b   envoyproxy/envoy:v1.11.2@sha256:a7769160c9c1a55bb8d07a3b71ce5d64f72b1f665f10d81aa1581bc3cf850d09   "/docker-entrypoint.sh -c /secrets/envoy_bootstrap.json -l info --concurrency 1 --disable-hot-restart"   About a minute ago   Up About a minute             connect-proxy-http-connect-c86d38e8-c0ac-d001-142a-46408695cebf
883e96e43504bf53af1d38a65445174af9fe90f076561388718bd6f7724035eb   gcr.io/google_containers/pause-amd64:3.1                                                           "/pause"                                                                                                 About a minute ago   Up About a minute             nomad_init_c86d38e8-c0ac-d001-142a-46408695cebf

However, the sidecar proxy is not exposing its port. I am curious which config file is responsible for that; could it be a silly mistake in my Nomad config?
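One more data point from the docker ps output above: the sidecar is running envoyproxy/envoy:v1.11.2 rather than the Envoy 1.16.2 I have installed, which I believe is Nomad’s built-in default image. As far as I know, the sidecar image can be overridden in the Nomad client config; a minimal sketch (the tag is just an example of what I would expect to pin):

# nomad client configuration (HCL)
client {
  meta {
    # assumption: this meta key is interpolated into the default sidecar_task image
    "connect.sidecar_image" = "envoyproxy/envoy:v1.16.2"
  }
}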

Hi @kirill, did you get to the bottom of this issue? My environment is nearly identical to what you’ve described, and I’m facing the same issue you mentioned in the OP.

Ah, good memories. I gave up and simplified my setup. As far as I remember, HashiCorp has since updated the hashistack starter-kit and a few other modules, but I haven’t tried those yet.
