Envoy -> consul "upstream connect error or disconnect/reset before headers. reset reason: connection termination"

I’ve been fighting this error today after upgrading Consul, Nomad, and Vault. I use Consul Connect with my Nomad containers, and I’m getting an error in all of my sidecars. Each sidecar appears to connect to Consul’s gRPC API, but then the connection is reset. I found some older posts describing the same thing, but no real solutions. I tried switching Consul’s gRPC API between TLS and non-TLS, but the error is always the same. At one point it did work for a while, then stopped working after some time.

Some logs from a sidecar

[2022-12-23 23:16:03.914][1][debug][pool] [source/common/http/conn_pool_base.cc:78] queueing stream due to no available connections (ready=0 busy=0 connecting=0)
[2022-12-23 23:16:03.914][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:290] trying to create new connection
[2022-12-23 23:16:03.914][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:145] creating a new connection (connecting=0)
[2022-12-23 23:16:03.914][1][debug][http2] [source/common/http/http2/codec_impl.cc:1794] [C6] updating connection-level initial window size to 268435456
[2022-12-23 23:16:03.914][1][debug][connection] [./source/common/network/connection_impl.h:92] [C6] current connecting state: true
[2022-12-23 23:16:03.914][1][debug][client] [source/common/http/codec_client.cc:57] [C6] connecting
[2022-12-23 23:16:03.914][1][debug][connection] [source/common/network/connection_impl.cc:924] [C6] connecting to alloc/tmp/consul_grpc.sock
[2022-12-23 23:16:03.916][1][debug][connection] [source/common/network/connection_impl.cc:683] [C6] connected
[2022-12-23 23:16:03.916][1][debug][client] [source/common/http/codec_client.cc:88] [C6] connected
[2022-12-23 23:16:03.916][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:327] [C6] attaching to next stream
[2022-12-23 23:16:03.916][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:181] [C6] creating stream
[2022-12-23 23:16:03.916][1][debug][router] [source/common/router/upstream_request.cc:564] [C0][S7263023012805180473] pool ready
[2022-12-23 23:16:03.920][1][debug][connection] [source/common/network/connection_impl.cc:651] [C6] remote close
[2022-12-23 23:16:03.920][1][debug][connection] [source/common/network/connection_impl.cc:250] [C6] closing socket: 0
[2022-12-23 23:16:03.920][1][debug][client] [source/common/http/codec_client.cc:107] [C6] disconnect. resetting 1 pending requests
[2022-12-23 23:16:03.920][1][debug][client] [source/common/http/codec_client.cc:156] [C6] request reset
[2022-12-23 23:16:03.920][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:214] [C6] destroying stream: 0 remaining
[2022-12-23 23:16:03.920][1][debug][router] [source/common/router/router.cc:1212] [C0][S7263023012805180473] upstream reset: reset reason: connection termination, transport failure reason: 
[2022-12-23 23:16:03.920][1][debug][http] [source/common/http/async_client_impl.cc:105] async http request response headers (end_stream=true):
':status', '200'
'content-type', 'application/grpc'
'grpc-status', '14'
'grpc-message', 'upstream connect error or disconnect/reset before headers. reset reason: connection termination'

[2022-12-23 23:16:03.920][1][debug][config] [./source/common/config/grpc_stream.h:207] DeltaAggregatedResources gRPC config stream to local_agent closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection termination
[2022-12-23 23:16:03.920][1][debug][config] [source/common/config/grpc_subscription_impl.cc:115] gRPC update for type.googleapis.com/envoy.config.cluster.v3.Cluster failed
[2022-12-23 23:16:03.920][1][debug][config] [source/common/config/grpc_subscription_impl.cc:115] gRPC update for type.googleapis.com/envoy.config.listener.v3.Listener failed
[2022-12-23 23:16:03.921][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:483] [C6] client disconnected, failure reason: 
[2022-12-23 23:16:03.921][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:453] invoking idle callbacks - is_draining_for_deletion_=false
[2022-12-23 23:16:07.265][1][debug][main] [source/server/server.cc:251] flushing stats
[2022-12-23 23:16:07.265][1][debug][main] [source/server/server.cc:261] Envoy is not fully initialized, skipping histogram merge and flushing stats
[2022-12-23 23:16:12.265][1][debug][main] [source/server/server.cc:251] flushing stats
[2022-12-23 23:16:12.265][1][debug][main] [source/server/server.cc:261] Envoy is not fully initialized, skipping histogram merge and flushing stats
[2022-12-23 23:16:17.264][1][warning][config] [source/common/config/grpc_subscription_impl.cc:120] gRPC config: initial fetch timed out for type.googleapis.com/envoy.config.listener.v3.Listener
[2022-12-23 23:16:17.264][1][debug][init] [source/common/init/watcher_impl.cc:14] target LDS initialized, notifying init manager Server
[2022-12-23 23:16:17.264][1][debug][init] [source/common/init/watcher_impl.cc:14] init manager Server initialized, notifying RunHelper
[2022-12-23 23:16:17.264][1][info][config] [source/server/listener_manager_impl.cc:831] all dependencies initialized. starting workers
[2022-12-23 23:16:17.264][1][debug][config] [source/server/listener_manager_impl.cc:868] starting worker 0
[2022-12-23 23:16:17.265][15][debug][main] [source/server/worker_impl.cc:124] worker entering dispatch loop
[2022-12-23 23:16:17.265][15][debug][upstream] [source/common/upstream/cluster_manager_impl.cc:1101] adding TLS cluster local_agent
[2022-12-23 23:16:17.265][16][debug][grpc] [source/common/grpc/google_async_client_impl.cc:51] completionThread running
[2022-12-23 23:16:17.265][15][debug][upstream] [source/common/upstream/cluster_manager_impl.cc:1179] membership update for TLS cluster local_agent added 1 removed 0
[2022-12-23 23:16:17.266][1][debug][main] [source/server/server.cc:251] flushing stats
[2022-12-23 23:16:22.267][1][debug][main] [source/server/server.cc:251] flushing stats
[2022-12-23 23:16:27.267][1][debug][main] [source/server/server.cc:251] flushing stats
[2022-12-23 23:16:32.268][1][debug][main] [source/server/server.cc:251] flushing stats
[2022-12-23 23:16:33.101][1][debug][config] [./source/common/config/grpc_stream.h:62] Establishing new gRPC bidi stream to local_agent for rpc DeltaAggregatedResources(stream .envoy.service.discovery.v3.DeltaDiscoveryRequest) returns (stream .envoy.service.discovery.v3.DeltaDiscoveryResponse);

[2022-12-23 23:16:33.101][1][debug][router] [source/common/router/router.cc:470] [C0][S8979155582164762442] cluster 'local_agent' match for URL '/envoy.service.discovery.v3.AggregatedDiscoveryService/DeltaAggregatedResources'
[2022-12-23 23:16:33.102][1][debug][router] [source/common/router/router.cc:678] [C0][S8979155582164762442] router decoding headers:
':method', 'POST'
':path', '/envoy.service.discovery.v3.AggregatedDiscoveryService/DeltaAggregatedResources'
':authority', 'local_agent'
':scheme', 'http'
'te', 'trailers'
'content-type', 'application/grpc'
'x-consul-token', '43d55efe-5967-a195-a9cf-f2146ffb369a'
'x-envoy-internal', 'true'
'x-forwarded-for', '172.26.64.54'

[2022-12-23 23:16:33.102][1][debug][pool] [source/common/http/conn_pool_base.cc:78] queueing stream due to no available connections (ready=0 busy=0 connecting=0)
[2022-12-23 23:16:33.102][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:290] trying to create new connection
[2022-12-23 23:16:33.102][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:145] creating a new connection (connecting=0)
[2022-12-23 23:16:33.102][1][debug][http2] [source/common/http/http2/codec_impl.cc:1794] [C7] updating connection-level initial window size to 268435456
[2022-12-23 23:16:33.102][1][debug][connection] [./source/common/network/connection_impl.h:92] [C7] current connecting state: true
[2022-12-23 23:16:33.102][1][debug][client] [source/common/http/codec_client.cc:57] [C7] connecting
[2022-12-23 23:16:33.102][1][debug][connection] [source/common/network/connection_impl.cc:924] [C7] connecting to alloc/tmp/consul_grpc.sock
[2022-12-23 23:16:33.104][1][debug][connection] [source/common/network/connection_impl.cc:683] [C7] connected
[2022-12-23 23:16:33.104][1][debug][client] [source/common/http/codec_client.cc:88] [C7] connected
[2022-12-23 23:16:33.104][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:327] [C7] attaching to next stream
[2022-12-23 23:16:33.104][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:181] [C7] creating stream
[2022-12-23 23:16:33.105][1][debug][router] [source/common/router/upstream_request.cc:564] [C0][S8979155582164762442] pool ready
[2022-12-23 23:16:33.108][1][debug][connection] [source/common/network/connection_impl.cc:651] [C7] remote close
[2022-12-23 23:16:33.108][1][debug][connection] [source/common/network/connection_impl.cc:250] [C7] closing socket: 0
[2022-12-23 23:16:33.108][1][debug][client] [source/common/http/codec_client.cc:107] [C7] disconnect. resetting 1 pending requests
[2022-12-23 23:16:33.109][1][debug][client] [source/common/http/codec_client.cc:156] [C7] request reset
[2022-12-23 23:16:33.109][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:214] [C7] destroying stream: 0 remaining
[2022-12-23 23:16:33.109][1][debug][router] [source/common/router/router.cc:1212] [C0][S8979155582164762442] upstream reset: reset reason: connection termination, transport failure reason: 
[2022-12-23 23:16:33.109][1][debug][http] [source/common/http/async_client_impl.cc:105] async http request response headers (end_stream=true):
':status', '200'
'content-type', 'application/grpc'
'grpc-status', '14'
'grpc-message', 'upstream connect error or disconnect/reset before headers. reset reason: connection termination'

[2022-12-23 23:16:33.109][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 45s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection termination
[2022-12-23 23:16:33.109][1][debug][config] [source/common/config/grpc_subscription_impl.cc:115] gRPC update for type.googleapis.com/envoy.config.cluster.v3.Cluster failed
[2022-12-23 23:16:33.109][1][debug][config] [source/common/config/grpc_subscription_impl.cc:115] gRPC update for type.googleapis.com/envoy.config.listener.v3.Listener failed
[2022-12-23 23:16:33.109][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:483] [C7] client disconnected, failure reason: 
[2022-12-23 23:16:33.109][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:453] invoking idle callbacks - is_draining_for_deletion_=false
[2022-12-23 23:16:37.268][1][debug][main] [source/server/server.cc:251] flushing stats
[2022-12-23 23:16:42.269][1][debug][main] [source/server/server.cc:251] flushing stats
[2022-12-23 23:16:47.269][1][debug][main] [source/server/server.cc:251] flushing stats
[2022-12-23 23:16:52.269][1][debug][main] [source/server/server.cc:251] flushing stats
[2022-12-23 23:16:53.374][1][debug][config] [./source/common/config/grpc_stream.h:62] Establishing new gRPC bidi stream to local_agent for rpc DeltaAggregatedResources(stream .envoy.service.discovery.v3.DeltaDiscoveryRequest) returns (stream .envoy.service.discovery.v3.DeltaDiscoveryResponse);
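
For anyone skimming sidecar logs like these, the lines that matter are the "gRPC config stream to local_agent closed" ones. Here is a minimal Python sketch that pulls them out and names the gRPC status code. The function and variable names are made up for illustration, and the regex is based only on the two line shapes shown in the logs here, so other Envoy versions may format them differently. Status 14 is UNAVAILABLE in the standard gRPC status codes, which usually points at the transport rather than at a rejected token (that would more likely be 7 or 16).

```python
import re

# Subset of the standard gRPC status codes.
GRPC_STATUS_NAMES = {
    0: "OK",
    4: "DEADLINE_EXCEEDED",
    7: "PERMISSION_DENIED",
    14: "UNAVAILABLE",
    16: "UNAUTHENTICATED",
}

# Matches both "closed: 14, ..." and "closed since 45s ago: 14, ...".
STREAM_CLOSED = re.compile(
    r"gRPC config stream to (?P<cluster>\S+) closed"
    r"(?: since \d+s ago)?: (?P<code>\d+), (?P<message>.*)$"
)


def xds_stream_failures(lines):
    """Return (cluster, status_name, message) for each closed-stream log line."""
    failures = []
    for line in lines:
        match = STREAM_CLOSED.search(line)
        if match is None:
            continue
        code = int(match.group("code"))
        # Fall back to the raw number for codes not listed above.
        name = GRPC_STATUS_NAMES.get(code, str(code))
        failures.append((match.group("cluster"), name, match.group("message").strip()))
    return failures
```

Feeding it the lines of a saved sidecar log (for example `xds_stream_failures(open("sidecar.log"))`, where the file name is a placeholder) returns one tuple per failed stream attempt.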

FINALLY figured it out. I had Consul serving gRPC over TLS, but apparently Nomad was not using TLS. I enabled the non-TLS gRPC listener in Consul and changed Nomad to use it, and everything started working again.

How did you enable the non-tls grpc listener and what version of Consul are you on? Also what change did you make on Nomad to allow it to use the non-tls listener?

I’m using Consul 1.14.3. There’s a compatibility issue with Nomad right now; the GitHub issue "Consul grpc_tls usage cannot deploy nomad jobs anymore" (Issue #15266, hashicorp/nomad) has some more info. There’s a comment that says Nomad 1.4.3 was patched, but I don’t believe it was. I can’t find it now, but a few days ago I found an unreleased docs commit that says it will be fixed in 1.4.4.

EDIT: I allowed Nomad to use the non-TLS gRPC port by changing this in the Nomad config:

consul {
  grpc_address = "127.0.0.1:8502"
}

Consul config:

ports {
  http     = -1
  https    = 8501
  grpc     = 8502
  grpc_tls = 8503
}
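
If it is unclear which of the two gRPC ports is actually serving TLS, a quick probe can help. This is a minimal Python sketch using only the standard library; `probe_listener` and its return values are names invented for this example, not part of Consul or Nomad. It only checks whether a TLS handshake completes, with certificate verification disabled, so it says nothing about whether the certificates are valid.

```python
import socket
import ssl


def probe_listener(host, port, timeout=2.0):
    """Classify a TCP listener as 'tls', 'no-handshake', or 'unreachable'.

    'no-handshake' means a TCP connection opened but the TLS handshake did
    not complete. That is what a plaintext listener looks like, but a TLS
    listener that rejects the client (for example, one that demands a
    client certificate) can look the same.
    """
    context = ssl.create_default_context()
    # Diagnostic only: skip verification so self-signed agent certs still count as TLS.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "unreachable"
    try:
        with context.wrap_socket(raw, server_hostname=host):
            return "tls"
    except OSError:
        # ssl.SSLError and socket timeouts are both OSError subclasses.
        return "no-handshake"
    finally:
        raw.close()
```

With the ports block above, one would expect `probe_listener("127.0.0.1", 8503)` to report `tls` and `probe_listener("127.0.0.1", 8502)` to report `no-handshake`, assuming the agent listens on localhost. If both come back the same, the listener Nomad points at is probably not the one you think it is.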

Found it: "docs: update Nomad 1.14 upgrade note to detail additional info." by jrasell · Pull Request #15538 · hashicorp/consul · GitHub

It’s actually a PR, not a commit, but it says not to use Consul 1.14 with Nomad 1.4.3 and earlier.

Hello :wave:

I think I’m in the same situation.

Nomad 1.5.0
Consul 1.15.1

I changed the Nomad and Consul configuration as @bradydean advised, and it works.

Dumb question: is this the best practice, or is there another way to do it?

I’d like to know before I deploy to other DCs. :sweat_smile: