Hi
I am trying to deploy Vault + Consul + Nomad with full ACL + TLS support. I’m pretty close to having everything working except for some reason my envoy sidecars don’t seem to be able to connect to Consul over TLS. I get the following errors in envoy logs:
[2022-12-01 01:10:40.112][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 30s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection termination
After a lot of bisecting of my configurations to find the source of the issue, I narrowed it down to the consul stanza in my Nomad config. It works when I specify the plaintext port but not the TLS one.
This works:
consul {
  address = "127.0.0.1:8500"
}
This doesn’t work:
consul {
  address   = "127.0.0.1:8501"
  # grpc_address = "127.0.0.1:8503" # I've tried with and without this line
  ssl       = true
  ca_file   = "cert_dir/ca.pem"
  cert_file = "cert_dir/chain.pem"
  key_file  = "cert_dir/key.pem"
}
(FYI: I’ve disabled all ACLs in the debugging process and have yet to re-enable them).
Looking up the error online, I found this issue, which seems to indicate that the Envoy bootstrap config should include a tls_context section in order to connect over TLS. That section is missing from the bootstrap config I get:
"static_resources": {
  "clusters": [
    {
      "name": "local_agent",
      "ignore_health_on_host_removal": false,
      "connect_timeout": "1s",
      "type": "STATIC",
      "http2_protocol_options": {},
      "loadAssignment": {
        "clusterName": "local_agent",
        "endpoints": [
          {
            "lbEndpoints": [
              {
                "endpoint": {
                  "address": {
                    "pipe": {
                      "path": "alloc/tmp/consul_grpc.sock"
                    }
                  }
                }
              }
            ]
          }
        ]
      }
    }
  ]
},
Looking up the relevant code in Consul, it looks like this section is only generated when the gRPC address starts with “https://”, which Nomad never passes.
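To make my reading of that logic concrete, here is a rough sketch in Python. It is not the actual Consul code (which is Go); the function names are made up, and the empty tls_context value is only a placeholder for whatever the real bootstrap template emits.

```python
def wants_agent_tls(grpc_addr: str) -> bool:
    # Assumed rule, paraphrased from my reading of the Consul source:
    # only an explicit https:// scheme turns TLS on for the agent cluster.
    return grpc_addr.startswith("https://")


def local_agent_cluster(grpc_addr: str) -> dict:
    # Hypothetical, heavily trimmed stand-in for the generated cluster entry.
    cluster = {"name": "local_agent", "type": "STATIC"}
    if wants_agent_tls(grpc_addr):
        # Placeholder: the real section would carry the CA material.
        cluster["tls_context"] = {}
    return cluster


# An address without a scheme, versus the same address with https://.
for addr in ("127.0.0.1:8503", "https://127.0.0.1:8503"):
    print(addr, "->", "tls_context" in local_agent_cluster(addr))
```

If that reading is right, an address without a scheme can never produce the TLS section, whatever the ssl and ca_file settings in the consul stanza say.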
So, unless I missed something, it looks like Nomad doesn’t support the Connect sidecar connecting to Consul over TLS. Is that intentional, a known limitation, or a missing feature? Any feedback would be very valuable! I can share my complete Consul/Nomad configs too if that helps.
Hope I haven’t missed something in the docs!
PS: this seems similar to hashicorp/consul issue #7473 on GitHub (“Connect with Envoy: gRPC over HTTPS in a new cluster only works when CONSUL_HTTP_SSL is true”), but it looks like the implementation has changed since then.