How do I get the Envoy sidecar to connect to Consul over TLS?

Hi :wave:

I am trying to deploy Vault + Consul + Nomad with full ACL + TLS support. I’m close to having everything working, but my Envoy sidecars can’t seem to connect to Consul over TLS. I get the following errors in the Envoy logs:

[2022-12-01 01:10:40.112][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 30s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection termination

After a lot of bisecting in my configuration to find the source of the issue, I narrowed it down to the consul stanza in my Nomad config. It works when I specify the plaintext port, but not the TLS one.
This works:

consul {
  address      = "127.0.0.1:8500"
}

This doesn’t work:

consul {
  address      = "127.0.0.1:8501"
  # grpc_address = "127.0.0.1:8503" # I've tried with and without this line

  ssl       = true
  ca_file   = "cert_dir/ca.pem"
  cert_file = "cert_dir/chain.pem"
  key_file  = "cert_dir/key.pem"
}

(FYI: I’ve disabled all ACLs during the debugging process and have yet to re-enable them.)

Looking up the error online, I found this issue, which seems to indicate that the Envoy bootstrap config should include a tls_context section to connect over TLS; my bootstrap doesn’t get one:

"static_resources": {
    "clusters": [
      {
        "name": "local_agent",
        "ignore_health_on_host_removal": false,
        "connect_timeout": "1s",
        "type": "STATIC",
        "http2_protocol_options": {},
        "loadAssignment": {
          "clusterName": "local_agent",
          "endpoints": [
            {
              "lbEndpoints": [
                {
                  "endpoint": {
                    "address": {
                      "pipe": {
                        "path": "alloc/tmp/consul_grpc.sock"
                      }
                    }
                  }
                }
              ]
            }
          ]
        }
      }
    ]
  },

Looking at the relevant code in Consul, it appears this section is only generated when the gRPC address starts with “https://”, which Nomad never sets.
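
In newer Envoy versions that tls_context section appears as a transport_socket on the cluster. For comparison, here is a rough sketch of what it would look like, based on Envoy’s v3 UpstreamTlsContext API (illustrative, not output from my cluster; the CA would be whatever Consul inlines):

"transport_socket": {
  "name": "tls",
  "typedConfig": {
    "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext",
    "commonTlsContext": {
      "validationContext": {
        "trustedCa": {
          "inlineString": "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n"
        }
      }
    }
  }
}

Since my bootstrap has no transport_socket at all, Envoy presumably speaks plaintext gRPC to a port that expects TLS, which would explain the connection-termination errors above.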

So, unless I missed something, it looks like Nomad doesn’t support the Connect sidecar talking to Consul over TLS. Is that intentional / a known limitation, or a missing feature? Some feedback would be very valuable! :slight_smile: I can share my complete Consul/Nomad configs too if that helps.
Hope I haven’t missed something in the docs!

PS: this seems like a similar issue to Connect with Envoy: gRPC over HTTPS in a new cluster only works when CONSUL_HTTP_SSL is true · Issue #7473 · hashicorp/consul · GitHub, but it looks like the implementation has changed since then.

For anyone still wondering how to do this, it’s only mentioned as a warning in the Nomad docs’ Consul block config:

You have to disable verify_incoming for gRPC in Consul’s config, because it doesn’t support incoming TLS verification!

Working consul.hcl snippet example:

ports {
  https = 8501
  #http = 8500
  http = -1 # disable http
  grpc_tls = 8502 # must be its own port
  grpc = 8503
}

connect {
  enabled = true
}

addresses {
  https = "0.0.0.0"
#  http  = "127.0.0.1"
#  grpc  = "127.0.0.1"
}

tls {
  defaults {
    ca_file = "/etc/ssl/certs/consul-agent-ca.pem"
    cert_file = "/etc/ssl/certs/dc1-server-consul-0.pem"
    key_file = "/etc/ssl/certs/dc1-server-consul-0-key.pem"
    verify_incoming = true
    verify_outgoing = true
    verify_server_hostname = true
  }
  grpc {
    # this needs to be disabled!
    verify_incoming = false
  }
}

# disable, so Consul doesn't use connect-ca!
#auto_encrypt {
#  allow_tls = true
#}
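
After restarting Consul with this config, it’s worth checking that the TLS gRPC port actually serves TLS before involving Nomad at all. Something like this (adjust the address and CA path to your setup):

openssl s_client -connect 127.0.0.1:8502 \
  -CAfile /etc/ssl/certs/consul-agent-ca.pem </dev/null

If the handshake fails here, the problem is in Consul’s tls block rather than anything on the Nomad side.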

For nomad.hcl, you need:

consul {
  #address = "192.168.178.35:8500"
  address = "192.168.178.35:8501" # https is on port 8501
  token = "<your-token>"
  ssl = true # forces https
  verify_ssl = true
  grpc_ca_file = "<your-ca-cert>"
  ca_file = "<your-ca-cert>"
  cert_file = "<your-client-cert>"
  key_file = "<your-client-key>"

  auto_advertise = true
  server_auto_join = true
  client_auto_join = true
  server_service_name = "nomad"
  client_service_name = "nomad-client"
  #grpc_address = "192.168.178.35:8503"
  grpc_address = "192.168.178.35:8502" # tls

  # required, so Consul doesn't use anon token for KV:
  service_identity {
    aud = ["consul.io"]
    ttl = "1h"
  }

  task_identity {
    aud = ["consul.io"]
    ttl = "1h"
  }
  # be sure to also follow the docs to create policies for nomad agents & tasks, as well as a role for nomad tasks (see the sketch below).
}
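
To expand on that last comment, here is a rough sketch of the tasks policy and role (the names and rules here are illustrative; take the exact ones from the current Nomad Consul-integration docs). First a rules file, e.g. nomad-tasks.policy.hcl:

service_prefix "" { policy = "write" }
key_prefix ""     { policy = "read" }

Then create the policy and a role that references it:

consul acl policy create -name nomad-tasks -rules @nomad-tasks.policy.hcl
consul acl role create -name nomad-tasks -policy-name nomad-tasks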

Also, you will need to add proxy defaults to Consul. Save this as proxy-defaults.hcl:

Kind      = "proxy-defaults"
Name      = "global"
Config {
  local_connect_timeout_ms = 1000
  handshake_timeout_ms     = 10000
}

Then apply it:

consul config write proxy-defaults.hcl
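
You can confirm the entry was stored with:

consul config read -kind proxy-defaults -name global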

You may also need to give Consul’s anon token more rights by attaching it to a policy that has this:

service_prefix "" { policy = "read" }
node_prefix    "" { policy = "read" }
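
For example, save the rules above as anon-read.hcl, then run something like this (the policy name is arbitrary, and the UUID is the anonymous token’s well-known accessor ID; double-check the flags against your Consul version):

consul acl policy create -name anonymous-read -rules @anon-read.hcl
consul acl token update -id 00000000-0000-0000-0000-000000000002 \
  -policy-name anonymous-read -merge-policies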

Using these settings, I’m able to run the countdash example job with ACL & mTLS enabled.
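
For reference, the Connect-relevant part of countdash is just the standard bridge-network services with sidecars (abridged from the docs example, tasks omitted); note that nothing TLS-specific appears in the job itself:

group "api" {
  network {
    mode = "bridge"
  }

  service {
    name = "count-api"
    port = "9001"

    connect {
      sidecar_service {}
    }
  }
}

group "dashboard" {
  network {
    mode = "bridge"
    port "http" {
      static = 9002
      to     = 9002
    }
  }

  service {
    name = "count-dashboard"
    port = "9002"

    connect {
      sidecar_service {
        proxy {
          upstreams {
            destination_name = "count-api"
            local_bind_port  = 8080
          }
        }
      }
    }
  }
}

All of the TLS plumbing lives in the agent configs above; the job spec is the same as on a non-TLS cluster.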



This was not easy to figure out, and the docs should really be updated with an example covering all the required steps.