Hi, I have some questions and points of confusion around configuring Connect jobs in Nomad with Consul TLS enabled that I'm hoping someone can help with.
First, to keep this post short while providing enough context, here is a gist with the Consul agent config, the Nomad client config, and a few example Nomad dashboard Connect jobs that use bridge-mode networking and provide Traefik ingress to the Connect job in different ways: gist link.
As mentioned in the comments in the gist, the HashiCorp application versions for these tests are:
Nomad 1.1.3
Consul 1.10.1 (patched for consul issue 10714)
In the gist, the `dashboard-works.hcl` file launches a Consul Connect job in Nomad successfully and uses a Traefik instance running on another host to provide ingress to the dashboard Connect job.
Questions and points of confusion around this `dashboard-works.hcl` job:
- Connect spawns an Envoy container for non-native Connect jobs, and Envoy by default tries to contact the Consul API on a 127.0.0.1 listener within the Envoy container. Most documentation I've read around Nomad and Connect jobs doesn't override the default address for that Consul listener, but I've been unable to bind a Consul API listener inside the container regardless of what combination of Consul `addresses.http`, `bind_addr`, or `client_addr` config I use. Shouldn't I be able to do this, so I don't need to override the `CONSUL_HTTP_ADDR` and `CONSUL_GRPC_ADDR` params inside the container to point to the host IP instead, like I'm currently doing in the gist files?
- The only way this job currently works is by injecting a Consul root token into the Nomad agent config, which is obviously not desirable. I have read in some issues that injecting an env var of `CONSUL_HTTP_TOKEN` with sufficient privileges will also work (i.e. in the same `env` stanza as the `CONSUL_HTTP_ADDR` overrides mentioned above), but setting the Consul master token there does not allow the job to work. When the job fails because a Consul master token is not provided, various gRPC xDS errors are logged to the Consul agent and the Envoy container stderr, respectively, as seen in this gist comment. So I'm trying to figure out what permissions are required to make this job work without having to inject a Consul master token. I tried giving the Consul default token policies as permissive as possible to determine which permission is needed, but the job still throws errors even when allowing write on the "" prefix for agent, event, key, node, query, service, and session. Any help figuring out what ACL config is needed or being overlooked would be awesome!
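For concreteness, the most permissive non-master policy I've been experimenting with looks roughly like the sketch below. The `acl = "write"` line is a guess on my part, based on my understanding that Nomad servers derive Service Identity tokens for the Envoy sidecars when Consul ACLs are enabled; the rest just mirrors the prefixes listed above:

```hcl
# Sketch of a Consul ACL policy for the token handed to Nomad, in place
# of the master token. The acl = "write" rule is speculative: I believe
# Nomad servers need it to derive Service Identity tokens for the Envoy
# sidecars of Connect jobs.
agent_prefix "" {
  policy = "write"
}
node_prefix "" {
  policy = "write"
}
service_prefix "" {
  policy = "write"
}
key_prefix "" {
  policy = "write"
}
event_prefix "" {
  policy = "write"
}
query_prefix "" {
  policy = "write"
}
session_prefix "" {
  policy = "write"
}
acl = "write"
```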
Finally, after getting the basic dashboard Connect job to work with the notes mentioned above, I wanted to use a Traefik ingress, running as a systemd service on a standalone host, to send traffic to the Connect dashboard job. I was able to get this working, as shown in the `dashboard-works.hcl` job, by creating an ingress Consul service tagged for Traefik that explicitly points its HTTP router at the dashboard Connect Consul service. This works, but I don't think it is actually the way the Traefik/Nomad Connect integration is intended to work.
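For anyone skimming without opening the gist, the working ingress registration looks roughly like this (the hostname and router names are placeholders, not the exact values from the gist):

```hcl
# Rough shape of the extra ingress service in dashboard-works.hcl.
# The router is pointed at the Connect service by name rather than at
# this ingress service itself. Hostname and router name are placeholders.
service {
  name = "dashboard-ingress"
  port = "http"
  tags = [
    "traefik.enable=true",
    "traefik.http.routers.dashboard.rule=Host(`dashboard.example.com`)",
    # Attach the router to the Traefik service built from the Connect
    # service's catalog entry instead of this ingress service:
    "traefik.http.routers.dashboard.service=count-dashboard",
  ]
}
```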
I think the intended way is what's shown in `dashboard-direct.hcl`, where the count-dashboard service simply sets its own Traefik `Host` tag. However, this results in a `404`. Similarly, an embedded Traefik ingress doesn't fully work either. More details are in the first comment of the gist. I'm confused about why these don't seem to be working properly.
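In case it's relevant: my understanding (possibly wrong) is that for the direct-tag approach to reach a Connect service, Traefik itself has to be Connect-aware (Traefik 2.5+ with the consulCatalog provider's `connectAware` option enabled), with the service tagged roughly like this. This is a sketch with placeholder values, not my exact job spec:

```hcl
# Sketch of the direct-tag approach, assuming a Connect-aware Traefik
# (2.5+ with providers.consulcatalog.connectAware=true in its static
# config). Hostname, port, and router name are placeholders.
service {
  name = "count-dashboard"
  port = "9002"

  tags = [
    "traefik.enable=true",
    # Tells a Connect-aware Traefik to dial this service over Connect (mTLS):
    "traefik.consulcatalog.connect=true",
    "traefik.http.routers.count-dashboard.rule=Host(`dashboard.example.com`)",
  ]

  connect {
    sidecar_service {}
  }
}
```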
Comments on any of these issues or points of confusion are welcome, thanks!
John