Currently, Consul provides DNS resolution for services registered by Nomad (servicename.service.home.arpa). These services are picked up by Traefik, which serves as reverse proxy/load balancer and is configured to use the Consul Connect mesh natively.
This works fine as long as a service resides on the same machine as Traefik, because Nomad then registers the service in Consul with that machine’s IP: clients resolve to the Traefik host, and Traefik connects to the service.
It becomes an issue when a service resides on another host: Nomad registers the other host’s IP address in Consul, so client DNS requests resolve to a machine without a Traefik instance.
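For illustration, this is easy to see by querying Consul DNS directly (default DNS port 8600; the address and service name here are placeholders):

```
# Consul answers with the IP of the node the workload is registered on,
# which is not necessarily the Traefik host.
dig @127.0.0.1 -p 8600 servicename.service.home.arpa +short
```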
I could of course run Traefik as a system service, spawning an instance on each node, but I’d rather avoid introducing such a constraint, and it would also weaken distribution transparency.
So the things I’m stuck on right now:
- Is the current use of Consul as both service discovery for the cluster and local DNS resolver a proper setup, or would it be considered bad practice?
- (If it is OK:) Are there any mechanisms or architectural solutions to have Consul DNS queries resolve to the Traefik instance instead of the actual host?
Not sure if I completely understand your issue.
Consul will report the port and IP of a registered service to Traefik, and Traefik will reverse proxy to services whether they are on the same machine or on a remote one.
Is the issue that Traefik is running on a specific machine? Multiple instances of Traefik are tricky, especially if you are using Let’s Encrypt (LE) for cert generation.
I went with keepalived on my compute nodes, and Consul Connect ingress gateways on both machines, which forward the traffic to Traefik endpoints (services). This way, Traefik is just another service in my setup.
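A minimal sketch of the gateway piece, assuming a Consul ingress gateway config entry applied with `consul config write` (the port, hostnames, and service names are placeholders):

```hcl
# Ingress gateway config entry: exposes an HTTP listener that forwards
# matching traffic into the mesh, to the traefik service.
Kind = "ingress-gateway"
Name = "ingress-gateway"

Listeners = [
  {
    Port     = 8080
    Protocol = "http"
    Services = [
      {
        Name  = "traefik"
        Hosts = ["*.lab.home"]   # hostnames this listener answers for
      }
    ]
  }
]
```

keepalived then floats a single virtual IP across the gateway nodes, so clients always have one stable address regardless of which node is active.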
Regarding DNS: it’s always messy.
I went with two CoreDNS instances on my compute nodes, which are handed out via DHCP to all machines in my home lab. From there, I can easily define that .consul requests should go to Consul, *.lab.home requests return the keepalived floating IP (ingress -> Traefik), and all other DNS requests go via the router to the ISP.
There are probably a lot of other DNS packages out there, but I really like CoreDNS because it’s easy to deploy from a single Nomad job file.
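A Corefile along those lines could look like this (the IPs and zone file path are assumptions):

```
# .consul -> local Consul DNS; lab.home -> zone file with the wildcard;
# everything else -> the router/ISP resolver.
consul:53 {
    forward . 127.0.0.1:8600
}

lab.home:53 {
    file /etc/coredns/lab.home.zone
}

.:53 {
    forward . 192.168.1.1
}
```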
Thanks for your help! I’m going to try and find out some more about the setup you’re describing.
To clarify:
“Consul will report the port and IP of a registered service to Traefik, and Traefik will reverse proxy to services whether they are on the same machine or on a remote one.”
This is what works as expected, and Traefik uses the Consul service mesh for that as well.
The issue I’m bumping into is that DNS lookups for *.lab.home are also forwarded to Consul.
So basically: Traefik runs on node 1, service X runs on node 2. Consul instructs Traefik that service X runs on node 2. (So far, so good.)
But because Consul is also the DNS resolver for the network, when a client requests serviceX.lab.home, Consul returns the IP address of node 2, while the client should be connecting to Traefik on node 1 instead.
I was hoping that I could configure Consul in some way so that DNS requests resolve to the IP address of a node running Traefik, instead of the node where the workload is actually running.
I’ll dig a bit deeper into CoreDNS and see if it fits my use case.
Thanks for taking the time to help, appreciated!
I don’t think Consul should be used as a full-fat DNS solution. It’s great for looking up services easily, but it’s not a replacement for a dedicated DNS server.
In my setup, I’m assuming at least one instance of CoreDNS, with a local Consul instance running on the same machine. That way, it’s quite easy to forward DNS requests for .consul to localhost:8600.
To make things clearer, please have a look at my setup at
The Corefile defines the different TLDs, and the zone file defines the wildcard lookup for *.lab.home. It should be quite straightforward.
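For reference, a minimal lab.home zone file with such a wildcard might look like this (the SOA values and IPs are placeholders; the wildcard record is what returns the keepalived floating IP):

```
$ORIGIN lab.home.
$TTL 60
@  IN SOA ns.lab.home. admin.lab.home. (2023010101 7200 3600 86400 60)
@  IN NS  ns.lab.home.
ns IN A   192.168.1.2
*  IN A   192.168.1.100   ; every *.lab.home name -> floating IP
```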
In my homelab, I’m using two nodes with the class “compute” for CoreDNS, to add some resilience to my lab. The IPs of the compute nodes are handed out by DHCP for global DNS resolution.
Thanks! I should have been more specific: Consul is only handling the *.lab.home zone, as those are the services registered by Nomad.
My DNS setup looks more or less as follows:
Unbound (IP distributed through DHCP) --> PiHole
    +--> Domain Controller (*.domain.lab.home)
    +--> Consul (*.lab.home)
    \--> External (others)
Ahhh, gotcha! I get what you are doing now.
I could do without adding CoreDNS to the mix and just configure the wildcard for a start. I like the idea of adding a Consul ingress in front of Traefik: it seems like a good way to deal with state and it allows easier configuration of non-HTTP routing as well. I can then entirely remove the zone forward from the PiHole to Consul DNS. I also like how keepalived is applied in your setup; I don’t think I’ll need it for now, but it’s good to have options.
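For the wildcard, Pi-hole can take a plain dnsmasq drop-in, so something like this should be all that’s needed (the file path and IP are assumptions):

```
# /etc/dnsmasq.d/02-lab.conf
# Answer every *.lab.home query with the node fronting the ingress/Traefik.
address=/lab.home/192.168.1.10
```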
Thanks again for your help!
Happy to help!
I think I should document the setup somewhere. For me, the stuff in the Core repo is a great way to set up a cluster with resiliency and flexibility in mind.
I would expect that DNS records for the front door of your load balancer (Traefik in this case) would have more memorable names than Consul provides, e.g. my-site.com, and that this would route to the backend services either using a subdomain for each app or using a layer 7 construct such as path-based routing, e.g. my-site.com/my-app.
This would, as mentioned, require another DNS service to provide the my-site.com record.
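With the Consul provider, that kind of layer 7 routing is typically expressed as Traefik tags on the service registration; a hedged sketch in a Nomad service stanza (the domain, router, and service names are made up):

```hcl
# Traefik reads these tags from Consul and routes
# my-site.com/my-app to this service.
service {
  name = "my-app"
  port = "http"

  tags = [
    "traefik.enable=true",
    "traefik.http.routers.my-app.rule=Host(`my-site.com`) && PathPrefix(`/my-app`)",
  ]
}
```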
That’s somewhat close. I try to avoid adding complexity unless it serves a purpose, and so far using my-app.lab.home was fine for me; it brought the benefit of not having to register DNS records whenever I launched another service. But coupling to Consul service names like that brings some considerable downsides, as pointed out in this thread, which justifies a different architecture. In hindsight, I think it’s better to decouple the DNS name from the service name, for better encapsulation as well.
I’m using the Consul Connect mesh so that I don’t need to expose backend services and so that I can leverage ACLs instead of having to configure ports/firewalls.
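In Connect terms, that service-to-service authorization is expressed as intentions; a minimal sketch allowing only Traefik to reach a backend (service names assumed):

```hcl
# service-intentions config entry: traefik may connect to my-app;
# anything else falls through to the default (deny, with default-deny ACLs).
Kind = "service-intentions"
Name = "my-app"

Sources = [
  {
    Name   = "traefik"
    Action = "allow"
  }
]
```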
As you’re using Connect, the ingress gateway will create DNS records automatically for upstream services in the format:
[service].ingress.consul
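So a quick check against Consul DNS (port 8600; the service name is assumed) should return the gateway’s address rather than the workload’s node:

```
dig @127.0.0.1 -p 8600 my-app.ingress.consul +short
```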