So, I got a Nomad/Vault/Consul cluster up and running using this excellent resource: the terraform/aws directory (main branch) of hashicorp/nomad on GitHub.
Everything was working, but I wanted to upgrade to Ubuntu 20.04, since 16.04 is now EOL. Since the upgrade, I'm having trouble getting DNS lookups to work inside the Docker containers. For example, looking up “postgres.service.consul” fails.
- I got rid of systemd-resolved on these systems, and Consul is listening on port 53 on all machines (both Nomad servers and clients)
- Consul is started with: /usr/local/bin/consul agent -config-dir=/etc/consul.d -dns-port=53 -recursor=172.31.0.2
On the Nomad client, ‘dig postgres.service.consul’ returns the IP of the container. Inside the container, however, the lookup fails, even though I can still look up non-cluster addresses such as google.com.
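If the container image has no `dig`, a small stdlib-only script can send the same A-record query to one specific server at a time, which helps narrow down which address actually answers for `.consul` names. This is a minimal sketch that assumes Python 3 is available inside the container (it may not be); `build_a_query` and `query_server` are hypothetical helper names, and the packet layout follows the DNS wire format from RFC 1035.

```python
import random
import socket
import struct


def build_a_query(name: str, query_id: int) -> bytes:
    """Build a minimal DNS query for an A record in RFC 1035 wire format."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def query_server(name: str, server: str, port: int = 53, timeout: float = 2.0):
    """Send one A query over UDP to `server` and return (rcode, answer_count).

    rcode 0 means NOERROR and 3 means NXDOMAIN. Raises socket.timeout
    if nothing answers within `timeout` seconds.
    """
    query_id = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_a_query(name, query_id), (server, port))
        reply, _ = sock.recvfrom(4096)
    # Only the fixed 12-byte header is inspected; answer records are not parsed
    reply_id, flags, _qdcount, ancount = struct.unpack("!HHHH", reply[:8])
    if reply_id != query_id:
        raise ValueError("reply ID does not match the query ID")
    return flags & 0x000F, ancount
```

Inside the container, comparing `query_server("postgres.service.consul", "172.17.0.1")` with the same call against the recursor address should show which server knows about Consul names; a `socket.timeout` means nothing answered on that address.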
/etc/resolv.conf on the host:
/etc/resolv.conf in the container:
But if I change it to 172.31.0.1 inside the container, the lookup still fails, and in fact times out.
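Since the host and container files differ, one quick comparison is to pull out the nameserver lines in order, because the resolver tries them in the order listed. This is a minimal sketch; `parse_resolv_conf` is a hypothetical helper that follows the keyword and comment rules described in resolv.conf(5), not a parser from any library.

```python
def parse_resolv_conf(text: str) -> dict:
    """Split resolv.conf text into nameservers (in order), search domains, and options."""
    result = {"nameserver": [], "search": [], "options": []}
    for raw in text.splitlines():
        line = raw.strip()
        # resolv.conf treats lines starting with '#' or ';' as comments
        if not line or line[0] in "#;":
            continue
        key, *values = line.split()
        if key == "nameserver" and values:
            result["nameserver"].append(values[0])
        elif key in ("search", "domain"):
            # 'search' and 'domain' are mutually exclusive; the last one wins
            result["search"] = values
        elif key == "options":
            result["options"].extend(values)
    return result
```

Running `parse_resolv_conf(open("/etc/resolv.conf").read())["nameserver"]` on the host and again inside the container makes it easy to see which entries Docker carried over and which it dropped.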
- I’m sure I’m missing something simple, but I don’t know what.
- Is there any other troubleshooting information I should dig up?
Ah, I just realized the Docker container's nameserver is being set to the recursor address. Why isn’t it getting 172.17.0.1 from Docker? I’ll have to investigate more.
So, using 127.0.0.1 in the container's resolv.conf doesn’t work, even though it seems like it should, since I expected that to use the host’s DNS.
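Two things seem to be in play here. Inside a bridge-networked container, 127.0.0.1 is the container's own loopback interface, so it cannot reach Consul on the host. Separately, Docker's documentation describes that when it builds a container's resolv.conf on the default bridge network, it drops localhost nameserver entries copied from the host's file and falls back to Google's public DNS (8.8.8.8, 8.8.4.4) if none remain. If the host file lists 127.0.0.1 (Consul) followed by the recursor, that would explain why the container only sees the recursor; this is an assumption, since the host file contents aren't shown above. The function below is a simplified, hypothetical model of that rule (IPv6 handling omitted), not Docker's actual code.

```python
import ipaddress

# Fallback servers Docker's docs describe for the IPv4 case
DEFAULT_DNS = ["8.8.8.8", "8.8.4.4"]


def container_nameservers(host_nameservers: list[str]) -> list[str]:
    """Approximate which host nameservers end up in a default-bridge container."""
    kept = []
    for ns in host_nameservers:
        # 127.0.0.0/8 would point at the container's own loopback, so skip it
        if ipaddress.ip_address(ns).is_loopback:
            continue
        kept.append(ns)
    # If every entry was filtered out, fall back to the public defaults
    return kept or list(DEFAULT_DNS)
```

Under this model, a host file of `127.0.0.1` plus `172.31.0.2` yields only `172.31.0.2` in the container, while a host file that starts with `172.17.0.1` would pass that address through untouched.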
Putting 172.17.0.1 in resolv.conf does work, but I don’t want to hard-code that in the job spec, because it could change from host to host.
I’m at a loss here as to what broke in the Docker config.
So, I just reverted to my Ubuntu 16.04 build, and there 172.17.0.1 gets put at the top of /etc/resolv.conf in the container.
The search continues.
OK, so I’ve tracked down the problem. Not sure of the solution yet.
When I bring everything up with Ubuntu 16.04 clients and servers, this line gets injected at the top of the resolv.conf in the Docker containers:
And everything works great!
However, when I bring everything up with Ubuntu 20.04, that line doesn’t get injected in the resolv.conf, and of course nothing works.
Any ideas on troubleshooting this?
Oh, I did notice that if I specify a DNS server in the job spec, /etc/resolv.conf in the container is rewritten.
Hmm…this may have more to do with systemd-resolved. Going to check that path…
Welp!! Solved it. While I’m not sure why this wasn’t needed under Ubuntu 16.04, the solution was to restart dockerd after modifying the host's resolv.conf file, so that it would pick up the changes and pass them on to newly created containers.
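The exact modification isn't spelled out above, so this sketch assumes it means putting the Docker bridge address (172.17.0.1 in this thread) at the top of the host's /etc/resolv.conf, matching the line that showed up in containers on 16.04. `prepend_nameserver` is a hypothetical helper that only transforms the file text; writing the file (as root) and restarting dockerd are left as separate steps.

```python
def prepend_nameserver(text: str, server: str) -> str:
    """Return resolv.conf text with `server` as the first nameserver line.

    Any existing line for the same server is dropped first, so running
    this twice gives the same result as running it once.
    """
    kept = [line for line in text.splitlines() if line.split() != ["nameserver", server]]
    return "\n".join([f"nameserver {server}"] + kept) + "\n"
```

After writing the result back to /etc/resolv.conf, restarting the daemon (for example `sudo systemctl restart docker`) lets newly created containers pick up the change, as described above. Keep in mind that the glibc resolver only uses the first three `nameserver` lines, so prepending an entry can push an existing one out of use.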
All is working!
Running into this issue as well. Did you find a solution? I am not able to reach any services using the *.service.consul addresses inside containers started by Nomad. This works on Ubuntu 16.04 but not on Ubuntu 20.04. Any help greatly appreciated.