Consul will not resolve DNS names on Ubuntu 18.04

I’m trying to set up a Consul + Nomad cluster on Ubuntu 18.04, using the systemd-resolved setup method as documented here

/etc/systemd/resolved.conf

[Resolve]
DNS=127.0.0.1
Domains=~consul

service systemd-resolved restart

iptables are configured

iptables -t nat -A OUTPUT -d localhost -p udp -m udp --dport 53 -j REDIRECT --to-ports 8600
iptables -t nat -A OUTPUT -d localhost -p tcp -m tcp --dport 53 -j REDIRECT --to-ports 8600
iptables -t nat -L
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination

Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
REDIRECT   tcp  --  anywhere             localhost.localdomain  tcp dpt:domain redir ports 8600
REDIRECT   udp  --  anywhere             localhost.localdomain  udp dpt:domain redir ports 8600

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination

I’ve ensured that Consul’s DNS port 8600 is listening on 127.0.0.1 and not on the public IP

netstat -plnt | grep consul
tcp        0      0 10.47.80.7:8301         0.0.0.0:*               LISTEN      21245/consul
tcp        0      0 127.0.0.1:8500          0.0.0.0:*               LISTEN      21245/consul
tcp        0      0 127.0.0.1:8600          0.0.0.0:*               LISTEN      21245/consul

Yet I’m still not able to resolve .consul domains automatically

host foobar.service.consul
Host foobar.service.consul not found: 3(NXDOMAIN)
ping foobar.service.consul
ping: foobar.service.consul: Name or service not known
dig foobar.service.consul
# no results

Whereas these requests do work

dig @127.0.0.1 foobar.service.consul
# works
dig @127.0.0.1 -p 8600 foobar.service.consul
# works
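To see which resolver a plain dig actually asked, the ";; SERVER:" line of its output can be pulled out. This is only a small sketch; the helper name dig_server is made up for illustration and is not part of dig or Consul.

```shell
# Hypothetical helper: reads dig output on stdin and prints the address
# from the ";; SERVER:" line, e.g. 127.0.0.53 for the systemd-resolved stub.
dig_server() {
  awk '/^;; SERVER:/ { split($3, a, "#"); print a[1]; exit }'
}

# Usage (not run here):
#   dig foobar.service.consul | dig_server
#   dig @127.0.0.1 foobar.service.consul | dig_server
```

If the plain lookup reports 127.0.0.53 while the explicit one reports 127.0.0.1, the query went to the systemd-resolved stub, so the forwarding from there to Consul is the part to look at.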

Consul is configured as minimally as possible

{
  "acl": {
    "default_policy": "deny",
    "enable_token_persistence": true,
    "enabled": true
  },
  "bind_addr": "10.x.x.x",
  "data_dir": "/opt/consul",
  "datacenter": "foobar",
  "log_level": "INFO",
  "primary_datacenter": "",
  "retry_join": [
    "10.x.x.x",
    "10.x.x.x",
    "10.x.x.x"
  ],
  "server": false,
  "ui": false
}

What am I missing? How can I get consul to work on ubuntu 18.04?

We’ve had nothing but trouble with systemd-resolved and ripped that sucker out

How do you get Consul working on 18.04? I’ve tried ‘bind’, ‘unbound’ and ‘dnsmasq’ in addition to ‘systemd-resolved’, with no luck.

@spuder the answer to your question is here:

Long story short, bind your client to 0.0.0.0


I’m pretty new to all this, but I had similar issues, which I partly resolved.

The only reason I use Ubuntu 18 is the AWS Vault deployment example, and I also use OpenVPN AMIs, which are Ubuntu 18. I’m open to alternatives though…

I tried to use https://github.com/hashicorp/terraform-aws-consul.git

In Ubuntu 18, I found that /etc/resolv.conf was not linked to /run/systemd/resolve/resolv.conf, and although I could be wrong, this seems incorrect.

I assume this is why it attempts to use 127.0.0.53 to resolve DNS, which is definitely part of the problem.
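A quick way to check what /etc/resolv.conf points at is to read the symlink target. The sketch below assumes the stock Ubuntu 18.04 layout, where the default target is the stub file (stub-resolv.conf, which lists 127.0.0.53); the function name resolv_conf_mode is made up for illustration.

```shell
# Hypothetical helper: classifies where a resolv.conf symlink points.
# $1: path to inspect (defaults to /etc/resolv.conf)
resolv_conf_mode() {
  target=$(readlink "${1:-/etc/resolv.conf}") || { echo "not-a-symlink"; return; }
  case "$target" in
    */stub-resolv.conf)            echo "stub" ;;      # 127.0.0.53, via systemd-resolved
    */systemd/resolve/resolv.conf) echo "upstream" ;;  # upstream servers, bypasses the stub
    *)                             echo "other" ;;
  esac
}

# Usage (not run here):
#   resolv_conf_mode
```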

So if I do this:

sudo unlink /etc/resolv.conf
sudo ln -s /run/systemd/resolve/resolv.conf /etc/resolv.conf

Then this now works where previously it didn’t:

dig vault.service.consul

But as soon as I do that, I see another error when I use sudo:

sudo: unable to resolve host ip-172-31-19-129: Resource temporarily unavailable

That is because the hostname (ip-172-31-19-129) doesn’t exist in /etc/hosts.
So if Ubuntu doesn’t set the hostname in /etc/hosts, how does it normally resolve it? Given this, I wonder whether I’m on the right path and there is just some other missing piece…

I would love to find a golden answer to this problem. It seems Google produces so many different solutions on this topic…

I think you’ll have to set the recursors so that your “normal” DNS handles every request Consul isn’t responsible for:

recursors: This flag provides addresses of upstream DNS servers that are used to recursively resolve queries if they are not inside the service domain for Consul. For example, a node can use Consul directly as a DNS server, and if the record is outside of the “consul.” domain, the query will be resolved upstream. As of Consul 1.0.1 recursors can be provided as IP addresses or as go-sockaddr templates. IP addresses are resolved in order, and duplicates are ignored.

# upstream DNS servers
recursors = ["<your-regular-dns-here>"]

Maybe this was a different issue in my configuration, but it’s the first thing that comes to mind.

Thanks @Wolfsrudel !

I will try that out.

There may be an additional problem though: shouldn’t the hostname normally exist in /etc/hosts? Since preserve_hostname is disabled, I don’t understand why Ubuntu is not adding the hostname there on startup (or whether it needs to). Is there any way I can find out which startup script actually acts on preserve_hostname, and how it is supposed to update /etc/hosts if the hostname changes all the time? Or am I heading down the wrong path with this thinking?

For posterity, this was also educational: “Why does /etc/resolv.conf point at 127.0.0.53?” (Unix & Linux Stack Exchange)


I have solved this. I was making false assumptions and there were a bunch of things I was doing wrong that sent me down the wrong track.

Firstly, setting the DHCP options definitely helped.

resource "aws_vpc_dhcp_options" "main" {
  count                = var.create_vpc ? 1 : 0
  domain_name          = "service.consul"
  domain_name_servers  = ["127.0.0.1", "AmazonProvidedDNS"]
  ntp_servers          = ["127.0.0.1"]
  netbios_name_servers = ["127.0.0.1"]
  netbios_node_type    = 2
  tags = merge(var.common_tags, local.extra_tags, map("Name", format("dhcpoptions_%s", local.name)))
}

I also learnt that I made a big mistake by symlinking this:

sudo ln -s /run/systemd/resolve/resolv.conf /etc/resolv.conf

Earlier I had problems because I didn’t have the correct DHCP options. The symlink got me around that and got Consul working, but it broke everything else.
It is best to leave the default stub resolver in place.

I also found that the default settings that the systemd-resolved helper applies were incorrect.

Domains=~consul

…was not working consistently, especially after booting the AMI on another instance, and also on an OpenVPN Ubuntu 18 AMI I had. I read somewhere that an FQDN in this config needs to end with a ‘.’. I can’t be 100% sure whether it was that or just the more specific name, but after changing only this line, dig commands for that domain suddenly switched from 127.0.0.53 to the correct 127.0.0.1 and worked:

Domains=~service.consul.

So I now run this before I run the setup-systemd-resolved script:

sudo sed -i "s/#Domains=/Domains=~service.consul./g" /etc/systemd/resolved.conf
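Note that the sed above only matches while the Domains line is still commented out. A variant that also rewrites an existing Domains= line could look like the sketch below. It assumes GNU sed (as on Ubuntu) and a resolved.conf whose only section is [Resolve]; the function name set_routing_domain is made up for illustration.

```shell
# Hypothetical helper: sets the routing domain in a resolved.conf file.
# $1: path to resolved.conf, $2: domain without the leading "~"
set_routing_domain() {
  conf=$1
  domain=$2
  if grep -Eq '^#?Domains=' "$conf"; then
    # Replace the commented-out or existing Domains line in place (GNU sed).
    sed -i -E "s/^#?Domains=.*/Domains=~${domain}/" "$conf"
  else
    # No Domains line at all: append one (assumes [Resolve] is the last section).
    printf 'Domains=~%s\n' "$domain" >> "$conf"
  fi
}

# Usage (not run here; needs root for the real file):
#   set_routing_domain /etc/systemd/resolved.conf service.consul.
```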

With these steps in place, the recursors setting was no longer required.

This is an example snippet from my current packer template that is working for ubuntu 18.

  provisioner "shell" { # Generate certificates with vault.
    inline = [
      "set -x; sudo sed -i \"s/#Domains=/Domains=~service.consul./g\" /etc/systemd/resolved.conf",
      "set -x; /tmp/terraform-aws-consul/modules/setup-systemd-resolved/setup-systemd-resolved",
      "set -x; sudo systemctl daemon-reload",
      "set -x; sudo systemctl restart systemd-resolved",
      "set -x; sudo cat /etc/systemd/resolved.conf",
      "set -x; sudo /opt/consul/bin/run-consul --client --cluster-tag-key \"${var.consul_cluster_tag_key}\" --cluster-tag-value \"${var.consul_cluster_tag_value}\"", # this is normally done with user data but is done here for convenience
      "set -x; consul members list",
      "set -x; dig $(hostname) | awk '/^;; ANSWER SECTION:$/ { getline ; print $5 ; exit }'", # check the local hostname resolves
      "set -x; dig @127.0.0.1 vault.service.consul | awk '/^;; ANSWER SECTION:$/ { getline ; print $5 ; exit }'", # check consul will resolve vault
      "set -x; dig @localhost vault.service.consul | awk '/^;; ANSWER SECTION:$/ { getline ; print $5 ; exit }'", # check localhost will resolve vault
      "set -x; dig vault.service.consul | awk '/^;; ANSWER SECTION:$/ { getline ; print $5 ; exit }'", # check default lookup will resolve vault
      ]
  }

If the 4 dig commands at the end of this segment resolve to IP addresses, then it was successful. To be doubly sure, the same check (with dig) should also be done after the next boot.
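For the check on the next boot, the same four lookups can be wrapped so that the script fails when any of them returns no address. This is a rough sketch rather than what the packer template does: it uses dig +short instead of the awk filter, only looks for IPv4 answers, and the names resolves, check_all and the DIG variable are made up for illustration.

```shell
# DIG can be overridden, e.g. to point at a different dig binary.
DIG=${DIG:-dig}

# Succeeds only if dig returns at least one line that looks like an IPv4 address.
# $@: arguments passed through to dig, e.g. "@127.0.0.1 vault.service.consul"
resolves() {
  "$DIG" +short "$@" | grep -Eq '^[0-9]+(\.[0-9]+){3}$'
}

# Runs the four lookups from the packer snippet for one service name.
# $1: name to look up; prints one line per check, returns non-zero if any failed.
check_all() {
  name=$1
  rc=0
  for args in "$(hostname)" "@127.0.0.1 $name" "@localhost $name" "$name"; do
    # $args is deliberately unquoted so "@server name" splits into two arguments.
    if resolves $args; then
      echo "ok   dig $args"
    else
      echo "FAIL dig $args"
      rc=1
    fi
  done
  return $rc
}

# Usage (not run here):
#   check_all vault.service.consul
```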