DNS tag.foo.service.consul fails after 1.2.4 -> 1.6.10 upgrade?

We are trying to upgrade our 1.2.4 cluster to 1.6.10 and have started with the client agents. After the upgrade:

$ host master.db63.service.consul
Host master.db63.service.consul not found: 3(NXDOMAIN)
(host db63.service.consul returns the correct addresses)

On agents that have not been upgraded yet, and after rolling back to 1.2.4:

master.db63.service.consul has address 148.xxx.xxx.xxx

This is on Ubuntu 22.04.2 LTS with

cat /etc/systemd/resolved.conf.d/consul.conf
[Resolve]
DNS=127.0.0.52
Domains=~consul

And iptables rules:

iptables -t nat -A OUTPUT -d 127.0.0.52 -p udp -m udp --dport 53 -j REDIRECT --to-ports 8600
iptables -t nat -A OUTPUT -d 127.0.0.52 -p tcp -m tcp --dport 53 -j REDIRECT --to-ports 8600
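To tell whether the NXDOMAIN comes from Consul itself or from the systemd-resolved/iptables path, one can send the same query both through the system resolver and straight to the local agent's DNS port. This is a minimal bash sketch. It assumes the agent's DNS interface is on the default 127.0.0.1:8600 and that `dig` (dnsutils) is installed. The function names are hypothetical helpers, not part of Consul:

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a Consul DNS name of the form
#   [<tag>.]<service>.service[.<datacenter>].<domain>
consul_service_name() {
  local service=$1 tag=${2:-} dc=${3:-} domain=${4:-consul}
  local name="${service}.service"
  [[ -n "$dc" ]] && name="${name}.${dc}"
  name="${name}.${domain}"
  [[ -n "$tag" ]] && name="${tag}.${name}"
  printf '%s\n' "$name"
}

# Hypothetical helper: run the same lookup via the system resolver
# (systemd-resolved -> 127.0.0.52 -> iptables -> 8600) and directly
# against the local agent (assumed to listen on 127.0.0.1:8600).
compare_lookup() {
  local name
  name=$(consul_service_name "$@")
  echo "== $name via system resolver:"
  dig +short "$name"
  echo "== $name direct to 127.0.0.1:8600:"
  dig +short @127.0.0.1 -p 8600 "$name"
}
```

Usage: `compare_lookup db63 master` queries `master.db63.service.consul` both ways. Since `db63.service.consul` already resolves through the same path, an empty answer from the direct query as well would point at the agent or servers rather than at the resolver setup.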

Any ideas? Thanks.

This is how things look in our test environment after upgrading from 1.0.6 → 1.2.4 and then upgrading the client node to 1.6.10:

root@consul4 ~ consul members
Node     Address          Status  Type    Build   Protocol  DC    Segment
consul1  10.156.0.5:8301  alive   server  1.2.4   2         dc19  <all>
consul2  10.156.0.6:8301  alive   server  1.2.4   2         dc19  <all>
consul3  10.156.0.7:8301  alive   server  1.2.4   2         dc19  <all>
consul4  10.156.0.8:8301  alive   client  1.6.10  2         dc19  <default>

On consul4:

root@consul4 ~ host foo.nginx.service.consul
Host foo.nginx.service.consul not found: 3(NXDOMAIN)
root@consul4 ~ host nginx.service.consul
nginx.service.consul has address 10.156.0.5
nginx.service.consul has address 10.156.0.7
nginx.service.consul has address 10.156.0.6

On consul1/2/3:

root@consul1 ~ host foo.nginx.service.consul
foo.nginx.service.consul has address 10.156.0.6

consul1/2/3 all have nginx registered; on consul2 the registration also adds the “foo” tag:

cat /etc/consul.d/nginx.json
{
  "service": {
    "name": "nginx",
    "port": 80,
    "tags": ["foo"],
    "token": "...",
    "checks": [
      {
        "id": "nginx-listening",
        "name": "HTTP on port 80",
        "http": "http://localhost:80",
        "interval": "30s",
        "timeout": "5s"
      }
    ]
  }
}
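A quick way to confirm that the tag is actually in the catalog, independent of DNS, is the catalog HTTP API, which accepts a `tag` filter on `/v1/catalog/service/<service>`. This sketch assumes the HTTP API is on the default http://127.0.0.1:8500 (overridable through `CONSUL_HTTP_ADDR`) and that an ACL token, if needed, is in `CONSUL_HTTP_TOKEN`. Both helper names are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: catalog API URL for a service, optionally filtered by tag.
catalog_url() {
  local service=$1 tag=${2:-}
  local addr=${CONSUL_HTTP_ADDR:-http://127.0.0.1:8500}
  local url="${addr}/v1/catalog/service/${service}"
  [[ -n "$tag" ]] && url="${url}?tag=${tag}"
  printf '%s\n' "$url"
}

# Hypothetical helper: print the raw JSON catalog entries for service $1 with tag $2.
# Sends an X-Consul-Token header only when CONSUL_HTTP_TOKEN is set.
tagged_instances() {
  local -a auth=()
  [[ -n "${CONSUL_HTTP_TOKEN:-}" ]] && auth=(-H "X-Consul-Token: ${CONSUL_HTTP_TOKEN}")
  curl -sS "${auth[@]}" "$(catalog_url "$1" "${2:-}")"
}
```

Usage: `tagged_instances nginx foo` run on consul4 and on consul1 should return consul2's entry in both places if the registration itself is fine, which narrows the problem down to how the tagged DNS query is answered.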

Any ideas?

Hi @david.tinker1,

The official recommendation is always to upgrade the Consul Servers first, followed by the Clients. From the output, the servers are on an older version than the clients. Can you test by upgrading the servers first to the newer version and see if everything works for you?

Ref: Upgrade Consul | Consul | HashiCorp Developer
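Since the fix here was the upgrade order, a small pre-flight check can catch clients that are ahead of the servers. This is a minimal bash sketch. It reads `consul members` output on stdin and assumes the column layout shown earlier in this thread (Type in column 4, Build in column 5). It also assumes GNU `sort -V` for version comparison. `check_upgrade_order` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: flag client agents whose Build is newer than the
# oldest server's Build. Expects `consul members` output on stdin.
# Returns 0 if no client is ahead, 1 if some are, 2 if no servers are listed.
check_upgrade_order() {
  local members oldest_server
  members=$(cat)
  # Oldest server build (skip header row; version-aware sort).
  oldest_server=$(awk 'NR > 1 && $4 == "server" { print $5 }' <<<"$members" | sort -V | head -n 1)
  if [[ -z "$oldest_server" ]]; then
    echo "no servers found" >&2
    return 2
  fi
  local status=0 node build newest
  while read -r node build; do
    # Whichever of the two versions sorts last is the newer one.
    newest=$(printf '%s\n%s\n' "$oldest_server" "$build" | sort -V | tail -n 1)
    if [[ "$build" != "$oldest_server" && "$newest" == "$build" ]]; then
      echo "client $node runs $build, newer than server build $oldest_server"
      status=1
    fi
  done < <(awk 'NR > 1 && $4 == "client" { print $1, $5 }' <<<"$members")
  return "$status"
}
```

Usage: `consul members | check_upgrade_order` returns 1 for the test environment above and prints `client consul4 runs 1.6.10, newer than server build 1.2.4`.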


Thanks. I didn’t realise the clients had to be done last. That did the trick.