Strange cache on DNS forward zone

I have a corporate DNS server running BIND.
I have 3 Consul servers. Consul listens on localhost port 8600 for DNS.
On these servers I also run dnsmasq to forward DNS requests arriving on the public IP, port 53, to Consul on localhost.

The configuration of Consul is very simple. I just activated the DNS port. When I check the DNS locally on port 8600, it works fine.

The dnsmasq configuration on each Consul server is:

server=/subdomain.domain.com/127.0.0.1#8600
server=XX.XX.XX.XX (first ip of the corporate DNS)
server=XX.XX.XX.XX (second ip of the corporate DNS)
listen-address=XX.XX.XX.XX (ip of the server)

The configuration on the corporate DNS is:

zone "subdomain.domain.com" {
    type forward;
    forward only;
    forwarders {
        XX.XX.XX.XX; (ip of the first consul)
        XX.XX.XX.XX; (ip of the second consul)
        XX.XX.XX.XX; (ip of the third consul)
    };
};

When entries are added to or removed from Consul, the DNS zone on the corporate DNS is updated almost immediately.

When all the entries for a service are deleted from Consul, the corporate DNS removes all of them from its table (which is expected).
But if the entries are created again, Consul's DNS is updated while the corporate DNS is not.
To update the corporate DNS, I have to run the command rndc flushtree subdomain.domain.com

Hi @smutel,

It sounds like there is an NXDOMAIN response being cached somewhere which results in the corporate DNS not picking up on new services when they come online.

Consul should be returning an NXDOMAIN response when it receives a query for a service which no longer exists. The authority section of that response will also contain an SOA record which has a TTL field that specifies how long to cache the negative response (defined by dns_config.soa.min_ttl). By default this value is zero.
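
To make the TTL check concrete, here is a minimal Python sketch of how a resolver derives the negative-caching time from the SOA record in an NXDOMAIN response. It follows the rule in RFC 2308 (the smaller of the SOA record's own TTL and its MINIMUM field). The function name negative_cache_ttl and the sample lines are only illustrative, and a real resolver may additionally clamp the value with its own settings.

```python
def negative_cache_ttl(soa_line: str) -> int:
    """Return the negative-caching TTL implied by one SOA line from dig output."""
    fields = soa_line.split()
    # dig layout: owner TTL class type mname rname serial refresh retry expire minimum
    if len(fields) != 11 or fields[3].upper() != "SOA":
        raise ValueError(f"not a dig SOA line: {soa_line!r}")
    record_ttl = int(fields[1])    # TTL of the SOA record itself
    soa_minimum = int(fields[10])  # last SOA field (dns_config.soa.min_ttl in Consul)
    # RFC 2308: the negative answer is cached for the smaller of the two values.
    return min(record_ttl, soa_minimum)


# Sample line modeled on a default Consul NXDOMAIN response (both values are 0).
consul_soa = "consul. 0 IN SOA ns.consul. hostmaster.consul. 1585758483 3600 600 86400 0"
print(negative_cache_ttl(consul_soa))  # 0
```

Running the same function on the SOA line seen at each hop (Consul, dnsmasq, BIND) shows where a non-zero value first appears.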

Have you changed this in Consul to a higher value (which would explain the caching)? If not, can you verify that the BIND servers are seeing a TTL of 0 in the response from dnsmasq? If they’re seeing a different value, then you may want to dig into where that is coming from.

Hello,

When I do a dig request on my consul locally, I have a TTL of 0:

consul-app-dc1-02.xxx.xxx. 0 IN A XX.XX.XX.XX

When I do a dig request on my dnsmasq locally, I have a TTL of 0:

consul-app-dc1-02.xxx.xxx. 0 IN A XX.XX.XX.XX

When I do a dig request on my corporate DNS, I have a TTL of 0:

consul-app-dc1-02.xxx.xxx. 0 IN A XX.XX.XX.XX

But I also get this part:

;; AUTHORITY SECTION:
xxx.xxx.               73434   IN      NS      yy.yyyy.yy.
xxx.xxx.               73434   IN      NS      yy.yyyy.yy.

;; ADDITIONAL SECTION:
yy.yyyy.yy.           73434   IN      A       XX.XX.XX.XX
yy.yyyy.yy.           73434   IN      A       XX.XX.XX.XX

@smutel,

Could you also share the response you receive when performing a query for a non-existent record? You should see something similar to this in each response.

$ dig @127.0.0.1 -p 8600 missing.service.consul

; <<>> DiG 9.12.4-P2 <<>> @127.0.0.1 -p 8600 missing.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 36182
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;missing.service.consul.		IN	A

;; AUTHORITY SECTION:
consul.			0	IN	SOA	ns.consul. hostmaster.consul. 1585758483 3600 600 86400 0

; <<>> DiG 9.11.5-P4-5.1-Debian <<>> @127.0.0.1 -p 8600 +noauthority +noquestion +noadditional +nostats toto.xxx.xxx.xxx.xxx
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 8495
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096

Hi @smutel,

I was hoping to see the entire output of the dig command – with the additional & authority sections – when issuing queries to Consul, dnsmasq, and BIND.
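
As a rough aid, the small Python helper below only builds (it does not run) one full-output dig command per hop, so the same name is queried against Consul, dnsmasq, and BIND in turn. The function name dig_commands and the 192.0.2.x addresses are placeholders I made up. The dnsmasq config above lists only the server's own IP as listen-address, so I am assuming dnsmasq has to be queried at that address rather than at 127.0.0.1.

```python
def dig_commands(name, hops):
    """Build one dig command per DNS hop; nothing is executed here."""
    commands = []
    for label, (address, port) in hops.items():
        # No +no... flags, so the authority and additional sections are kept.
        argv = ["dig", f"@{address}", "-p", str(port), name]
        commands.append((label, argv))
    return commands


# Placeholder addresses: replace with the real listen-address and BIND IP.
hops = {
    "consul": ("127.0.0.1", 8600),
    "dnsmasq": ("192.0.2.10", 53),
    "bind": ("192.0.2.53", 53),
}

for label, argv in dig_commands("missing.service.consul", hops):
    print(f"{label}: {' '.join(argv)}")
```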

Based on the behavior you described I suspect that the NXDOMAIN is being cached somewhere when a non-existent record is queried. When a service is added, this cached record then prevents it from being discoverable in dnsmasq and/or BIND until either the zone is flushed, or the negative response expires in the cache.
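
To illustrate the behavior I have in mind, here is a toy Python model of a resolver with a negative cache. It is not how BIND or dnsmasq are implemented: the class NegativeCache, the dict standing in for the Consul catalog, the service name, and the 10800 second TTL are all assumptions chosen for the example.

```python
class NegativeCache:
    """Toy resolver cache that remembers NXDOMAIN answers for a fixed TTL."""

    def __init__(self, upstream, negative_ttl):
        self.upstream = upstream          # dict standing in for the Consul catalog
        self.negative_ttl = negative_ttl  # seconds to keep an NXDOMAIN answer
        self.nxdomain_until = {}          # name -> time at which the entry expires

    def resolve(self, name, now):
        expiry = self.nxdomain_until.get(name)
        if expiry is not None and now < expiry:
            # Upstream is not consulted while the negative entry is still valid.
            return "NXDOMAIN (cached)"
        self.nxdomain_until.pop(name, None)
        address = self.upstream.get(name)
        if address is None:
            if self.negative_ttl > 0:
                self.nxdomain_until[name] = now + self.negative_ttl
            return "NXDOMAIN"
        return address

    def flush(self):
        # Plays the role of flushing the zone from the resolver cache.
        self.nxdomain_until.clear()


catalog = {}
cache = NegativeCache(catalog, negative_ttl=10800)
print(cache.resolve("web.service.consul", now=0))   # NXDOMAIN
catalog["web.service.consul"] = "10.0.0.5"          # service registered again
print(cache.resolve("web.service.consul", now=60))  # NXDOMAIN (cached)
cache.flush()
print(cache.resolve("web.service.consul", now=61))  # 10.0.0.5
```

With negative_ttl=0 the second lookup would return the new address immediately, which is the behavior you would expect if a TTL of 0 were honored end to end.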

Hopefully this helps clarify my thinking & points you to some additional areas to investigate in your DNS hierarchy.

Thanks for your answer.
Here is the entire output of consul:

; <<>> DiG 9.11.5-P4-5.1-Debian <<>> @127.0.0.1 -p 8600 toto.xxx.xxx.xxx
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 34229
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;toto..xxx.xxx.xxx. IN A

;; AUTHORITY SECTION:
infra.xxx.xxx.xxx. 0     IN      SOA     ns..xxx.xxx.xxx. hostmaster..xxx.xxx.xxx. 1585825497 3600 600 86400 0

;; Query time: 1 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Thu Apr 02 13:04:57 CEST 2020
;; MSG SIZE  rcvd: 113

Here is the entire result on dnsmasq:

; <<>> DiG 9.11.5-P4-5.1-Debian <<>> @127.0.0.1 toto.xxx.xxx.xxx
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

Here is the answer from the corporate DNS:

; <<>> DiG 9.11.5-P4-5.1-Debian <<>> @10.20.3.20 toto.xxx.xxx.xxx
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 32556
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: daceca2fcc14e157a88fd8f45e85c7c7e9d3bc05af826116 (good)
;; QUESTION SECTION:
;toto.xxx.xxx.xxx. IN A

;; AUTHORITY SECTION:
xxx.xxx.xxx. 10800 IN      SOA     ns.xxx.xxx.xxx. hostmaster.xxx.xxx.xxx. 1585825735 3600 600 86400 0

;; Query time: 2 msec
;; SERVER: 10.20.3.20#53(10.20.3.20)
;; WHEN: Thu Apr 02 13:08:55 CEST 2020
;; MSG SIZE  rcvd: 141