(Note: All of this is running on Ubuntu Server 18.04 LTS, on freshly installed VMs.)
Long story short, on the KrakenD server, if I do a ‘dig SRV’ for a registered service the way KrakenD would, I get this:
$ dig identity-server.service.consul SRV +short
1 1 80 0a1e3730.addr.stage-vm.consul.
Can anyone explain the results to me? I don’t care about the 1s, and 80 is the port, but what’s “0a1e3730.addr”? (I know “stage-vm” is the data center and “consul” is obvious).
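For reference, this is how I'm reading that line. As far as I understand the SRV format (RFC 2782), the four fields are priority, weight, port and target, in that order. The little parser below is just my own sketch for splitting a `dig +short` line; the names `SrvRecord` and `parse_srv_short` are made up for illustration.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    """One SRV answer as printed by `dig ... SRV +short`."""
    priority: int
    weight: int
    port: int
    target: str


def parse_srv_short(line: str) -> SrvRecord:
    # Field order assumed from RFC 2782: priority, weight, port, target.
    priority, weight, port, target = line.split()
    return SrvRecord(int(priority), int(weight), int(port), target)


rec = parse_srv_short("1 1 80 0a1e3730.addr.stage-vm.consul.")
print(rec.port, rec.target)
```

So the only part I can't account for is the target name in the last field.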
As I’ve been looking at other people’s examples online, I would’ve expected the result to look something like “identity-server.node.stage-vm.consul.” Or maybe an actual IP address?
Here’s the trick: If I use “0a1e3730.addr.stage-vm.consul.” in curl, it works fine…
KrakenD is configured to use “identity-server.service.consul”. In tcpdump I can see it query Consul, and Consul answers with “0a1e3730.addr.stage-vm.consul.”, but nothing happens after that. From what I can tell, KrakenD asks Consul, gets the answer, tries to use it, and then returns “no hosts available”. (So does KrakenD treat that answer as a hostname which it then fails to resolve?)
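To make my guess concrete: I'd expect any SRV consumer to need a second step, turning the target name into an IP and pairing it with the port. I don't know what KrakenD actually does internally, so the sketch below is only my mental model. `backends_from_srv` and `fake_resolve` are hypothetical names, and the fake resolver stands in for real DNS so the example runs offline (10.30.55.48 is the address that name resolves to on my box).

```python
import socket
from typing import Callable, List, Tuple


def backends_from_srv(
    records: List[Tuple[int, str]],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> List[str]:
    """Turn (port, target) pairs from an SRV answer into 'ip:port' strings.

    Targets that fail to resolve are skipped, so an empty result means
    there is no usable host.
    """
    hosts = []
    for port, target in records:
        try:
            ip = resolve(target.rstrip("."))  # drop the trailing root dot
        except OSError:  # socket.gaierror is a subclass of OSError
            continue
        hosts.append(f"{ip}:{port}")
    return hosts


def fake_resolve(name: str) -> str:
    # Stand-in for real DNS (assumption, for illustration only).
    table = {"0a1e3730.addr.stage-vm.consul": "10.30.55.48"}
    if name not in table:
        raise OSError(f"cannot resolve {name}")
    return table[name]


print(backends_from_srv([(80, "0a1e3730.addr.stage-vm.consul.")], fake_resolve))
```

If the second lookup fails for every target, the list comes back empty, which is the kind of situation I imagine could end in a “no hosts available” message.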
I know there’s a lot of information here, but to summarize: I’m worried that Consul is returning something “weird” (for lack of a better word), and it’s breaking KrakenD. Since I don’t understand what it’s returning (or why), I don’t know what to look at to fix this (and for all I know this is just a red herring).
One more thing related to all this which I find interesting: If I run the dig command above without the +short option, it still just returns that one record I showed above.
But if I run it by specifying Consul as the NS, I get more information:
$ dig @10.30.54.161 identity-server.service.consul. SRV
;; ADDITIONAL SECTION:
0a1e3730.addr.stage-vm.consul. 0 IN A 10.30.55.48
consul1-vm.node.stage-vm.consul. 0 IN TXT "consul-network-segment="
This whole Additional Section is missing from the first dig query… and to my eyes (as a Linux noob), it looks important. But I have no idea why the queries are different - in theory both queries are going to the same Consul instance.
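One thing I noticed while staring at that A record: if I read “0a1e3730” as four hex bytes, I get 10.30.55.48, which is exactly the address in the additional section. So the label may simply be the node's IPv4 address written in hex. I haven't confirmed that this is how Consul builds the name; the helper below (`decode_addr_label`, my own name) just checks the arithmetic.

```python
import ipaddress


def decode_addr_label(name: str) -> ipaddress.IPv4Address:
    """Decode the leading hex label of a '<hex>.addr.<dc>.<domain>' name.

    Assumes the first label is exactly 8 hex digits (4 bytes, i.e. IPv4).
    """
    label = name.split(".", 1)[0]
    raw = bytes.fromhex(label)  # raises ValueError on non-hex input
    if len(raw) != 4:
        raise ValueError(f"expected 4 bytes, got {len(raw)}")
    return ipaddress.IPv4Address(raw)


print(decode_addr_label("0a1e3730.addr.stage-vm.consul."))  # 10.30.55.48
```

That would at least explain why curl is happy with the name, as long as whatever does the lookup can resolve it back to the A record.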
For background, DNS resolution is handled by systemd-resolved. The resolved.conf file points to Consul, and I have some iptables rules that redirect port 53 to 8600. I know this all works (at least on some level) because curl works fine with “identity-server.service.consul”.
I’ve been digging away at this for days. I appreciate any help, even if it’s just fragments of an answer, it may be enough to at least get me pointed in the right direction.