.consul DNS oddity

Hello all:

I am troubleshooting a DNS lookup against Consul services, specifically for Vault.
My environment:
Consul v1.20.0
Vault 1.18.0

I have a three-node cluster of Nomad, Consul, and Vault. I also forward .consul on my internal DNS servers to the respective Consul endpoints.

My Vault is currently unsealed.

I have noticed that if I perform a DNS query against vault.my-fqdn, I get an NXDOMAIN response.
If I query the Consul server directly, I see that the results for vault.service.consul differ from those for nomad.service.consul or consul.service.consul:

dig @192.168.2.10 -p 8600 vault.service.consul

; <<>> DiG 9.10.6 <<>> @192.168.2.10 -p 8600 vault.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 50250
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;vault.service.consul.		IN	A

;; ANSWER SECTION:
vault.service.consul.	0	IN	CNAME	hostname01.

;; Query time: 39 msec
;; SERVER: 192.168.2.10#8600(192.168.2.10)
;; WHEN: Thu Nov 14 11:55:56 EST 2024
;; MSG SIZE  rcvd: 83

dig @192.168.2.10 -p 8600 consul.service.consul

; <<>> DiG 9.10.6 <<>> @192.168.2.10 -p 8600 consul.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4796
;; flags: qr aa rd; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;consul.service.consul.		IN	A

;; ANSWER SECTION:
consul.service.consul.	0	IN	A	192.168.11.aa
consul.service.consul.	0	IN	A	192.168.11.ab
consul.service.consul.	0	IN	A	192.168.11.ac

;; Query time: 41 msec
;; SERVER: 192.168.2.10#8600(192.168.2.10)
;; WHEN: Thu Nov 14 11:57:40 EST 2024
;; MSG SIZE  rcvd: 98

Shouldn’t the vault.service.consul query return three results, one for each node in the cluster?

Is this normal for queries against vault.service.consul?

@originaltrini0 I see that the DNS query for vault.service.consul is returning a CNAME record that points to hostname01. (with a trailing dot). Clients will likely be unable to resolve this name, because hostname01 is not a valid top-level domain.

Typically, Vault should register the IP address of each instance in Consul instead of a hostname.
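
To tell the two cases apart quickly, each line of `dig +short` output can be classified as an address or a name. The following is a minimal shell sketch, not an official tool: the function name `classify_dig_answers` is hypothetical, and it only recognizes IPv4 addresses by their shape.

```shell
# Classify each line of `dig +short` output read from stdin.
# Prints "A <ip>" for lines shaped like an IPv4 address,
# and "CNAME <name>" for anything else (for example "hostname01.").
classify_dig_answers() {
  while IFS= read -r line || [ -n "$line" ]; do
    # Skip empty lines.
    [ -z "$line" ] && continue
    if printf '%s\n' "$line" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
      printf 'A %s\n' "$line"
    else
      printf 'CNAME %s\n' "$line"
    fi
  done
}
```

For example, `dig @192.168.2.10 -p 8600 vault.service.consul +short | classify_dig_answers` should print a single CNAME line in the failing case shown above, and only A lines once Vault registers IP addresses.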

Would you mind sharing how you’re configuring Vault to register itself into Consul?

@blake, thanks for responding. Please see my configurations for Vault and Consul from one node in the cluster:

Vault:

ui            = true
cluster_addr  = "https://prod-core-services01:8201"
api_addr      = "https://prod-core-services01:8200"
disable_mlock = true

storage "raft" {
  path    = "/opt/vault/data"
  
  retry_join {
    leader_tls_servername   = "prod-core-services02"
    leader_api_addr         = "https://prod-core-services02:8200"
    leader_ca_cert_file     = "/etc/step/certs/root_ca.crt"
    leader_client_cert_file = "/etc/step/certs/vault/vault.crt"
    leader_client_key_file  = "/etc/step/certs/vault/vault.key"
  }
  retry_join {
    leader_tls_servername   = "prod-core-services03"
    leader_api_addr         = "https://prod-core-services03:8200"
    leader_ca_cert_file     = "/etc/step/certs/root_ca.crt"
    leader_client_cert_file = "/etc/step/certs/vault/vault.crt"
    leader_client_key_file  = "/etc/step/certs/vault/vault.key"
  }
}

listener "tcp" {
  address            = ":8200"
  tls_cert_file      = "/etc/step/certs/vault/vault.crt"
  tls_key_file       = "/etc/step/certs/vault/vault.key"
  tls_client_ca_file = "/etc/step/certs/root_ca.crt"
}

service_registration "consul" {
  address      = "http://127.0.0.1:8500"
}

telemetry {
  disable_hostname = true
  prometheus_retention_time = "30s"
}

Consul:

datacenter = "homelab"
data_dir = "/opt/consul/data"
encrypt = "<REDACTED>"
retry_join = [
  "192.168.100.11",
  "192.168.100.12"
]
telemetry {
  disable_hostname = true
  prometheus_retention_time = "30s"
}
server = true
bind_addr = "192.168.100.10"
client_addr = "0.0.0.0"
node_name = "prod-core-services01"
ui_config {
  enabled = true
}
log_level  = "INFO"

When I query Consul for Vault:

dig @192.168.100.10 -p 8600 vault.service.consul

; <<>> DiG 9.10.6 <<>> @192.168.100.10 -p 8600 vault.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 3258
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;vault.service.consul.		IN	A

;; ANSWER SECTION:
vault.service.consul.	0	IN	CNAME	prod-core-services02.

;; Query time: 38 msec
;; SERVER: 192.168.100.10#8600(192.168.100.10)
;; WHEN: Thu Dec 12 20:58:36 EST 2024
;; MSG SIZE  rcvd: 83

Using the Consul CLI:

$ consul catalog nodes -service=vault
Node                  ID        Address         DC
prod-core-services01  fdaa9e18  192.168.100.10  homelab
prod-core-services02  8de9943e  192.168.100.11  homelab
prod-core-services03  36374725  192.168.100.12  homelab
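
Note that `consul catalog nodes` lists the node addresses, not the address that the service itself registered. Consul's HTTP catalog endpoint `/v1/catalog/service/vault` returns a `ServiceAddress` field for each instance; if that field holds a hostname rather than an IP address, that would be consistent with the CNAME answer. A rough sketch, assuming the local agent listens on 127.0.0.1:8500 as in the Vault configuration above. The helper name `extract_service_addresses` is hypothetical, and a JSON tool such as jq would be more robust than this text matching.

```shell
# Print the value of every "ServiceAddress" field found in JSON read from stdin.
# This is plain text matching, not a real JSON parser.
extract_service_addresses() {
  grep -o '"ServiceAddress": *"[^"]*"' \
    | sed -e 's/^"ServiceAddress": *"//' -e 's/"$//'
}

# Example against a local agent (address is an assumption):
#   curl -s http://127.0.0.1:8500/v1/catalog/service/vault | extract_service_addresses
```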

Please let me know your thoughts.

Thanks

@blake
I am following up as I was able to overcome my issue.
In Vault’s configuration, I changed all hostnames to IP addresses:

ui            = true
cluster_addr  = "https://192.168.100.10:8201"
api_addr      = "https://192.168.100.10:8200"
disable_mlock = true

storage "raft" {
  path    = "/opt/vault/data"
  
  retry_join {
    leader_tls_servername   = "192.168.100.11"
    leader_api_addr         = "https://192.168.100.11:8200"
    leader_ca_cert_file     = "/etc/step/certs/root_ca.crt"
    leader_client_cert_file = "/etc/step/certs/vault/vault.crt"
    leader_client_key_file  = "/etc/step/certs/vault/vault.key"
  }
  retry_join {
    leader_tls_servername   = "192.168.100.12"
    leader_api_addr         = "https://192.168.100.12:8200"
    leader_ca_cert_file     = "/etc/step/certs/root_ca.crt"
    leader_client_cert_file = "/etc/step/certs/vault/vault.crt"
    leader_client_key_file  = "/etc/step/certs/vault/vault.key"
  }
}

listener "tcp" {
  address            = ":8200"
  tls_cert_file      = "/etc/step/certs/vault/vault.crt"
  tls_key_file       = "/etc/step/certs/vault/vault.key"
  tls_client_ca_file = "/etc/step/certs/root_ca.crt"
}

service_registration "consul" {
  address      = "http://127.0.0.1:8500"
}

telemetry {
  disable_hostname = true
  prometheus_retention_time = "30s"
}

DNS queries against vault.service.consul and vault.my-fqdn now work correctly:

dig @192.168.100.10 -p 8600 vault.service.consul +short
192.168.100.10
192.168.100.12
192.168.100.11
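
To confirm that every Consul server returns only addresses, the same query can be repeated against each one. A small sketch, assuming the three server addresses from the configuration above; `check_vault_dns` and the `DIG` override variable are hypothetical names introduced here for illustration.

```shell
# Command used to run queries; can be overridden (for example with a stub).
DIG="${DIG:-dig}"

# Query vault.service.consul on each given Consul server (DNS port 8600).
# Prints "ok <server>" when every answer looks like an IPv4 address,
# and "bad <server>" when the answer is empty or contains a name.
# Returns non-zero if any server fails the check.
check_vault_dns() {
  status=0
  for server in "$@"; do
    answers="$("$DIG" "@$server" -p 8600 vault.service.consul +short)"
    # grep -v selects lines that are NOT IPv4-shaped; none must exist.
    if [ -n "$answers" ] && ! printf '%s\n' "$answers" \
        | grep -Evq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
      printf 'ok %s\n' "$server"
    else
      printf 'bad %s\n' "$server"
      status=1
    fi
  done
  return "$status"
}
```

It could be run as `check_vault_dns 192.168.100.10 192.168.100.11 192.168.100.12`.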
