Consul errors when `advertise_addr_wan` is set

I’m struggling to understand the behavior of Consul in k8s when the `advertise_addr_wan` config parameter is set.

I need to communicate with the Consul cluster in k8s from other clusters located elsewhere, so I assumed the cluster would need to advertise over WAN using a VIP. I further assumed that setting that config parameter would only apply to servers joining from other clusters over WAN, and not to the servers deployed by Helm within the k8s cluster. The docs appear to confirm this:

The advertise WAN address is used to change the address that we advertise to server nodes joining through the WAN

However, as soon as I set that within the Helm chart, the Consul nodes all appear to attempt to cluster using the WAN address. This may fail, depending on the status of the VIP and the various port forwards, which is obviously undesirable, because a LAN cluster within k8s should stay clustered as long as networking between the k8s nodes is functioning.

As soon as I remove that config option, the nodes cluster as I’d expect and everything is happy:

2021-11-17T22:15:02.768Z [INFO] agent.leader: started routine: routine="CA root pruning"
2021-11-17T22:15:02.772Z [INFO] agent.server: member joined, marking health alive: member=consul-consul-server-2
2021-11-17T22:15:02.779Z [INFO] agent.server: member joined, marking health alive: member=consul-consul-server-0
2021-11-17T22:15:02.847Z [INFO] agent.server: member joined, marking health alive: member=consul-consul-server-1
2021-11-17T22:15:02.854Z [INFO] agent.server: federation state anti-entropy synced
2021-11-17T22:15:03.035Z [INFO] agent: Synced node info 

But add it back in, and the nodes appear to be joining over WAN and yelling about weird pings…

2021-11-17T22:29:07.181Z [INFO] agent.server.serf.wan: serf: EventMemberJoin: consul-consul-server-1.control 10.238.1.20
2021-11-17T22:29:07.181Z [INFO] agent.server.serf.wan: serf: EventMemberJoin: consul-consul-server-0.control 10.238.1.20 
2021-11-17T22:29:14.117Z [INFO] agent: Synced node info
2021-11-17T22:29:15.071Z [WARN] agent.server.memberlist.wan: memberlist: Got ping for unexpected node 'consul-consul-server-0.control' from=10.42.7.0:46994
2021-11-17T22:29:16.969Z [WARN] agent.server.memberlist.wan: memberlist: Got ping for unexpected node 'consul-consul-server-0.control' from=10.42.0.0:59135
2021-11-17T22:29:16.969Z [WARN] agent.server.memberlist.wan: memberlist: Refuting a suspect message (from: consul-consul-server-1.control)
2021-11-17T22:29:18.071Z [WARN] agent.server.memberlist.wan: memberlist: Got ping for unexpected node 'consul-consul-server-0.control' from=10.42.7.0:46994
2021-11-17T22:29:18.072Z [WARN] agent.server.memberlist.wan: memberlist: Got ping for unexpected node 'consul-consul-server-0.control' from=10.42.2.0:15825

What’s up with this? Why is it experiencing WAN problems when the cluster nodes are all on the LAN? What am I missing?

Sorry for the confusion. The servers join each other over both WAN and LAN, so the WAN addresses still need to be routable even within the LAN.
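To make that a bit more concrete, here is a small sketch (plain Python, not part of Consul) that groups the WAN join events from the log in the question by the address each server advertised. The function names and the regex are my own, written for illustration; the sample lines are copied from the question.

```python
import re
from collections import defaultdict

# Sample lines copied from the WAN log output quoted in the question.
LOG = """\
2021-11-17T22:29:07.181Z [INFO] agent.server.serf.wan: serf: EventMemberJoin: consul-consul-server-1.control 10.238.1.20
2021-11-17T22:29:07.181Z [INFO] agent.server.serf.wan: serf: EventMemberJoin: consul-consul-server-0.control 10.238.1.20
"""

# Assumed line shape: "... serf.wan: serf: EventMemberJoin: <member name> <address>"
JOIN_RE = re.compile(r"serf\.wan: serf: EventMemberJoin: (\S+) (\S+)")


def wan_members_by_address(log_text):
    """Map each advertised WAN address to the member names that joined with it."""
    members = defaultdict(list)
    for line in log_text.splitlines():
        match = JOIN_RE.search(line)
        if match:
            name, addr = match.groups()
            members[addr].append(name)
    return dict(members)


def shared_addresses(members):
    """Keep only the addresses advertised by more than one WAN member."""
    return {addr: names for addr, names in members.items() if len(names) > 1}


if __name__ == "__main__":
    shared = shared_addresses(wan_members_by_address(LOG))
    for addr, names in shared.items():
        print(f"{addr} is advertised by {len(names)} WAN members: {', '.join(names)}")
```

In the log from the question, both servers show up in the WAN pool with the same address, 10.238.1.20, which looks like the VIP. If that is the case, WAN gossip between the servers inside k8s goes through the VIP as well, so it depends on the VIP being reachable from the pods.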

I agree this seems weird, I can try and find out more info.

Just wanted to mention this is still the behavior with Consul 1.11.2.

I have spent a small amount of time looking into where these messages come from. I believe the code that handles this is here, or perhaps here. But it’s not obvious to me from either of those how the WAN pool gets configured or why it is populated with LAN nodes.
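One possible reading of the "Got ping for unexpected node" warnings, sketched as a toy model rather than actual memberlist code: as far as I can tell, a ping names the node it is meant for, and the receiver warns when that name is not its own. If every server advertises the same VIP on the WAN, a ping for one server can be forwarded to a different one. The `Server` class, the `deliver_via_vip` helper, and the index-based forwarding are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str

    def handle_ping(self, target_name, source):
        # Assumed behavior: reject a ping whose target name is not our own,
        # using the same wording as the warning seen in the logs.
        if target_name and target_name != self.name:
            return (
                "[WARN] memberlist: Got ping for unexpected node "
                f"'{target_name}' from={source}"
            )
        return "ack"


def deliver_via_vip(backends, index, target_name, source):
    """Hypothetical VIP: forwards to a backend chosen by index, ignoring the target name."""
    backend = backends[index % len(backends)]
    return backend.handle_ping(target_name, source)


if __name__ == "__main__":
    servers = [Server(f"consul-consul-server-{i}.control") for i in range(3)]
    # Three pings for server-0, each landing on a different backend behind the VIP.
    for i in range(3):
        print(deliver_via_vip(servers, i, "consul-consul-server-0.control", "10.42.7.0:46994"))
```

In this model only the ping that happens to land on server-0 is acknowledged; the other two produce the warning. Whether the real VIP forwards traffic this way depends on how it is set up, which the thread does not say.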