I am trying to deploy Consul on public cloud providers and run cross-DC queries by having the datacenters in each cloud join each other over the WAN. Many of the smaller providers, such as DigitalOcean, Vultr, and Linode, do not support UDP in their load balancers at all, and even the ones that do, like Amazon, have surprising limitations. For example, Amazon requires that if the same port (8302) is used for both UDP and TCP, it must map to the same port on the backend (what Amazon calls a TCP_UDP load balancer target). That becomes hard in a Kubernetes environment, where k8s might allocate different nodePorts for 8302/tcp and 8302/udp. In summary, UDP presents a lot of headaches.
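To make the AWS constraint concrete, here is a minimal Go sketch that checks whether the TCP and UDP nodePorts allocated for the same service port could sit behind a single TCP_UDP target. The `portMapping` type and the `canUseTCPUDPTarget` function are hypothetical illustrations, not part of any AWS or Kubernetes API, and the port numbers are made up.

```go
package main

import "fmt"

// portMapping is a hypothetical, simplified view of one Service port entry:
// the service port, its protocol, and the nodePort allocated for it.
type portMapping struct {
	Port     int
	Protocol string // "TCP" or "UDP"
	NodePort int
}

// canUseTCPUDPTarget reports whether the TCP and UDP entries for the given
// service port point at the same backend (node) port, which is what a single
// TCP_UDP target needs under the constraint described above.
func canUseTCPUDPTarget(mappings []portMapping, port int) bool {
	tcp, udp := -1, -1
	for _, m := range mappings {
		if m.Port != port {
			continue
		}
		switch m.Protocol {
		case "TCP":
			tcp = m.NodePort
		case "UDP":
			udp = m.NodePort
		}
	}
	return tcp != -1 && tcp == udp
}

func main() {
	// Illustrative allocation where TCP and UDP received different nodePorts.
	mismatched := []portMapping{
		{Port: 8302, Protocol: "TCP", NodePort: 31302},
		{Port: 8302, Protocol: "UDP", NodePort: 30877},
	}
	fmt.Println(canUseTCPUDPTarget(mismatched, 8302)) // false

	// The shape a TCP_UDP target needs: both protocols on one backend port.
	matched := []portMapping{
		{Port: 8302, Protocol: "TCP", NodePort: 31302},
		{Port: 8302, Protocol: "UDP", NodePort: 31302},
	}
	fmt.Println(canUseTCPUDPTarget(matched, 8302)) // true
}
```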
I have also seen that unless Consul can talk on UDP 8302, the WAN join does not succeed. My question is: is there any way to tell Consul to use only TCP, and not UDP, for Serf WAN? I understand it might be suboptimal, but that is better for me than it not working at all!
I could not find any such option in the Consul docs. Any suggestions are appreciated!
It is not possible to turn off UDP for typical WAN federation, but there are two options. If you have a Consul Enterprise license, then the easiest thing is probably to use the Network Areas feature (the docs call it Advanced Federation), which still performs gossip between pairs of datacenters but does so over the "server" port (8300, multiplexed with RPC traffic). If you are using the OSS version of Consul, you could try WAN federation via mesh gateways. In a nutshell, that requires enabling the Consul service mesh, deploying mesh gateways into each datacenter, and configuring those datacenters to federate via the gateways. This feature was originally envisioned as a way to reduce a Consul datacenter's network exposure to just the gateways, which can also carry cross-DC mesh traffic. Its implementation, however, sends gossip through TLS connections, so it might get you close to what you are looking for.
@mkeeler Thanks a lot for the quick response. I'll explore both of the options you mentioned.
Another “public cloud headache” in this same context is Consul's -advertise-wan option, which I guess is mandatory for Serf WAN. Having to know your public IP from inside the datacenter is also awkward, because an AWS load balancer, for example, can have many public IPs (one per zone), and those IPs can change, which is why people usually use the load balancer's DNS name rather than a specific IP. So do the two options you mentioned for the UDP problem also address this? Is there a way to do WAN federation without having to specify the advertise-wan option?
Putting Consul’s HTTP APIs behind a load balancer would be perfectly fine. However, putting the internal APIs behind a load balancer is going to cause issues. For gossip, each node needs to be able to directly address every other node, which is where the advertise address requirements come from. Otherwise, the failure detection that gossip provides will not work correctly, and you could end up with both false positives and false negatives regarding the health of the Consul nodes. Consul itself should already be able to spread the RPC and gossip load across the servers and members of the cluster.
To answer your question more directly: neither network areas nor WAN federation via mesh gateways removes the need to specify the advertise address.
Thanks again for the detailed response, @mkeeler.
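The point about direct addressing can be illustrated with a toy model. The Go sketch below is a deliberately simplified, hypothetical simulation, not Consul's actual gossip implementation, and it ignores load balancer health checks. A direct probe is answered only by the member being checked, while a probe sent to a shared round-robin load balancer address is answered by whichever backend is next in rotation, so a dead member can look healthy and a healthy one can look dead. All type and function names are made up for illustration.

```go
package main

import "fmt"

// member is a hypothetical stand-in for a Consul node in this toy model.
type member struct {
	name  string
	alive bool
}

// probeDirect models probing a member at its own advertised address:
// only that member can answer, so the result reflects its real state.
func probeDirect(target member) bool {
	return target.alive
}

// roundRobinLB is a hypothetical load balancer in front of all members.
type roundRobinLB struct {
	backends []member
	next     int
}

// probeViaLB models probing a member through the shared LB address. The LB
// ignores the intended target: the probe lands on whichever backend is next
// in rotation, and that backend's state decides the answer.
func (lb *roundRobinLB) probeViaLB(target member) (ok bool, answeredBy string) {
	backend := lb.backends[lb.next%len(lb.backends)]
	lb.next++
	return backend.alive, backend.name
}

func main() {
	a := member{name: "a", alive: true}
	b := member{name: "b", alive: true}
	c := member{name: "c", alive: false} // c has actually failed

	lb := &roundRobinLB{backends: []member{a, c, b}}

	// Direct probe: c is correctly seen as down.
	fmt.Println("direct probe of c:", probeDirect(c))

	// Via the LB: the probe meant for c is answered by a, so c looks alive.
	ok, by := lb.probeViaLB(c)
	fmt.Printf("LB probe of c: %v (answered by %s)\n", ok, by)

	// Via the LB: the probe meant for a lands on the dead c, so a looks dead.
	ok, by = lb.probeViaLB(a)
	fmt.Printf("LB probe of a: %v (answered by %s)\n", ok, by)
}
```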