"Datacenter" on the WAN. Am I subverting Consul?

Hello,

These are very early steps for me with Consul, mind.

So, I’ve spun up a “datacenter” cluster of 5 server agents. I’ve deliberately disabled the serf_wan port (set to -1), and I’m using retry_join. All the agents have regular/public IPv4 addressing (although not Internet-routable), and they span 3 different continents, with latencies between them ranging from 40 to 220ms depending on the source and destination locations.
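For reference, each server runs with roughly this config (the addresses, datacenter name and paths here are placeholders, not my real values):

```hcl
# Rough sketch of one server's config (placeholder values).
server           = true
bootstrap_expect = 5
datacenter       = "global"
data_dir         = "/opt/consul"

# Serf WAN gossip disabled on purpose.
ports {
  serf_wan = -1
}

# The other servers, spread across the three continents.
retry_join = ["10.0.1.10", "10.1.1.10", "10.2.1.10"]
```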

They join, and the UI shows them as expected. I added a key to the KV store, and all nodes respond with the value. I bring two nodes down, and the DNS service records update accordingly. A Vault cluster is also being monitored. I have no standalone agents yet, but so far it looks good!
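My smoke tests were along these lines (the key and service names are just examples):

```shell
# Write a key on one node, then read it back from the others.
consul kv put test/key "hello"
consul kv get test/key

# Query service SRV records via Consul DNS (default port 8600),
# before and after taking nodes down.
dig @127.0.0.1 -p 8600 consul.service.consul SRV
dig @127.0.0.1 -p 8600 vault.service.consul SRV
```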

But, am I subverting Consul?! I’m asking because all the documentation tells me this shouldn’t happen: that all the nodes should live in the same subnet, and that at most WAN federation can be used via Serf WAN, which can sync service status but will not sync the KV store.

But so far I seem to be getting away with it, so I wonder: what internals in Consul would actually work against this kind of deployment of a “datacenter” in a global WAN environment? What could go wrong here?

Your considerations would be very much appreciated.

Thanks!

One of the main reasons that isn’t recommended is latency, which can lead to leadership churn. Full details can be viewed at this link. Are you using Consul Enterprise or Open Source?

For ease of reference, this seems like the most relevant section from that link for your question.

The value of `raft_multiplier` is a scaling factor and directly affects the following parameters:

| Param | Default value |
| --- | --- |
| HeartbeatTimeout | 1000ms |
| ElectionTimeout | 1000ms |
| LeaderLeaseTimeout | 500ms |

By default, Consul uses a scaling factor of 5 (i.e. `raft_multiplier: 5`), which results in the following values:

| Param | Value | Calculation |
| --- | --- | --- |
| HeartbeatTimeout | 5000ms | 5 × 1000ms |
| ElectionTimeout | 5000ms | 5 × 1000ms |
| LeaderLeaseTimeout | 2500ms | 5 × 500ms |

**NOTE:** Wide networks with more latency will perform better with larger values of `raft_multiplier`.

The trade-off is between leader stability and time to recover from an actual leader failure. A short multiplier minimizes failure detection and election time but may be triggered frequently in high-latency situations. This can cause constant leadership churn and associated unavailability. A high multiplier reduces the chances that spurious failures will cause leadership churn, but it does this at the expense of taking longer to detect real failures, and thus taking longer to restore cluster availability.
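If you do end up needing to loosen those timings for this topology, it’s set under the server’s `performance` stanza; a minimal sketch (the value here is purely illustrative, not a recommendation):

```hcl
performance {
  # Scaling factor for the Raft timings listed above (valid range 1-10).
  # Larger values tolerate more latency but slow real failure detection.
  raft_multiplier = 8
}
```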

Hi. Thanks for jumping in @DerekStrickland.

I’m using the open-source version. Is there any particular limitation of the OSS version versus Enterprise that you see as relevant for this type of scenario?

I did look into that setting, as I also reviewed it for Vault using integrated Raft, and thought the default was good enough. There may be situations where local network hiccups cause delays beyond these timeouts, but I don’t foresee that happening in more than two locations at the same time (hence the 5 nodes). Let’s see how it behaves. I’ll also put monitoring in place for the telemetry leadership metrics.
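For the monitoring, something along these lines should do (default agent address/port assumed; the metric names come from Consul’s telemetry docs):

```shell
# Pull the local agent's metrics and keep only the Raft ones,
# e.g. consul.raft.leader.lastContact and consul.raft.state.candidate.
curl -s http://127.0.0.1:8500/v1/agent/metrics \
  | jq '[.Samples[], .Counters[]] | map(select(.Name | contains("raft")))'
```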

One thing I was wondering too was whether Consul has hard-coded settings based on the type of subnet of the IPv4 addressing. In other words, whether it would handle things differently depending on whether the agent address is “IANA private” or not.

Regards.

No limitations that I am aware of. I was just thinking that, if you were running Enterprise, you should check with your TAM to see whether that configuration is supported. For now, I’d just monitor the leadership elections, which you are already planning to do.

Regarding public IPs, this thread indicates it’s allowable but would require you to bind to that IP address explicitly.
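That is, something like this in each server’s config (the address is a placeholder; each node would use its own):

```hcl
# Bind and advertise the node's routable address explicitly,
# rather than letting Consul pick an interface.
bind_addr      = "203.0.113.10"
advertise_addr = "203.0.113.10"
```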


“It does not differentiate between public or private addresses”. Yep, that’s it 🙂

Thanks a lot Derek, you were most helpful!

Kind regards