I have a 3-server Nomad cluster in Prod and the same thing in DR. Prod and DR are active/active, so overall it's a 6-node cluster.
We recently had a failure at one of the two sites and realized that Nomad could not reach consensus, as it needs 4 of the 6 servers.
We worked around this easily, but I would rather not need a manual step.
Is there an easy way to configure Nomad so that it automatically adjusts and reaches consensus with the remaining 3 servers when 3 of the 6 servers fail?
The nature of Raft consensus is that you should have an odd number of nodes in the decision-making set. The cluster keeps operating only while more than half of them are available. If you have an odd number and the cluster splits in two, one side will always have more than half, so that side knows it can continue to make decisions and the other side knows it can't. With an even number that no longer holds: the cluster can split exactly in half, neither side can be sure it has the quorum it needs to make decisions, and nothing gets done.
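The majority arithmetic is easy to check with a few lines of Python. This is a minimal sketch assuming only the standard Raft rule that a quorum is more than half of the voting servers (floor(n/2) + 1); the function names are made up for illustration and are not part of Nomad.

```python
def quorum(n: int) -> int:
    """Votes a Raft cluster of n voting servers needs to elect a leader or commit."""
    return n // 2 + 1


def fault_tolerance(n: int) -> int:
    """How many voting servers can fail while a quorum still remains."""
    return n - quorum(n)


def side_has_quorum(side: int, n: int) -> bool:
    """True if a partition holding `side` of the n servers can still make decisions."""
    return side >= quorum(n)


if __name__ == "__main__":
    for n in range(3, 8):
        print(f"{n} servers: quorum {quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")

    # The situation in the question: 6 servers split 3/3 across two sites.
    print(side_has_quorum(3, 6))
    # An odd-sized cluster split in two always leaves one side with a majority.
    print(side_has_quorum(3, 5), side_has_quorum(2, 5))
```

Running it shows that 6 servers need 4 votes and tolerate 2 failures, exactly the same as 5 servers, so the sixth server adds no fault tolerance; it only makes an even 3/3 split leave both sides without quorum.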
So the intention with Raft is that you typically have 3 or 5 nodes in the decision-making set, and if you spread them across sites you inherently have active-active-active DR (or active-active-active-active-active). You therefore don't need, and indeed aren't intended, to run 3 nodes and then duplicate that in an active-active setup.
Raft-based systems work well in cloud environments like AWS, where it's easy to put three nodes in separate availability zones with low-latency links between them. They seem less well suited to a more traditional data centre scenario, where you might have two data centres and want an active-active setup. I've been pondering the same problem and so far haven't come up with a way of utilising Nomad in that kind of environment. If you split it with 2 nodes in one data centre and 1 in the other, it will be fine if the right data centre goes down, but not if the wrong one does, which rather defeats the object. I'd be interested in any suggestions or links on how to make use of Nomad, or other consensus-based solutions, in a traditional two-data-centre active-active scenario.
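The placement trade-off can be checked by enumerating single-site failures. This is a minimal Python sketch assuming only the Raft majority rule and that a site failure takes out every server in that site; the site names and the helper function are hypothetical, and the three-site layout is the one-server-per-site active-active-active arrangement mentioned earlier.

```python
from __future__ import annotations


def quorum(n: int) -> int:
    # Raft majority: more than half of the voting servers.
    return n // 2 + 1


def survives_any_single_site_loss(placement: dict[str, int]) -> dict[str, bool]:
    """For each site, report whether the remaining servers still form a quorum
    if that whole site is lost."""
    total = sum(placement.values())
    return {
        site: (total - count) >= quorum(total)
        for site, count in placement.items()
    }


if __name__ == "__main__":
    layouts = {
        "3 + 3 across two DCs": {"prod": 3, "dr": 3},
        "2 + 1 across two DCs": {"prod": 2, "dr": 1},
        "1 + 1 + 1 across three sites": {"site_a": 1, "site_b": 1, "site_c": 1},
    }
    for name, placement in layouts.items():
        print(name, survives_any_single_site_loss(placement))
```

Only the three-site layout survives the loss of any one site. The 3 + 3 layout loses quorum whichever site fails, and in the 2 + 1 layout the outcome depends on which data centre goes down.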