Hello, I’m running a lab to experiment with extreme failure scenarios while operating a Nomad cluster.
I tried the following scenario:
(group1) 3 Nomad servers joined together (server1, server2, server3)
(group2) 3 Nomad servers joined together (server4, server5, server6)
I created some jobs on group1 and some on group2.
On an arbitrary server in one of the groups I ran, for example:
server4# nomad server join $IPSERVER1
So I “merged” group1 with group2.
This effectively merged the two clusters. Now I see the jobs that were defined in group1, but all the jobs from group2 are lost.
Can someone explain technically what happened here that caused the jobs on group2 to be lost?
Group1 and group2 should be equivalent, so I could potentially lose the jobs of group1 as well, but that does not happen in my tests.
I can’t understand the logic under the hood.
Yes, all servers/agents are shown correctly as members and peers of the cluster, and the cluster elected a new leader correctly.
(Before the join, each group obviously had its own leader.)
It is not safe to merge two Nomad server clusters, because you cannot predict the state the resulting cluster will end up in. Which of the two clusters’ data wins is determined purely by the Raft protocol.
I would recommend reading up on the Raft protocol to understand why merging clusters isn’t safe.
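To make the Raft point a bit more concrete, here is a toy Python sketch of the log conflict rule from the Raft design: when a follower's entry at some index has a different term than the leader's entry at that index, the follower deletes that entry and everything after it, then takes the leader's entries. This is not Nomad's implementation. The function name `reconcile`, the terms, and the job names are all made up for illustration, and the sketch assumes a group1 server happens to win the election in the merged cluster, which cannot be predicted in advance.

```python
from typing import List, Tuple

# (term, command); both values are invented for this illustration.
Entry = Tuple[int, str]


def reconcile(follower_log: List[Entry], leader_log: List[Entry]) -> List[Entry]:
    """Toy version of the Raft conflict rule applied to a whole log at once."""
    result = list(follower_log)
    for index, leader_entry in enumerate(leader_log):
        if index < len(result):
            if result[index][0] != leader_entry[0]:
                # Same index, different term: drop this entry and all that follow,
                # then take the leader's entry instead.
                del result[index:]
                result.append(leader_entry)
            # Same index and same term: Raft assumes the entries are identical,
            # an assumption that only holds inside a single cluster.
        else:
            # The follower is missing this entry, so it is appended.
            result.append(leader_entry)
    return result


# Hypothetical logs of the two formerly independent clusters.
group1_log: List[Entry] = [(4, "register job-a"), (4, "register job-b")]
group2_log: List[Entry] = [(2, "register job-x"), (2, "register job-y"), (2, "register job-z")]

# Assumed outcome: a group1 server becomes leader of the merged cluster,
# so the former group2 servers reconcile their logs against group1's log.
merged = reconcile(group2_log, group1_log)
print(merged)
```

In this sketch the former group2 servers end up with only the group1 entries, which matches the symptom described above. If a group2 server had become leader instead, the roles would be swapped and the group1 jobs would be the ones to disappear.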
OK, thanks @Ranjandas. So, in order to increase the number of servers in a Nomad server cluster, I should only join new servers that are not already joined with others.
Is that statement correct?
Yes, you are right. But make sure that the number of server agents is at most 5.
The recommended configuration is to either run 3 or 5 Nomad servers per region. This maximizes availability without greatly sacrificing performance.
Ref: Consensus Protocol | Nomad | HashiCorp Developer
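The arithmetic behind the 3-or-5 recommendation can be sketched in a few lines of Python. Raft needs a majority (quorum) of servers to agree, so the quorum size is floor(n/2) + 1 and the cluster tolerates n minus quorum failed servers. The helper names `quorum_size` and `failure_tolerance` are just illustrative.

```python
def quorum_size(servers: int) -> int:
    """Smallest majority of a cluster with the given number of servers."""
    return servers // 2 + 1


def failure_tolerance(servers: int) -> int:
    """How many servers can fail while a quorum is still reachable."""
    return servers - quorum_size(servers)


for n in range(1, 8):
    print(f"{n} servers: quorum {quorum_size(n)}, tolerates {failure_tolerance(n)} failure(s)")
```

An even server count adds no failure tolerance over the odd count below it (6 servers tolerate 2 failures, the same as 5), while every extra server adds replication work to each write.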