Join 3 servers cluster to another 3 server cluster: jobs lost

sgala · November 13, 2023, 4:24pm

Hello, I’m running a lab to experiment with extreme failure scenarios while operating a Nomad cluster.

I tried this situation:
(group1) 3 nomad servers joined together (server1, server2, server3)
(group2) 3 nomad servers joined together (server4, server5, server6)

I created some jobs on group1 and some on group2.

On a random server of a random group I executed (example):
server4# nomad server join $IPSERVER1

So I “merged” group1 with group2.

Basically, I merged two clusters this way… Now I see the jobs that were defined in group1, but all jobs from group2 are lost…

Can someone explain technically what happened in this case that caused the jobs on group2 to be lost?

Group1 and group2 should be equivalent, so I could potentially lose the jobs of group1 as well… but that does not happen in my tests…

I can’t understand the logic under the hood.

Thanks

Matteo

dvlpmike · November 15, 2023, 10:12am

Hi, can you show the configuration of the agents in your cluster?

sgala · November 15, 2023, 10:37am

Yes, all servers/agents are shown correctly as members and peers of the cluster, and the cluster elected a new leader.
(Before the join, each group obviously had its own leader.)

dvlpmike · November 15, 2023, 11:36am

OK, but can you share the configuration files and logs? It’s hard to say anything without them.

Ranjandas · November 18, 2023, 3:22am

Hi @sgala,

First of all, please note that this is not a supported scenario.

Each Nomad cluster uses the Raft Consensus Protocol to store its state. In your case, both group1 and group2 clusters have their distinct data.

ref: Consensus Protocol | Nomad | HashiCorp Developer

It is not safe to merge two Nomad server clusters, as you won’t be able to predict the state the resulting cluster will end up with. Which of the two clusters’ data wins is decided purely by the Raft protocol.

I would recommend you try to understand the Raft protocol to understand why merging the clusters isn’t safe.

ref: Raft
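
To make the point above concrete, here is a minimal toy sketch of Raft's log-reconciliation rule. It is not Nomad's implementation: the `Entry` type, the `replicate` helper, the term numbers and the job names are all hypothetical, and it assumes that a server from group1 won the election after the join. In Raft, a follower whose log conflicts with the leader's log (same index, different term) drops the conflicting entry and everything after it, then takes the leader's entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    term: int      # Raft term in which the entry was written
    command: str   # hypothetical label for a state change, e.g. a job registration


def replicate(leader_log, follower_log):
    """Return the follower's log once it has converged on the leader's log.

    Simplified model: keep the prefix where the terms agree index by index,
    discard the follower's entries from the first conflict onward, and
    append the leader's remaining entries.
    """
    match = 0
    for lead, foll in zip(leader_log, follower_log):
        if lead.term != foll.term:   # conflict: same index, different term
            break
        match += 1
    return follower_log[:match] + leader_log[match:]


# Assumed scenario: the new leader comes from group1.
group1_log = [
    Entry(2, "register job-a"),
    Entry(2, "register job-b"),
    Entry(5, "leader no-op after the join"),
]
group2_log = [
    Entry(1, "register job-x"),
    Entry(1, "register job-y"),
]

merged = replicate(group1_log, group2_log)
print([e.command for e in merged])
```

Under these assumptions the former group2 server ends up with group1's log only, so `job-x` and `job-y` are gone. In this toy model, entries whose index and term happen to coincide would be kept even if their contents differ, which hints at why the merged state is hard to predict.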

I hope this helps.

sgala · November 20, 2023, 10:08am

OK, thanks @Ranjandas. So, in order to increase the number of servers in a Nomad server cluster, I must only join new servers that are not already part of another cluster.
Is that statement correct?

Ranjandas · November 20, 2023, 12:28pm

Yes, you are right. But make sure that the number of server agents is 5 at most.

The recommended configuration is to either run 3 or 5 Nomad servers per region. This maximizes availability without greatly sacrificing performance.
Ref: Consensus Protocol | Nomad | HashiCorp Developer
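
The arithmetic behind the 3-or-5 recommendation can be sketched as follows. Raft needs a majority (quorum) of servers to agree before a write is committed; the helper names `quorum_size` and `failure_tolerance` are made up for this illustration.

```python
def quorum_size(servers: int) -> int:
    """Smallest majority of a cluster with the given number of servers."""
    return servers // 2 + 1


def failure_tolerance(servers: int) -> int:
    """How many servers can fail while a quorum is still reachable."""
    return servers - quorum_size(servers)


for n in (3, 5, 6):
    print(f"{n} servers: quorum {quorum_size(n)}, tolerates {failure_tolerance(n)} failure(s)")
```

A sixth server raises the quorum from 3 to 4 without tolerating any more failures than 5 servers do, so it adds replication cost but no extra availability.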
