I am writing a program that will schedule many small and large batch jobs. The number of tasks varies, but it is around 3000 daily.
Is it better to split them into 3 jobs, each with 1000 groups of one task, or is it fine to schedule 3000 separate jobs, each with one group and one task? Which is better for Nomad to allocate?
In my program, for job generation and monitoring, 3000 separate jobs are easier to manage and especially easier to edit: I can stop and edit one task without affecting the others. However, I fear that Nomad will lag with that many jobs. Is that a valid concern?
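For concreteness, here is a sketch of the two layouts I am comparing (job names, the driver, and the commands are just placeholders, and in practice each `job` block lives in its own job file):

```hcl
# Layout A: one batch job containing many one-task groups.
job "daily-batch" {
  type = "batch"

  group "task-0001" {
    task "run" {
      driver = "raw_exec" # placeholder driver
      config {
        command = "/opt/batch/run-0001.sh" # placeholder command
      }
    }
  }
  # ... repeated up to group "task-1000" ...
}

# Layout B: many independent jobs, each with one group and one task.
job "daily-batch-0001" {
  type = "batch"

  group "main" {
    task "run" {
      driver = "raw_exec" # placeholder driver
      config {
        command = "/opt/batch/run-0001.sh" # placeholder command
      }
    }
  }
}
```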
All “tasks” in the same job “group” are always scheduled to run on the same node.
I would recommend scheduling multiple jobs, as it will most likely improve resource distribution across your cluster.
You could split the tasks into multiple groups within one job, but personally I prefer keeping job files as small as I possibly can.
Hi, thanks! Yes, agreed, exactly. So the choice is between one job with 1000 groups of 1 task each, or 1000 jobs each with 1 group and 1 task.
Yes. 10/10 times I would do the 1000 jobs.
I have previously put multiple related groups in the same job file, as I thought it would look better/cleaner, but it is just a lot easier to handle changes when I split them into separate jobs (my personal opinion).
Hi @Kamilcuk ,
I just wanted to add a note from the scheduling side of this problem.
Scheduling 3k groups in a single job, and therefore a single evaluation, will tie up a single scheduler worker for a long time. If there are a lot of other workloads in flight, its plans are more likely to conflict; however, this depends on how much other work is happening on the cluster, along with other factors such as how densely packed the cluster is.
Scheduling 3k jobs at the same time, and therefore 3k evaluations, spreads the work out among the workers, but if those plans are likely to collide (which depends on the workload) then you are more likely to see rejected plans. That also ties up all the workers for a while, which can crowd out the scheduling of other work.
Seeing as these 3k jobs will be spread throughout the day, I believe option two (3k separate jobs) would be the better choice from the scheduling side.
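For reference, a "worker" here is a scheduler worker inside the Nomad server, and the number of them is set by the server's agent configuration. A sketch (the default is the number of CPU cores on the server; `4` is just an illustrative value):

```hcl
# Nomad server agent configuration (sketch)
server {
  enabled = true

  # Number of parallel scheduler workers processing evaluations.
  # Defaults to the number of CPU cores on the server machine.
  num_schedulers = 4
}
```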
jrasell and the Nomad team
Hi, thank you! What is a "worker"? Is it a Nomad node, or something inside the Nomad server? I am not familiar with that term.
Because there are 3000 groups of 1 task each, the number of allocations is going to be the same either way; they will be distributed over multiple hosts the same way as if there were 3000 jobs of one group with one task.
So with 1 big job there will be fewer evaluations and deployments. But when one group has finished executing and another is still running, and I add a node to the cluster, that will trigger a new evaluation of the job. Will the finished group in a running batch job be restarted by that new evaluation?
However, my experience shows that scheduling 3000 jobs of 1 group with 1 task is considerably less memory- and CPU-intensive than 1 job with 3000 groups of 1 task.
With 1 big job, I had many issues with Nomad server memory skyrocketing to 200 gigabytes. After switching to 3000 small jobs, server memory usage stays flat and is negligible. I can also clean up more selectively by purging individual jobs, keeping the server state small sooner.
So, because 1 job with 1 group and 1 task gives far more control and appears to be far less memory-expensive for the Nomad server, I will prefer that method for automatically scheduling many batch jobs.
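On the cleanup side, one way to keep finished batch jobs from piling up in server state is to tune the server's garbage-collection thresholds. A sketch of the relevant server stanza (the values are illustrative, not a recommendation):

```hcl
# Nomad server agent configuration (sketch; values are illustrative)
server {
  enabled = true

  # How often the server scans for objects eligible for garbage collection.
  job_gc_interval = "5m"

  # How long a job must be dead before it is eligible for GC.
  job_gc_threshold = "1h"

  # The same idea for terminal evaluations.
  eval_gc_threshold = "1h"
}
```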