I’m trying to start a Nomad batch job that runs two containers on two different hosts. My job consists of two task groups with the constraint distinct_hosts = true to force the containers to run on different hosts.
Problem: I would like these containers to start simultaneously. If one of the hosts is at capacity, the entire job should stay in the pending state until enough resources are available to schedule both containers. When I try this, one of the tasks starts running and the other stays in the “queued” state if one of the hosts is full.
I also tried setting the job parameter all_at_once = true. This didn’t help, and the documentation also states that it cannot be used for “atomic placement”. The outline of my job:
job "parallel-work" {
type = "batch"
all_at_once = true
constraint {
operator = "distinct_hosts"
value = "true"
}
# Both work1 and work2 should start about the same time
group "work1" {
task "main" {
# To be executed on host A
}
}
group "work2" {
task "main" {
# To be executed on host B
}
}
}
Is there another way to achieve this, or are there any plans to implement such functionality?
Interesting use case … could there be poststart tasks which kill the main task if the other task is not healthy? Maybe the health of the other task could be determined using something like dig or nslookup?
just-a-thought
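For illustration, a minimal sketch of what that could look like in one of the groups, assuming work2’s main task registers a Consul service; the service name work2-main, the exec driver, and the timings are placeholders, not anything from the original job:

  group "work1" {
    task "main" {
      # Actual workload, to be executed on host A
    }

    # Hypothetical poststart checker: polls the sibling group's service via
    # Consul DNS and exits non-zero if the peer never becomes resolvable,
    # so this allocation fails instead of running alone.
    task "peer-check" {
      lifecycle {
        hook    = "poststart"
        sidecar = false
      }

      driver = "exec"

      config {
        command = "/bin/sh"
        args = [
          "-c",
          "for i in $(seq 1 30); do nslookup work2-main.service.consul && exit 0; sleep 2; done; exit 1",
        ]
      }
    }
  }

Whether a failing checker actually takes the main task down depends on the group’s restart and reschedule stanzas, so that part would need testing.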
Sounds like an interesting approach. In my use case, there might be many parallel-work batch jobs in the queue waiting to be started. I guess one challenge could be a scenario where parallel-work1 has a task running on host A and parallel-work2 has another task running on host B, so neither can make progress if both need two hosts. Such cases could perhaps be dealt with by sleeping a random amount of time before checking the health of all expected tasks, so that one of the jobs would be killed and the other could make progress. The question, though, is how performant such an approach is when a batch job needs many nodes simultaneously and many other jobs are waiting in the queue.
It would be best if this could be detected at the scheduling stage, but Nomad is perhaps not primarily designed for such batch-heavy workloads.
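The randomized back-off could be folded into the hypothetical peer-check task sketched above, e.g. (again only a sketch; the 60-second jitter window, the work2-main service name, and the use of bash’s $RANDOM are arbitrary choices):

  # Variant of the peer-check config: sleep a random 0-59 seconds before
  # checking, so that when two jobs each hold one host, only one of them
  # is likely to give up and release its allocation.
  config {
    command = "/bin/bash"
    args = [
      "-c",
      "sleep $((RANDOM % 60)); nslookup work2-main.service.consul || exit 1",
    ]
  }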
After investigating multiple schedulers, the algorithm I was looking for here is called gang scheduling.
Nomad doesn’t support it, and neither does the default Kubernetes scheduler. However, Kubernetes scheduler plugins from projects like Volcano and Apache YuniKorn make gang scheduling possible, and HPC schedulers like Slurm support it as well.