Start job allocations atomically

I’m trying to start a Nomad batch job that runs two containers on two different hosts. My job consists of two task groups with the constraint distinct_hosts = true to force the containers onto different hosts.

Problem: I would like these containers to start simultaneously. If one of the hosts is at full capacity, the entire job should stay in the pending state until enough resources are available to schedule both containers. When I try this, one of the tasks starts running and the other stays in the “queued” state if one of the hosts is full.

I also tried setting the job parameter all_at_once = true. This didn’t help, and the documentation also states that it cannot be used for “atomic placement”. Here is the outline of my job:

job "parallel-work" {
  type = "batch"
  all_at_once = true
  constraint {
      operator  = "distinct_hosts"
      value     = "true"
  }
  # Both work1 and work2 should start about the same time
  group "work1" {
    task "main" {
      # To be executed on host A
    }
  }
  group "work2" {
    task "main" {
      # To be executed on host B
    }
  }
}

Is there another way to achieve this, or are there any plans to implement such functionality?

Interesting use case … could there be poststart tasks that kill the main task if the other task is not healthy!?

Maybe the health of the other task can be determined using something like dig or nslookup?
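
Something like this rough, untested sketch for one of the groups (the service names such as work2-main and the exact failure semantics of a non-sidecar poststart task are just my assumptions here):

group "work1" {
  task "main" {
    # Hypothetical Consul service so the sibling group can find this task.
    service {
      name = "work1-main"
    }
    # ...
  }

  # Poststart task: starts only after "main" is running, then checks whether
  # the sibling group's task is registered in Consul DNS.
  task "check-sibling" {
    lifecycle {
      hook    = "poststart"
      sidecar = false
    }
    driver = "exec"
    config {
      command = "/bin/sh"
      # Fail if the other group's task is not resolvable (name is assumed).
      args    = ["-c", "nslookup work2-main.service.consul || exit 1"]
    }
  }

  # No retries, so a failed check takes the allocation down quickly
  # (assuming a failed poststart task fails the group once restarts are exhausted).
  restart {
    attempts = 0
    mode     = "fail"
  }
}

The work2 group would carry the mirror-image check against work1-main.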

just-a-thought

Sounds like an interesting approach. In my use case, there might be many parallel-work batch jobs in the queue waiting to be started. I guess one challenge could be a scenario where parallel-work1 has a task running on host A and parallel-work2 has another task running on host B: neither can make progress if both need two hosts. Such cases could perhaps be dealt with by sleeping a random amount before checking the health of all expected tasks, so that one of the jobs gets killed and the other can make progress (see the snippet below). The question, though, is how performant such an approach is if a batch job needs many nodes simultaneously and many other jobs are waiting in the queue.
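
Building on the sketch above, the random delay could go straight into the check command of that poststart task (again just an illustration; $RANDOM is bash-specific):

config {
  command = "/bin/bash"
  # Sleep a random 0-29 s before the check, so two competing jobs are
  # unlikely to kill each other at exactly the same moment.
  args    = ["-c", "sleep $((RANDOM % 30)); nslookup work2-main.service.consul || exit 1"]
}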

It would be best if this could be detected at the scheduling stage, but maybe Nomad is not primarily designed for such batch-heavy workloads.