Batch job spread with single batch group and multiple Nomad clients

I’ve read the docs and a thread that’s similar to my question, but it’s not quite the same situation.

job "batch" {
  datacenters = ["dc1"]
  type = "batch"
  parameterized {
    ...
  }
  group "batch_group" {
    count = 1
    task "batch_task" {
...

Each task consists of a shell script that takes an input parameter.
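For context, the parameterized block is along these lines (the meta key name and the payload setting are just placeholders for what I actually use):

parameterized {
  payload       = "forbidden"
  meta_required = ["input_file"]
}

Inside the task, the dispatched value is then available as the NOMAD_META_input_file environment variable.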

If I set spread at the job level, there is only one batch_group, so Nomad has nothing to spread across. I have two clients, but with count = 1 every task submitted to this job ends up on the same Nomad client.
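To be concrete, the job-level spread I tried was roughly this (node.unique.id is just one possible attribute to spread on):

job "batch" {
  ...
  spread {
    attribute = "${node.unique.id}"
    weight    = 100
  }
  ...
}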

If I set group.count = 2 and set spread at that level, every task results in two group allocations, one on each client, but because each (parameterized) task is just one, one of the batch_group allocations ends up without any work. That by itself doesn’t bother me much, but it still allocates resources unnecessarily and pollutes the log.
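That attempt looked more or less like this (again, the exact spread attribute is only an example):

group "batch_group" {
  count = 2

  spread {
    attribute = "${node.unique.id}"
    weight    = 100
  }

  task "batch_task" {
    ...
  }
}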

I thought about creating two batch jobs, each with one batch group, and forcing each group onto a different client, but that would make it hard to split submissions evenly and I’d have to change my scripts as well (to divide the tasks into two “queues”).
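If I did go that route, pinning each of the two jobs to its client would presumably be a constraint along these lines (the node name here is made up):

group "batch_group" {
  # force this job's single group onto one specific client
  constraint {
    attribute = "${node.unique.name}"
    value     = "nomad-client-1"
  }
  ...
}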

Is there a better way to spread parameterized tasks around the cluster, and how? I’ve been wondering whether I should try group.count = 2 with two tasks per group, but that would also mean rewriting my script to fetch two parameters at once.

I gave up on trying to figure this out. I went with:

  • one parameterized job per Nomad client, each with a single group; the jobs are otherwise identical but use different spread settings so that each lands on a different client
  • outside of Nomad, a script that loops through the input files and submits each one to one of the jobs (i.e. to one of the Nomad clients); a rough sketch follows this list
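Roughly, the submission script just alternates dispatches between the two jobs (the job names and the meta key are placeholders for the real ones):

#!/usr/bin/env sh
# Alternate input files between the two per-client parameterized jobs.
i=0
for f in /path/to/inputs/*; do
  if [ $((i % 2)) -eq 0 ]; then
    nomad job dispatch -meta input_file="$f" batch_client1
  else
    nomad job dispatch -meta input_file="$f" batch_client2
  fi
  i=$((i + 1))
done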

This can’t handle client or job failures, but I can live with that for the time being.