Nomad Group and Task question

scripthelp123 · March 23, 2021, 9:55pm

Hi all, I have a question about groups and tasks.

Say I have a group and I set its count to 5, so there are 5 instances of the group running. Inside the group is a task whose job is, say, to iterate a counter from 0 to 100000000000. Is there a way to pass different parameters to the task in each instance of the group, so that each one starts iterating from a set number and the work is done in parallel? For example, the first instance starts at 0, the second at 20000000000, the third at 40000000000, etc.

Is this possible with nomad?

lgfa29 · March 24, 2021, 3:17pm

Hi @scripthelp123

No, that wouldn’t be possible without some extra coordination at the application level. In Nomad, each instance of a group (an allocation) looks the same since they are derived from the same jobspec file.

I think what you are looking for is a parameterized job. Here’s an example:

job "counter" {
  datacenters = ["dc1"]
  type        = "batch"

  parameterized {
    meta_required = ["start"]
  }

  group "counter" {
    task "counter" {
      driver = "docker"

      config {
        image   = "alpine:3.13"
        command = "count.sh"
        volumes = [
          "local/count.sh:/usr/bin/count.sh",
        ]
      }

      template {
        data        = <<EOF
#!/bin/sh

count=${NOMAD_META_start}
while [ $count -lt 100000 ]
do
  echo $count
  count=$((count + 1))
done
        EOF
        destination = "local/count.sh"
        perms       = "777"
      }
    }
  }
}

You can then register it:

$ nomad run counter.nomad
Job registration successful

And dispatch as many instances as you need, each with a different input:

$ nomad job dispatch -meta start=0 counter

Dispatched Job ID = counter/dispatch-1616598843-6b557dbb
Evaluation ID     = 8f3a4dfd

==> Monitoring evaluation "8f3a4dfd"
    Evaluation triggered by job "counter/dispatch-1616598843-6b557dbb"
    Allocation "222d3e08" created: node "d710c448", group "counter"
==> Monitoring evaluation "8f3a4dfd"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "8f3a4dfd" finished with status "complete"

$ nomad job dispatch -meta start=1000 counter

Dispatched Job ID = counter/dispatch-1616598838-e761f7c3
Evaluation ID     = f4356734

==> Monitoring evaluation "f4356734"
    Evaluation triggered by job "counter/dispatch-1616598838-e761f7c3"
    Allocation "10482406" created: node "d710c448", group "counter"
==> Monitoring evaluation "f4356734"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "f4356734" finished with status "complete"
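To cover a whole range this way, the dispatches can be scripted. The following is only a rough sketch: `chunk_starts` and the `DRY_RUN` switch are made-up names for illustration, and the total of 100000 and the split into 5 are assumptions chosen to match the example script above. With `DRY_RUN` left at its default the loop only prints the commands it would run.

```shell
#!/bin/sh
# chunk_starts TOTAL N: print the N starting offsets that split [0, TOTAL)
# into N ranges of TOTAL / N each (integer division).
# Hypothetical helper, not part of Nomad.
chunk_starts() {
  total=$1
  n=$2
  size=$((total / n))
  i=0
  while [ "$i" -lt "$n" ]; do
    echo $((i * size))
    i=$((i + 1))
  done
}

# One dispatch per starting offset. DRY_RUN=1 (the default in this sketch)
# prints the command instead of calling the nomad CLI.
for start in $(chunk_starts 100000 5); do
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "nomad job dispatch -detach -meta start=$start counter"
  else
    nomad job dispatch -detach -meta start="$start" counter
  fi
done
```

Note that the script in the jobspec above always counts up to the same fixed upper bound, so the dispatched instances would overlap. For disjoint ranges the job would presumably need a second meta key (say `end`, which is not in the jobspec above) declared in the `parameterized` block and used in the loop condition.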

shantanugadgil · March 28, 2021, 3:47pm

Just an observation, @lgfa29: the task/group instances being able to determine which "N"th instance they are within the specified count would indeed be useful functionality. I am not sure if there is a GitHub issue asking for such a feature, but just saying!

lgfa29 · March 29, 2021, 2:09pm

I didn’t find any request for this on GitHub @shantanugadgil, and I don’t remember hearing about it either. I think the main challenge with this is defining what the Nth allocation even means.

Imagine you start a job that has a group with count: 1; this will create an allocation with ID aaaaa. Then you increase to count: 2, which creates a new alloc bbbbb. Now suppose alloc aaaaa gets rescheduled (its client went away, or it got OOM’ed) and becomes ccccc.

Is the new alloc the 3rd one in a group with count 2? Or is it still the 1st? Or maybe it’s the 2nd and bbbbb is now the 1st?

So I am not sure there is any inherent order for allocations. Something like the CreateIndex seems like a good order, but then each allocation’s position would change dynamically as rescheduling occurs.
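The shifting position can be illustrated with a small simulation. This is only a sketch: the alloc IDs come from the example above, while the CreateIndex values (10, 20, 30) and the `position_of` helper are invented for illustration and do not come from a real cluster.

```shell
#!/bin/sh
# position_of ID: read "ID CreateIndex" pairs from stdin, order them by
# ascending CreateIndex, and print the 1-based position of the given ID.
# Hypothetical helper, not part of Nomad.
position_of() {
  sort -k2,2n | awk -v id="$1" '$1 == id { print NR }'
}

# Made-up state before and after alloc aaaaa is rescheduled as ccccc.
before='aaaaa 10
bbbbb 20'
after='bbbbb 20
ccccc 30'

echo "bbbbb before reschedule: $(printf '%s\n' "$before" | position_of bbbbb)"
echo "bbbbb after reschedule: $(printf '%s\n' "$after" | position_of bbbbb)"
```

Under these assumptions bbbbb moves from position 2 to position 1 even though nothing happened to it, which is the instability described above.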

Do you have an example or use case where this would be helpful?

shantanugadgil · March 31, 2021, 5:42am

@lgfa29 My thinking is that the Nth would simply be a number, treating the group as an array.
This comes purely from an HPC (SGE/PBSPro) way of thinking.
Having a predictable number would also ease a bunch of other workflows like names, IDs in config files, etc.

I don’t have a pressing need for this immediately, but, just a thought.

lgfa29 · March 31, 2021, 3:28pm

Right, but the “problem” (I think it’s a problem, not sure) is that the value of N in this array of allocs for a group would change when allocs are rescheduled for some reason.

Even if you don’t have an immediate use case it’s always valuable to discuss ideas like this, so thank you for bringing it up.

shantanugadgil · April 2, 2021, 7:33pm

More “just thoughts”: This could also be helpful to mimic “stateful” type workloads, where the Nomad group could, say, have a prestart task and the main task. Even if the “Nth” instance moves around to random machines, the prestart task could do some necessary “bootstrap” for the main task based on its ID.

After my above post, I happened to delve into trying to set up ScyllaDB and was thinking that (maybe?) this could be useful for the “seed nodes” of ScyllaDB.

If instance 0 were able to publish that information (that it is instance 0) as Consul tag/meta information, then the other instances could use it to do some startup “seed” work.

Again, this is all “up in the air”, with no concrete use case/job file created.
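For what it is worth, Nomad does set a `NOMAD_ALLOC_INDEX` environment variable inside each task, numbered from 0 to count - 1. As far as I understand the documentation, the index is unique within a given version of a job but may be reused by canaries or failed allocations, so the caveats discussed above still apply. A rough sketch of how a task might derive its range from it follows; `range_for`, `COUNT`, and `TOTAL` are hypothetical names, and `COUNT` has to be kept in sync with the group's count by hand.

```shell
#!/bin/sh
# range_for INDEX COUNT TOTAL: print "start end" for the INDEX-th of COUNT
# equal slices of [0, TOTAL). The last slice absorbs any remainder.
# Hypothetical helper, not part of Nomad.
range_for() {
  index=$1
  count=$2
  total=$3
  size=$((total / count))
  start=$((index * size))
  if [ "$index" -eq $((count - 1)) ]; then
    end=$total
  else
    end=$((start + size))
  fi
  echo "$start $end"
}

# Inside a Nomad task NOMAD_ALLOC_INDEX is set by Nomad; the defaults here
# only exist so the sketch can be tried outside Nomad.
range_for "${NOMAD_ALLOC_INDEX:-0}" "${COUNT:-5}" "${TOTAL:-100000000000}"
```

A prestart task could use the same value to decide, for instance, whether it is “instance 0”, keeping in mind that a replacement allocation may come up with the same index.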
