Say I have a group and I set its count to 5, so there are 5 instances of the group running. Inside the group is a task whose job is to, say, iterate a counter from 0 to 100000000000. Is there a way to pass different parameters to the task in each instance of the group, so that each one starts iterating from a set number and the work is done in parallel? For example: the first instance starts at 0, the second at 20000000000, the third at 40000000000, etc.
No, that wouldn’t be possible without some extra coordination at the application level. In Nomad, each instance of a group (an allocation) looks the same since they are derived from the same jobspec file.
I think what you are looking for is a parameterized job. Here’s an example:
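As a rough illustration of the parameterized-job idea (not the original poster's example), the range could be split on the client side and each slice dispatched as its own job instance. This sketch only builds the `nomad job dispatch` command lines and does not run them. The job name `counter` and the meta keys `start` and `end` are made up for illustration, and it assumes the job is registered as parameterized and accepts those meta keys.

```python
from typing import List, Tuple


def split_range(total: int, parts: int) -> List[Tuple[int, int]]:
    """Split [0, total) into `parts` contiguous half-open ranges."""
    base, extra = divmod(total, parts)
    ranges = []
    start = 0
    for i in range(parts):
        # The first `extra` ranges get one additional element each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def dispatch_commands(job_name: str, total: int, parts: int) -> List[List[str]]:
    """Build one `nomad job dispatch` command per range.

    `start` and `end` are hypothetical meta keys that the parameterized
    job would need to accept.
    """
    return [
        ["nomad", "job", "dispatch",
         "-meta", f"start={s}", "-meta", f"end={e}", job_name]
        for s, e in split_range(total, parts)
    ]


if __name__ == "__main__":
    # "counter" is a hypothetical job name.
    for cmd in dispatch_commands("counter", 100_000_000_000, 5):
        print(" ".join(cmd))
```

Each dispatched instance would then read its own `start` and `end` values instead of relying on its position within a group.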
Just an observation, @lgfa29: the task/group instances being able to determine which "N"th instance they are within the specified count would indeed be useful functionality. I am not sure if there is a GitHub issue asking for such a feature, but just saying!
I didn’t find any request for this on GitHub, @shantanugadgil, and I don’t remember hearing about it either. I think the main challenge with this is defining what the Nth allocation even means.
Imagine you start a job that has a group with count: 1; this creates an allocation with ID aaaaa. Then you increase to count: 2, which creates a new alloc bbbbb. Now alloc aaaaa gets rescheduled (its client went away, or it got OOM’ed) and becomes cccccc.
Is the new alloc the 3rd one in a group with count 2? Or is it still the 1st? Or maybe it’s the 2nd and bbbbb is now the 1st?
So I am not sure if there is any inherent order for allocations. Something like the CreateIndex seems like a good order, but then each allocation's position would change dynamically as rescheduling occurs.
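To make the concern concrete, here is a small simulation of the scenario above. It is only a toy model: `Alloc` and its `create_index` field are stand-ins invented for this sketch, not Nomad's actual data structures, and the index values are arbitrary.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Alloc:
    alloc_id: str
    create_index: int  # hypothetical stand-in for an alloc's CreateIndex


def positions(allocs: List[Alloc]) -> Dict[str, int]:
    """Number allocations 1..N in order of creation."""
    ordered = sorted(allocs, key=lambda a: a.create_index)
    return {a.alloc_id: n for n, a in enumerate(ordered, start=1)}


# Group with count: 2 before any rescheduling.
before = positions([Alloc("aaaaa", 10), Alloc("bbbbb", 20)])

# aaaaa is rescheduled and replaced by cccccc, which is created later.
after = positions([Alloc("bbbbb", 20), Alloc("cccccc", 30)])

print(before)  # {'aaaaa': 1, 'bbbbb': 2}
print(after)   # {'bbbbb': 1, 'cccccc': 2}
```

Under this ordering bbbbb silently moves from position 2 to position 1, which is exactly the kind of shift that makes "the Nth allocation" hard to define.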
Do you have an example or use case where this would be helpful?
@lgfa29 My thinking about the Nth would be a number; treating the group as an array.
This comes purely from an HPC (SGE/PBSPro) way of thinking.
Having a predictable number would also ease a bunch of other workflows like names, ids in config files, etc.
I don’t have a pressing need for this immediately, but, just a thought.
Right, but the “problem” (I think it’s a problem, not sure) is that the value of N in this array of allocs for a group would change when allocs are rescheduled for some reason.
Even if you don’t have an immediate use case it’s always valuable to discuss ideas like this, so thank you for bringing it up.
More “just thoughts”: This could also be helpful to mimic “stateful” type workloads, where the Nomad group could, say, have a prestart task and the main task. Even if the “Nth” instance moves around on random machines, the prestart task could do some necessary “bootstrap” for the main task based on its ID.
After my above post, I happened to delve into trying to set up ScyllaDB and was thinking that (maybe?) this could be useful for the “seed nodes” of ScyllaDB.
If instance 0 was able to publish that information (that it is instance 0) as a Consul tag/meta information, then the other instances could use that information for doing some startup “seed” work.
Again, this is all “up in the air”, with no concrete use case/job file created.
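The prestart idea above could be sketched roughly as follows, assuming an instance index reaches the task through an environment variable. Nomad's runtime environment documentation lists a `NOMAD_ALLOC_INDEX` variable, but nothing in this thread confirms that it fits this use case, and whether the index stays stable across rescheduling is the open question raised earlier, so treat the variable name as an assumption to verify. The role names are made up, and publishing the result to Consul is left out.

```python
import os
from typing import Mapping, Optional


def instance_index(env: Mapping[str, str],
                   var: str = "NOMAD_ALLOC_INDEX") -> Optional[int]:
    """Read the instance index from `env`, or None if missing or invalid.

    The default variable name is an assumption; pass another name if the
    index is provided some other way.
    """
    raw = env.get(var)
    if raw is None or not raw.isdigit():
        return None
    return int(raw)


def bootstrap_role(index: Optional[int]) -> str:
    """Pick a hypothetical bootstrap role: instance 0 acts as the seed."""
    if index is None:
        return "unknown"
    return "seed" if index == 0 else "member"


if __name__ == "__main__":
    print(bootstrap_role(instance_index(os.environ)))
```

A prestart task could run something like this and write the result where the main task can read it.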