Define maximum concurrent instances for tasks

Hi,
We are considering using Nomad to orchestrate media jobs (file upload, transcoding, etc).
One strong requirement we have is to limit the number of concurrent executions, per task type (this is to avoid saturating CPU/Memory/network/storage resources).

So we’d like to have the following:

  • define several jobs/task types (e.g.: transcode a file, upload to S3)
  • for each job/task type, set up a limit (e.g.: max 5 concurrent instances of this task; this value should be global to the region)
  • submit jobs to Nomad:
    • if the number of tasks is below maximum, schedule and execute the task immediately
    • if not, enqueue the task and wait until a task finishes, then schedule it.

We’ve looked at the attributes at the Job, Group and Task levels, but couldn’t find such a concept.

Is such a behaviour possible with Nomad ?

Thanks

Additional questions on the autoscaling part:

  • Scaling metrics

Since transcoding jobs are CPU-heavy, measuring the CPU usage on the client will not give an accurate metric on which to base autoscaling decisions. Two other metrics would be more relevant:

  • average fps of jobs currently running in the cluster (fetched from the task by e.g.: parsing the log output) → if fps falls below a certain level, trigger autoscaling
  • length of the job queue (to be retrieved from an external API endpoint) → if the job queue goes over a certain amount, trigger autoscaling.
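To make the two triggers concrete, here is a rough Python sketch of the decision I have in mind. It is not autoscaler code: `scale_out_reasons` and the threshold values are hypothetical, and the fps and queue figures are assumed to have been fetched already.

```python
def scale_out_reasons(avg_fps, queue_length, min_fps, max_queue):
    """Return the reasons (if any) why more capacity should be added."""
    reasons = []
    if avg_fps < min_fps:
        # Jobs are running slower than the acceptable rate.
        reasons.append("fps below threshold")
    if queue_length > max_queue:
        # Too many jobs are waiting to be scheduled.
        reasons.append("queue too long")
    return reasons


# Hypothetical values: fps has dropped, but the queue is still short.
print(scale_out_reasons(avg_fps=18.0, queue_length=12, min_fps=24.0, max_queue=50))
```

An empty list means neither condition is met and no scale-out is needed.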

If I want to fetch these metrics within a check block, how can I do that? Is it possible to write custom APM plugins? Are there other options?

  • Cluster scaling with VMware vSphere

If I want to scale my cluster running on VMware VMs, could I write a Target plugin to take care of node provisioning/deprovisioning? If not, what alternative solutions do I have?

Thanks,

Hi @jcdlt and apologies for the delay in getting round to answering your initial question.

To your initial question:

Reading your description, it sounds like your use case aligns with Nomad job dispatch, where tasks are launched as needed and on demand. The Nomad job specification includes a scaling stanza with max and min parameters; however, exceeding these limits results in an API error rather than an enqueued job. Depending on how your node topology looks, you might be able to achieve this with careful job resource assignment. Say you have 1000 MHz of CPU available on your nodes and you want a maximum of 5 transcode jobs running at any one time: you could set the CPU resource assignment to 200. If you then trigger a run of 6 transcode jobs, 5 will be placed and start running, and the 6th will be queued and only placed once resources become available.
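The arithmetic above can be sketched in a few lines of Python. This is only an illustration, not Nomad code: `place_jobs` and its parameters are hypothetical names, and it assumes a single pool of 1000 MHz with no other workloads competing for it.

```python
def place_jobs(node_cpu_mhz, job_cpu_mhz, submitted):
    """Split submitted jobs into those that fit right now and those left queued."""
    capacity = node_cpu_mhz // job_cpu_mhz  # how many jobs fit at once
    placed = min(submitted, capacity)
    queued = submitted - placed
    return placed, queued


placed, queued = place_jobs(node_cpu_mhz=1000, job_cpu_mhz=200, submitted=6)
print(f"placed={placed} queued={queued}")  # placed=5 queued=1
```

Note that with several nodes the effective limit is the sum of what fits on each node, so this approach only gives a global cap if the node pool is sized accordingly.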

Follow up questions:

The two metric queries you mention would probably be written as two separate checks. This way you can easily control what happens when each situation is triggered and manage them independently, for example by administratively disabling one of them.

Yes, it is possible to write custom APM plugins if you have a requirement beyond Prometheus or Datadog, which are the currently supported APM tools. There is a PR in progress to provide more details on this process and to document community plugins. In the meantime, using the Prometheus plugin and the APM interface code as a reference should provide a good base. Also, please feel free to reach out to the ecosystem team for guidance.

The same goes for writing target plugins for providers that we do not currently support. The GCP MIG and Azure VMSS target plugins were both written by community members and can be used as a reference.

Please feel free to raise any feature requests, such as new plugins, against the Nomad Autoscaler repository. I hope the information I have provided helps.

Thanks,
jrasell and the Nomad team

Hi @jrasell and thank you very much for your answers.
I took the time to play around with the autoscaler and I’m starting to get a good view of how we can use it in our project.

A few additional questions:

  • What is the effect of defining several checks within the same policy ?
    Say for instance I define a check for CPU and another one for Memory, each with its target, how will the autoscaler calculate the desired instance count ? Will it average the values output by each check ?

  • Same question, this time when defining several scaling policies that apply to the same resource ?
    Example:

policy #1 checks metrics related to transcoding to codec1, and acts on Autoscaling group ASG, by scaling instances tagged with node_class #1

policy #2 checks metrics related to transcoding to codec2, and acts on Autoscaling group ASG, by scaling instances tagged with node_class #2

This way I can dedicate some instances to each codec family (using node_class), while working with only one Autoscaling group.
Would that work ? Would the autoscaler add the output of each policy to calculate the total desired count ?

  • Scale in increments
    Sometimes the autoscaler will calculate a new instance count which is too big an increment at a time (for example, going from 2 to 10 instances in one step).
    Would you consider a feature to define a step, i.e.: the maximum number of nodes that can be added/removed between two evaluations ?
    So that I would have 2 → 4 → 6 […] instead of 2 → 10.
    The idea is that by the time I reach 6 nodes, the job queue may already have drained, so that going to 8 (and then 10) instances would be unnecessary. This would avoid spinning up instances too quickly, and ultimately could save money.
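To illustrate the idea, here is a minimal Python sketch of the stepping behaviour. `next_count` and `max_step` are hypothetical names for the proposed feature, not existing autoscaler options.

```python
def next_count(current, desired, max_step):
    """Move from current toward desired by at most max_step nodes."""
    delta = desired - current
    # Clamp the change to the range [-max_step, +max_step].
    delta = max(-max_step, min(max_step, delta))
    return current + delta


count = 2
history = [count]
while count != 10:
    count = next_count(count, desired=10, max_step=2)
    history.append(count)
print(history)  # [2, 4, 6, 8, 10]
```

In practice the desired count would be recalculated at each evaluation, so the sequence could stop at 6 if the queue has emptied by then.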

Lastly, I will raise a PR to add support for VMware vSphere as a target.

Thanks !

Hi @jcdlt, glad you managed to find some time to dig into it.

What is the effect of defining several checks within the same policy ?

When defining multiple checks per policy, all checks are executed at the same time and the desired result is then derived from all of them. The Nomad Autoscaler will always pick the safest action out of all the check results, meaning more capacity will always be chosen. The check calculations page has some additional detail on this.
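As a simplified illustration of "more capacity wins", the Python sketch below picks the check that asks for the highest count. It is not the Autoscaler's implementation: `resolve_checks` is a hypothetical name, and the real rules are described on the check calculations page.

```python
def resolve_checks(desired_counts):
    """Return the (check name, count) pair that asks for the most capacity."""
    name = max(desired_counts, key=desired_counts.get)
    return name, desired_counts[name]


# Hypothetical results: the CPU check wants 4 instances, the memory check wants 6.
print(resolve_checks({"cpu": 4, "memory": 6}))  # ('memory', 6)
```

So the results are not averaged; under this simplification the memory check's count of 6 is the one applied.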

Same question, this time when defining several scaling policies that apply to the same resource ?

Each policy acts as an independent entity and is subject to an isolated workflow. This means that if you have two policies acting on the same resource, and policy-a calculates a count of 3 while policy-b calculates a count of 4, the Autoscaler will trigger separate scaling events, which would result in flapping of the remote resource count. It is therefore not recommended to have multiple policies per target, but instead to have multiple checks per policy.

In your example situation, I would personally run a separate autoscaling group for each class of node you have, with a policy per ASG.
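The flapping can be pictured with a small Python simulation. This is only a sketch: `simulate` is a hypothetical name, and it assumes the two policies are evaluated in strict alternation, whereas real timing depends on each policy's evaluation interval and cooldown.

```python
def simulate(policy_counts, rounds):
    """Each round, every policy independently sets the shared target's count."""
    history = []
    for _ in range(rounds):
        for name, count in policy_counts.items():
            # The last writer wins until the next policy runs.
            history.append((name, count))
    return history


events = simulate({"policy-a": 3, "policy-b": 4}, rounds=2)
print([count for _, count in events])  # [3, 4, 3, 4]
```

The target count never settles, because neither policy knows about the other's result.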

Scale in increments

There are a couple of features available or in progress that I believe can help in the situation you have described. The first is the proposed threshold strategy plugin, which should be available in the coming weeks. This will provide finer control over the calculated increment; if you have any feedback on the approach, we would love to hear it.

To avoid spinning up more instances too quickly, you could set the cooldown policy option to a higher value. This means the autoscaler waits longer after a scaling event before performing new calculations, which would accommodate situations where the queue is emptying and the current capacity is sufficient. You could also set a policy max value to cap the number of running instances and ensure cost budgets are met.
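The combined effect of a cooldown and a max cap can be sketched in Python as follows. This is an illustration under assumptions, not the Autoscaler's code: `evaluate` and its parameters are hypothetical names, and times are plain seconds.

```python
def evaluate(now, last_scale_at, cooldown, current, desired, max_count):
    """Return the count to apply, honouring a cooldown window and a max cap."""
    if last_scale_at is not None and now - last_scale_at < cooldown:
        return current  # still cooling down: keep the current count
    return min(desired, max_count)  # never exceed the configured maximum


# 60s after the last scaling event with a 300s cooldown: nothing changes.
print(evaluate(now=60, last_scale_at=0, cooldown=300, current=4, desired=10, max_count=8))   # 4
# 400s after the last scaling event: scaling proceeds, but is capped at 8.
print(evaluate(now=400, last_scale_at=0, cooldown=300, current=4, desired=10, max_count=8))  # 8
```

A longer cooldown gives the queue more time to drain before another evaluation is acted on.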

Lastly, I will raise a PR to add support for VMware vSphere as a target.

Firstly, thank you so much for taking the time to write a new plugin. As a team, it is amazing to see people contributing and adopting the Nomad Autoscaler into varying environments. That being said, at the present time we are unable to accept further in-tree target plugins and prefer to keep these external. We do not currently have the capacity to support and maintain the additional code, along with the testing infrastructure we would need to provide the level of confidence and support we would want. Additional information can be found on our contributing page.

Please let me know if you have any follow up questions or any feedback.

Thanks,
jrasell and the Nomad team
