AWS Autoscaling based on batch requested CPU metrics

pdilyard · October 17, 2023, 5:15pm

We have a cluster in which developers can launch arbitrary batch jobs. The jobs require varying resources, and we want to scale the cluster up and down using instances that best fit those resource requests.

In EKS, this worked because the autoscaler could see the CPU and memory requested by each job as well as the CPU and memory offered by the instance type in the ASG. It would then scale up by the number of instances required to schedule all the jobs. For example, say you submitted 4 jobs requesting 8 CPUs each. The autoscaler could add a single 32 CPU instance to schedule all 4 jobs.
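To make the packing behaviour concrete, here is a rough sketch of the arithmetic using a first-fit-decreasing heuristic. It is only an illustration of the idea, not the algorithm the EKS autoscaler actually uses; the function name `instances_needed` is made up for this example, and it considers CPU only (memory is ignored).

```python
def instances_needed(job_cpus, instance_cpu):
    """Estimate how many identical instances are needed to place all jobs.

    Hypothetical helper: first-fit-decreasing bin packing on CPU only.
    """
    free = []  # remaining CPU on each instance opened so far
    for cpu in sorted(job_cpus, reverse=True):
        if cpu > instance_cpu:
            raise ValueError(f"job requesting {cpu} CPUs cannot fit on any instance")
        for i, remaining in enumerate(free):
            if cpu <= remaining:
                free[i] -= cpu  # place the job on the first instance with room
                break
        else:
            # no existing instance has room, so open a new one
            free.append(instance_cpu - cpu)
    return len(free)


# 4 jobs requesting 8 CPUs each fit on a single 32 CPU instance
print(instances_needed([8, 8, 8, 8], 32))  # prints 1
```

The point is that the scale-up decision is driven by what the pending jobs request, not by what is currently running.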

In Nomad, I’ve tried two ways of achieving this, but neither works for my use case:

1. Following the “On-demand Batch Job Cluster Autoscaling” documentation. The problem is that this creates a 1:1 mapping between the number of instances and the number of jobs, which is not what I’m looking for.
2. Autoscaling based on CPU allocated vs. CPU available. The problem is that the amount of CPU allocated doesn’t change until an allocation is created, so my queued jobs either stay queued or, at best, the cluster scales up only 1 instance at a time. In our scenario, developers might submit 50 jobs simultaneously, which could require scaling the cluster out by 1, 25, or 50 instances depending on resource requirements.
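As a rough illustration of the 1, 25, or 50 instance range for the same 50 jobs, the sketch below computes the instance count when every job requests the same amount of CPU. The function name, the 32000 MHz instance size, and the per-job CPU values are all assumptions chosen for the example; it looks at CPU only and is not something the Nomad Autoscaler provides.

```python
import math


def instances_for_uniform_jobs(job_count, job_cpu_mhz, instance_cpu_mhz):
    """Instances needed when every queued job requests the same CPU.

    Hypothetical helper: CPU only, identical instances, identical jobs.
    """
    if job_cpu_mhz > instance_cpu_mhz:
        raise ValueError("a single job does not fit on one instance")
    jobs_per_instance = instance_cpu_mhz // job_cpu_mhz
    return math.ceil(job_count / jobs_per_instance)


# Same 50 queued jobs, assumed 32000 MHz instances, three different request sizes
for job_cpu in (640, 16000, 32000):
    print(job_cpu, instances_for_uniform_jobs(50, job_cpu, 32000))
# prints:
# 640 1
# 16000 25
# 32000 50
```

A metric based only on allocated CPU cannot distinguish these three cases while the jobs are still queued, which is why the requested resources of pending jobs are the signal I am after.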

Is there another way to approach this?
