How to scale a job automatically to fill capacity?

Bedouin · April 16, 2021, 5:22am

I would like to always have a service job running on my cluster, similar to Folding@Home, that scales up to use all available resources. I want it to be preempted when I run higher-priority jobs, then scale back up to fill the remaining capacity when those jobs finish.

How can I do this with Nomad?

lgfa29 · April 16, 2021, 11:57pm

Hi @Bedouin

Is this a single job that would increase or decrease its resource values, or several jobs where you would modify the count number?

Bedouin · April 17, 2021, 12:50am

Hi!

I want to modify the count number automatically while keeping the resource values constant.

lgfa29 · April 20, 2021, 11:54pm

Thanks! For this you will need to run the Nomad Autoscaler. You will also need something like Prometheus running to collect your cluster metrics.

Take a look at this guide to get a quick start on how to use the Autoscaler.
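
For Prometheus to have anything to collect, the Nomad agents need to expose their metrics in Prometheus format. A minimal sketch of the relevant agent setting, assuming the rest of your agent configuration stays as it is:

```hcl
# Nomad agent configuration (servers and clients).
telemetry {
  # Serve metrics in Prometheus format at /v1/metrics?format=prometheus.
  prometheus_metrics = true

  # Commonly enabled alongside it; these add host-level and
  # per-allocation metrics.
  publish_node_metrics       = true
  publish_allocation_metrics = true
}
```

Prometheus would then scrape each agent at `/v1/metrics` with the `format=prometheus` query parameter.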

Once you are more familiar with how it works, you will need to add a scaling policy to your job. You will also want to reduce the job’s priority so other jobs are able to preempt it.
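
One thing to check: a lower priority only helps if preemption is enabled for the type of job doing the preempting. As far as I know, only system-job preemption is on by default. A sketch of the server setting, assuming your Nomad version and edition support service and batch preemption:

```hcl
# Nomad server configuration. Note: default_scheduler_config is only applied
# when the cluster is first bootstrapped. On an existing cluster the scheduler
# configuration is changed through the operator API instead.
server {
  enabled = true

  default_scheduler_config {
    preemption_config {
      system_scheduler_enabled  = true
      service_scheduler_enabled = true
      batch_scheduler_enabled   = true
    }
  }
}
```

Nomad only preempts allocations whose job priority is at least 10 lower than the job being placed, so the gap between the two priorities matters.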

Creating your policy query might be a bit challenging. Here’s the list of metrics that Nomad emits: Metrics | Nomad by HashiCorp

I think that, in your case, you would want something like

sum(nomad_client_unallocated_memory) / <AMOUNT OF MEMORY RESERVED FOR YOUR TASK>

All things considered, your job would look something like this:

job "folding-at-home" {
  # ...
  priority = 10  # Defaults to 50.
  # ...
  group "folding-at-home" {
    # ...
    scaling {
      min = 0
      max = 10  # Adjust as necessary.
      policy {
        check "resources_available" {
          source = "prometheus"
          query  = "sum(nomad_client_unallocated_memory)/512"

          strategy "pass-through" {}
        }
      }
    }

    task "folding-at-home" {
      # ...
      resources {
        memory = 512
      }
      # ...
    }
  }
}
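
The `source` and `strategy` names in the policy refer to plugins configured on the Autoscaler agent. A minimal sketch of that agent configuration, assuming Nomad and Prometheus are both reachable on localhost at their default ports (the addresses are placeholders, so adjust them to your setup):

```hcl
# Nomad Autoscaler agent configuration.
nomad {
  address = "http://127.0.0.1:4646"  # Assumed Nomad API address.
}

# Matches source = "prometheus" in the check.
apm "prometheus" {
  driver = "prometheus"
  config = {
    address = "http://127.0.0.1:9090"  # Assumed Prometheus address.
  }
}

# Matches strategy "pass-through" in the check.
strategy "pass-through" {
  driver = "pass-through"
}
```

With the pass-through strategy, the query result is used directly as the desired count, bounded by the policy's `min` and `max`.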

I know that’s a lot to take in, but let me know if anything is unclear.

Bedouin · April 21, 2021, 3:48pm

Looks like I’ll use a combination of unallocated.cpu, unallocated.disk, and unallocated_memory. Thank you!

lgfa29 · April 21, 2021, 4:19pm

Yes, exactly. Sorry I forgot to mention the other resources.
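
One way to fold the three resources into a single check might be to cap the memory-based count by the CPU-based and disk-based counts. This is only a sketch: the divisors 100 (MHz of CPU) and 300 (MB of disk) are placeholder values I am assuming for the task, not numbers from your job.

```hcl
check "resources_available" {
  source = "prometheus"

  # clamp_max(v, s) caps every sample in v at the scalar s, so nesting it
  # yields the smallest of the three per-resource instance counts.
  # 512 = task memory (MB); 100 and 300 are assumed CPU (MHz) and disk (MB).
  query = "clamp_max(clamp_max(sum(nomad_client_unallocated_memory)/512, scalar(sum(nomad_client_unallocated_cpu)/100)), scalar(sum(nomad_client_unallocated_disk)/300))"

  strategy "pass-through" {}
}
```

Keeping everything in one query avoids having to reason about how the Autoscaler combines the results of several separate checks.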
