Nomad Autoscaler: how to delay scaling evaluation during allocation startup

krundru · April 22, 2023, 3:34am

We have configured a scaling policy for a job that uses more than 100% CPU during start-up and stabilizes at around 40% once the start-up phase is over. The scaling policy is evaluated every 30s and scales out when CPU reaches 50%.

Because of the high CPU usage during start-up, the Autoscaler adds one allocation after another until it reaches the max threshold. After the app stabilizes, the allocations are reduced back to the min size.

For a similar challenge, AWS Auto Scaling provides instanceWarmUp, which delays scaling evaluations during start-up.

Is there a similar approach in Nomad Autoscaler? If not, what is the best way to resolve this?

rparikh420 · April 23, 2023, 3:32am

We are facing a similar problem in our setup, so I'm following this thread.

krundru · April 25, 2023, 1:18pm

Can someone help with this challenge?

hector.medina.cabane · April 26, 2023, 9:34am

This is quite interesting. If you need help with this, the first step would be to give us a way to reproduce the problem. I understand that with complex architectures this can be impossible, though.

Could you share a sample version of your job file? (In .hcl please! lol)

hector.medina.cabane · April 26, 2023, 9:45am

I think this will do the trick.

Pay attention to cooldown and evaluation_interval attributes in the policy stanza.

job "example" {
  group "app" {
    scaling {
      min     = 2
      max     = 10
      enabled = true

      policy {
        evaluation_interval = "5s"
        cooldown            = "1m"

        check "active_connections" {
          source = "prometheus"
          query  = "scalar(open_connections_example_cache)"

          strategy "target-value" {
            target = 10
          }
        }
      }
    }
  }
}

From the docs:

cooldown - A time interval after a scaling action during which no additional scaling will be performed on the resource. It should be provided as a duration (e.g.: "5s", "1m"). If omitted the configuration value policy_default_cooldown from the agent will be used.
evaluation_interval - Defines how often the policy is evaluated by the Autoscaler. It should be provided as a duration (e.g.: "5s", "1m"). If omitted the configuration value default_evaluation_interval from the agent will be used.
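
For the agent-side fallbacks mentioned in those two descriptions, here is a minimal sketch of how the defaults might be set in the Autoscaler agent configuration. It assumes the agent's policy block with default_cooldown and default_evaluation_interval; the durations are placeholder values, not recommendations.

```hcl
# Autoscaler agent configuration (not the job file).
# Assumption: these defaults apply only to policies that omit
# cooldown / evaluation_interval themselves.
policy {
  default_cooldown            = "2m"  # placeholder value
  default_evaluation_interval = "30s" # placeholder value
}
```

A cooldown set in the job's own policy takes precedence, so the agent default mostly matters for jobs that do not specify one.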

I think the cooldown will do the trick if you set it to a value long enough for the job to stabilize after start-up.
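
Adapted to the CPU case from the original question, a rough sketch could look like the following. It assumes the built-in nomad-apm source with its avg_cpu query, keeps the 30s evaluation interval and the 50% target from the question, and guesses that start-up settles within about three minutes. The min/max bounds and the cooldown value are placeholders to tune for the real job.

```hcl
job "example" {
  group "app" {
    scaling {
      min     = 2  # placeholder bounds, adjust to your job
      max     = 10
      enabled = true

      policy {
        evaluation_interval = "30s"

        # Assumption: new allocations finish their start-up CPU spike
        # within roughly 3 minutes. Pick a value longer than the
        # observed start-up time.
        cooldown = "3m"

        check "avg_cpu" {
          source = "nomad-apm"
          query  = "avg_cpu"

          strategy "target-value" {
            target = 50
          }
        }
      }
    }
  }
}
```

Keep in mind that the cooldown blocks every scaling action on the group, scale-in included, so a genuine load increase during that window also has to wait. It only pauses scaling after an action has been taken; it is not a per-allocation warm-up like instanceWarmUp.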
