Autoscaling policy with multiple check not behaving accordingly

Huasito-Appel · May 23, 2024, 4:29pm

Hello, everyone! I’d like some help with adjusting the scaling policy for one of my Nomad jobs. Here’s the current configuration:

      "Scaling": {
        "ID": "-",
        "Type": "horizontal",
        "Target": {
          "Namespace": "default",
          "Job": "job_name",
          "Group": "group1"
        },
        "Policy": {
          "check": [
            {
              "max_messages_allowed": [
                {
                  "source": "prometheus",
                  "strategy": [
                    {
                      "threshold": [
                        {
                          "lower_bound": 0,
                          "upper_bound": 23500,
                          "delta": 2
                        }
                      ]
                    }
                  ],
                  "group": "group1",
                  "query": "my working query"
                }
              ]
            },
            {
              "backlog_queue": [
                {
                  "group": "group1",
                  "query": "another working query",
                  "source": "prometheus",
                  "strategy": [
                    {
                      "target-value": [
                        {
                          "max_scale_up": 2,
                          "target": 15000,
                          "threshold": 0.1,
                          "max_scale_down": 2
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ],
          "cooldown": "4m",
          "evaluation_interval": "1m"
        },

Let’s say the current metric for my “max messages allowed” check is 12000, and for my “backlog queue” check, it’s close to 0 (around 100-500). Based on my understanding of the policies, this should trigger a downscale. The first check, “max messages allowed,” falls within the threshold, so it will return NONE. The second query, the “backlog check,” would have a small factor, resulting in it returning DOWN. NONE + DOWN = DOWN, so it should prompt a downscale.
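For reference, the arithmetic behind the “backlog queue” check can be sketched roughly as below. This is a simplified model that assumes the target-value strategy computes the next count as current count × (metric / target), ignores changes where the factor is within `threshold` of 1, and caps the change with `max_scale_up` and `max_scale_down`. The function name, the rounding, and the current count of 10 are assumptions made for illustration, not the plugin's actual code.

```python
import math


def target_value_next_count(count, metric, target, threshold=0.1,
                            max_scale_up=None, max_scale_down=None):
    """Simplified model of a target-value check (assumed, not the plugin source)."""
    factor = metric / target
    # A factor close enough to 1 is treated as "no action".
    if abs(factor - 1.0) <= threshold:
        return count
    desired = math.ceil(count * factor)
    # Cap how far a single evaluation may move the count.
    if max_scale_up is not None:
        desired = min(desired, count + max_scale_up)
    if max_scale_down is not None:
        desired = max(desired, count - max_scale_down)
    return desired


# Hypothetical current count of 10 with a backlog metric of 300.
print(target_value_next_count(10, 300, 15000, 0.1, 2, 2))  # prints 8
```

Under these assumptions the factor is 0.02, so the check on its own asks for a scale down, limited to 2 allocations per evaluation.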

I’ve verified both queries on Prometheus and Grafana, and they return accurate values. However, my policy is causing the job to scale up. Here’s what the logs show, and I’m a bit lost:

I’ve changed the names of groups, job names, and queries for privacy, but they match how they’re coded (both checks are under the same group).

If anyone spots where I might have misunderstood something, I’d greatly appreciate it!

Huasito-Appel · May 23, 2024, 6:57pm

Update: I noticed that I may have been using the threshold strategy incorrectly, so I changed it to:

        "Policy": {
          "check": [
            {
              "max_messages_allowed": [
                {
                  "source": "prometheus",
                  "strategy": [
                    {
                      "threshold": [
                        {
                          "lower_bound": 22000,
                          "upper_bound": 23500,
                          "delta": -2
                        }
                      ]
                    }
                  ],

However, for some reason the threshold check still seems to take precedence over the target-value one, so I will end up using two target-value checks and relying on their resolution strategy.
It's kind of sad, because I wanted to use the threshold strategy.
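
The difference between the two threshold configurations above can be illustrated with a small model. It assumes the threshold strategy applies `delta` when the metric falls inside the bounds (lower bound inclusive, upper bound exclusive) and does nothing otherwise. The function name is hypothetical, and details such as how many data points must be in bounds are left out.

```python
def threshold_delta(metric, lower_bound, upper_bound, delta):
    """Return the count change a threshold check would request (assumed model)."""
    # The delta applies only while the metric sits inside the bounds.
    if lower_bound <= metric < upper_bound:
        return delta
    return 0


# Original bounds: 12000 is inside [0, 23500), so the check asks for +2.
print(threshold_delta(12000, 0, 23500, 2))       # prints 2
# Updated bounds: 12000 is outside [22000, 23500), so the check asks for nothing.
print(threshold_delta(12000, 22000, 23500, -2))  # prints 0
```

If this model holds, the original check would have requested a scale up at a metric of 12000 rather than returning NONE, which would be consistent with the job scaling up.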

austinbenz8282 · May 30, 2024, 11:28am

Sure thing! Have you considered adjusting the threshold values for each check in your autoscaling policy? That might help align the behavior more closely with your expectations.
