Hello, everyone! I’d like some help with adjusting the scaling policy for one of my Nomad jobs. Here’s the current configuration:
"Scaling": {
  "ID": "-",
  "Type": "horizontal",
  "Target": {
    "Namespace": "default",
    "Job": "job_name",
    "Group": "group1"
  },
  "Policy": {
    "check": [
      {
        "max_messages_allowed": [
          {
            "source": "prometheus",
            "strategy": [
              {
                "threshold": [
                  {
                    "lower_bound": 0,
                    "upper_bound": 23500,
                    "delta": 2
                  }
                ]
              }
            ],
            "group": "group1",
            "query": "my working query"
          }
        ]
      },
      {
        "backlog_queue": [
          {
            "group": "group1",
            "query": "another working query",
            "source": "prometheus",
            "strategy": [
              {
                "target-value": [
                  {
                    "max_scale_up": 2,
                    "target": 15000,
                    "threshold": 0.1,
                    "max_scale_down": 2
                  }
                ]
              }
            ]
          }
        ]
      }
    ],
    "cooldown": "4m",
    "evaluation_interval": "1m"
  },
Let’s say the current metric for my “max_messages_allowed” check is 12000, and the metric for my “backlog_queue” check is close to 0 (around 100-500). Based on my understanding of the policy, this should trigger a downscale. The first check falls within the threshold bounds (0 to 23500), so I expect it to return NONE. The second check would compute a small factor (metric divided by target, far below 1), so I expect it to return DOWN. As I understand it, NONE + DOWN = DOWN, so the policy should scale the job down.
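To make the expectation for the second check concrete, here is a rough sketch of the arithmetic in Python. It assumes the target-value strategy computes a factor as metric / target, does nothing when the factor is within the threshold of 1, and otherwise multiplies the current count by the factor, with max_scale_down capping the drop. The function name and the current count of 10 are hypothetical, chosen only for illustration. This is not the autoscaler's actual code, and it does not model how results from multiple checks are combined.

```python
import math


def desired_count(current_count, metric, target, threshold, max_scale_down=None):
    """Sketch of a target-value style calculation (assumed, not the real plugin)."""
    factor = metric / target

    # Within the tolerance band around 1.0: no scaling action.
    if abs(1 - factor) <= threshold:
        return current_count

    # Otherwise scale the count proportionally, rounding up.
    new_count = math.ceil(current_count * factor)

    # Assumed meaning of max_scale_down: limit how many instances
    # can be removed in a single evaluation.
    if max_scale_down is not None and new_count < current_count:
        new_count = max(new_count, current_count - max_scale_down)

    return new_count


# Hypothetical current count of 10, backlog metric of 300, values from the policy above.
result = desired_count(current_count=10, metric=300, target=15000,
                       threshold=0.1, max_scale_down=2)
print(result)
```

With a factor of 300 / 15000 = 0.02, the sketch lands well below the current count, which is why I expect this check on its own to point DOWN.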
I’ve verified both queries in Prometheus and Grafana, and they return accurate values. However, the policy is scaling the job up instead. Here’s what the logs show, and I’m a bit lost:
I’ve changed the names of groups, job names, and queries for privacy, but they match how they’re coded (both checks are under the same group).
If anyone spots where I might have misunderstood something, I’d greatly appreciate it!