If we are running a Nomad cluster on VM machines and need to autoscale Docker applications, should I configure the autoscaler on the servers, or should I run the autoscaler on the cluster as a separate job?
Hi @ctr0306,
Both options are possible, but running the Autoscaler as a Nomad job is usually easier. Here’s a sample job from our horizontal application scaling demo.
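For illustration, a minimal sketch of such a job could look like this (the datacenter name, image tag, and the Nomad and Prometheus addresses below are placeholders you would need to adjust for your cluster):

job "autoscaler" {
  datacenters = ["dc1"]

  group "autoscaler" {
    count = 1

    task "autoscaler" {
      driver = "docker"

      config {
        image   = "hashicorp/nomad-autoscaler"
        command = "nomad-autoscaler"
        # With the docker driver, the task's local/ directory is mounted at /local inside the container.
        args    = ["agent", "-config", "/local/config.hcl"]
      }

      # Render the Autoscaler agent configuration into the task directory.
      template {
        destination = "local/config.hcl"
        data        = <<EOF
nomad {
  # Placeholder; point this at your Nomad HTTP API.
  address = "http://nomad.example.com:4646"
}

apm "prometheus" {
  driver = "prometheus"
  config = {
    # Placeholder; point this at your Prometheus server.
    address = "http://prometheus.example.com:9090"
  }
}

strategy "target-value" {
  driver = "target-value"
}
EOF
      }
    }
  }
}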
Hi @lgfa29
Thanks a lot for your reply, but I have a question here.
I have 10 VMs running as clients of the Nomad server.
If I want to autoscale a Docker application, do I need to run the Autoscaler on all 10 machines as a Docker container, or is it fine to run one Autoscaler alongside my 10 VMs?
How can I manage autoscaling of my application across the 10 VMs?
Thanks
ctr0306
You only need one Autoscaler, and it doesn't matter how many VMs you have. You will run the Autoscaler as a Nomad job, so it will be scheduled on one of those VMs.
Once it's running, you can update the Docker job you want to autoscale with a scaling block to define its policy.
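For illustration, a minimal sketch of such a scaling block could look like this (the check name, query, and target below are placeholders, not values from your cluster):

group "app" {
  count = 1

  scaling {
    enabled = true
    min     = 1
    max     = 10

    policy {
      cooldown = "1m"

      check "load" {
        source = "prometheus"
        # Placeholder query; replace with whatever metric you want to scale on.
        query = "scalar(avg(my_app_requests_per_second))"

        strategy "target-value" {
          target = 50
        }
      }
    }
  }
}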
Hi @lgfa29
Thanks a lot. In that case, do I need to write an HCL file, for example autoscaling.hcl?
And how do I bind the Autoscaler to the Nomad server?
@lgfa29
I configured autoscaler.hcl as below and ran it with ./nomad-autoscaler agent --config /etc/autoscaler.hcl.
I got this error: 2021-02-09T16:02:06.006Z [ERROR] agent: failed to setup HTTP getHealth server: error="could not setup HTTP listener: listen tcp nomad-server-ip:9999: bind: cannot assign requested address"
http {
  bind_address = "nomad-server-ip"
  bind_port    = 9999
}

nomad {
  address = "http://nomad-server-ip:4646"
}

apm "prometheus" {
  driver = "prometheus"
  config = {
    address = "http://prometheus-server-ip:9090"
  }
}

strategy "target-value" {
  driver = "target-value"
}
This error indicates that the Autoscaler can't listen on port 9999 of nomad-server-ip. Are you using a real IP address as bind_address? And is port 9999 already being used by another process?
bind_address should be the IP of the host running the Autoscaler (it defaults to 127.0.0.1, so you normally wouldn't have to change it). bind_port should be a port that is not in use on that host.
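So the http block would normally look something like this (the port here is just an example of one that is free on the host):

http {
  # Address of the host where the Autoscaler itself runs, not the Nomad server.
  bind_address = "127.0.0.1"
  # Any port that is not already in use on that host.
  bind_port = 8080
}

The Nomad server address stays in the nomad block, as you already have it.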
Hi @lgfa29
Sorry for not responding… I have been away for personal reasons.
Your suggestion worked, but now I am getting the errors below while autoscaling. Could you please suggest what to do?
Feb 18 15:42:51 f989b069-58cb-65f9-b212-ac251eb10eef nomad-autoscaler[913440]: 2021-02-18T15:42:51.075Z [INFO] policy_eval.broker: eval nack'd, retrying it: eval_id=098e43b3-4985-ca57-9039-66027a544941 policy_id=44d25ac2-9069-768d-2d9a-a87dc8202f20 token=eb4738b5-806a-3ac0-231f-977148901c54
Feb 18 15:42:51 f989b069-58cb-65f9-b212-ac251eb10eef nomad-autoscaler[913440]: 2021-02-18T15:42:51.080Z [INFO] policy_eval.worker.check_handler: scaling target: check=uptime id=4fc5b8e9-566d-713e-571d-0bd9c9480fff policy_id=44d25ac2-9069-768d-2d9a-a87dc8202f20 queue=horizontal source=prometheus strategy=target-value target=nomad-target from=3 to=2 reason="capped count from 1 to 2 to stay within limits" meta="map[nomad_autoscaler.count.capped:true nomad_autoscaler.count.original:1 nomad_autoscaler.reason_history:[scaling down because factor is 0.277778 scaling down because factor is 0.277778] nomad_policy_id:44d25ac2-9069-768d-2d9a-a87dc8202f20]"
Feb 18 15:42:51 f989b069-58cb-65f9-b212-ac251eb10eef nomad-autoscaler[913440]: 2021-02-18T15:42:51.085Z [ERROR] policy_eval.worker.check_handler: failed to submit scaling action to target: check=uptime id=4fc5b8e9-566d-713e-571d-0bd9c9480fff policy_id=44d25ac2-9069-768d-2d9a-a87dc8202f20 queue=horizontal source=prometheus strategy=target-value target=nomad-target error="failed to scale group /: Unexpected response code: 400 (job scaling blocked due to active deployment)"
Feb 18 15:42:51 f989b069-58cb-65f9-b212-ac251eb10eef nomad-autoscaler[913440]: 2021-02-18T15:42:51.085Z [ERROR] policy_eval.worker: failed to evaluate policy: eval_id=098e43b3-4985-ca57-9039-66027a544941 eval_token=3050cfab-fe60-ee1d-ef4c-0b1012154771 id=4fc5b8e9-566d-713e-571d-0bd9c9480fff policy_id=44d25ac2-9069-768d-2d9a-a87dc8202f20 queue=horizontal err="failed to scale target: failed to scale group /: Unexpected response code: 400 (job scaling blocked due to active deployment)"
Feb 18 15:42:51 f989b069-58cb-65f9-b212-ac251eb10eef nomad-autoscaler[913440]: 2021-02-18T15:42:51.085Z [WARN] policy_eval.broker: eval delivery limit reached: eval_id=098e43b3-4985-ca57-9039-66027a544941 policy_id=44d25ac2-9069-768d-2d9a-a87dc8202f20 token=3050cfab-fe60-ee1d-ef4c-0b1012154771 count=2 limit=2
Feb 18 15:43:01 f989b069-58cb-65f9-b212-ac251eb10eef nomad-autoscaler[913440]: 2021-02-18T15:43:01.066Z [WARN] policy_manager.policy_handler: failed to get target status: policy_id=44d25ac2-9069-768d-2d9a-a87dc8202f20 error="Unexpected response code: 500 (No path to region)"
Feb 18 15:48:11 f989b069-58cb-65f9-b212-ac251eb10eef nomad-autoscaler[913440]: 2021-02-18T15:48:11.051Z [WARN] policy_manager.policy_handler: failed to get target status: policy_id=44d25ac2-9069-768d-2d9a-a87dc8202f20 error="Unexpected response code: 500 (No path to region)"
Feb 18 15:48:21 f989b069-58cb-65f9-b212-ac251eb10eef nomad-autoscaler[913440]: 2021-02-18T15:48:21.050Z [WARN] policy_manager.policy_handler: failed to get target status: policy_id=44d25ac2-9069-768d-2d9a-a87dc8202f20 error="Unexpected response code: 500 (No path to region)"
No worries @ctr0306, we are here to help at any time.
This error message indicates that your jobs might be running in a different region, so you will need to configure the Autoscaler to connect to that specific region. You can do this in the Autoscaler configuration file, using the region parameter inside the nomad block.
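For example, if your jobs run in a region called us-east-1 (placeholder; use your cluster's actual region name):

nomad {
  address = "http://nomad-server-ip:4646"
  # Region where the jobs you want to scale are running.
  region = "us-east-1"
}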
Hi @lgfa29,
Thanks a lot for all your help. I am now able to autoscale based on the uptime query: query = "avg(up{job="nomad_node_exporter"})"
If I try to autoscale based on the number of allocations, it fails. My query is: query = "avg(nomad_client_allocations_running{job="nomad"})"
Please correct me if I am doing anything wrong in using the nomad_client_allocations_running query.
group "test" {
  count = 3

  constraint {
    attribute = "${node.class}"
    value     = "CTR"
  }

  scaling {
    enabled = true
    min     = 2
    max     = 4

    policy {
      cooldown = "20s"

      check "uptime" {
        source = "prometheus"
        query  = "avg(nomad_client_allocations_running{job="nomad"})"
        query  = "avg(up{job="nomad_node_exporter"})"

        strategy "target-value" {
          target = 0.5
          target = 1.2
        }
      }
    }
  }
}
Thanks
ctr0306
Hum…It’s kind of hard to tell what’s wrong without more details, like logs or error messages. What are you seeing as the failure?
From the job snippet I see a few things:
- the query should escape the inner quotes, so something like query = "avg(nomad_client_allocations_running{job=\"nomad\"})", but I am not sure if this was just the HTML formatting
- there are 2 query and 2 target in the same check. That's not a valid policy, since each policy should only have one of each per check. If you have 2 metrics you will need 2 check blocks (there's a sketch of what that could look like below).
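If you do want to keep both metrics, splitting them into two checks could look roughly like this (I've just paired the queries and targets from your snippet; adjust the pairing if I guessed it wrong):

scaling {
  enabled = true
  min     = 2
  max     = 4

  policy {
    cooldown = "20s"

    check "allocs" {
      source = "prometheus"
      query  = "avg(nomad_client_allocations_running{job=\"nomad\"})"

      strategy "target-value" {
        target = 1.2
      }
    }

    check "uptime" {
      source = "prometheus"
      query  = "avg(up{job=\"nomad_node_exporter\"})"

      strategy "target-value" {
        target = 0.5
      }
    }
  }
}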
Hi @lgfa29,
I will get back to you with the error logs, but what general query should I use to scale up or down based on nomad_client_allocations_running in Prometheus?