Hi everyone,
First, thanks to all for your help.
We’re encountering an imbalance issue within our Nomad cluster, comprising 3 servers and 5 clients. The problem revolves around uneven task allocation across our client nodes. Some clients are operating at full capacity (100%), while others are underutilized (10%). When a client reaches full capacity, it becomes frozen and unavailable in the cluster, resulting in the loss of running tasks.
We’re seeking insights into the root cause of this balance issue.
Regards,
As I understand it, an imbalance in task allocation across client nodes can have several causes:
- Verify whether tasks are being pinned to specific nodes by tagging or constraints in the job spec: check the `constraint` or `affinity` block in the job spec.
- Sometimes the tasks being scheduled require more resources than the underutilized nodes have available, or the resource requirements in the job are set too high. Review every `resources` block in the job spec.
- If you use an autoscaler, ensure its configuration matches your deployment model.
- Review the scheduling strategy. The default is binpack, which fills existing nodes before placing work on others; consider the spread strategy, which distributes tasks more evenly across your cluster.
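
To make the points above concrete, here is a rough sketch of where those blocks sit in a job spec. The job, group, and task names, the image, the `general` node class, and all numbers are placeholders I made up for illustration, not values from your cluster, so treat it as a starting point to compare against your own jobs rather than a recommended configuration.

```hcl
# Hypothetical job; every name and value here is an illustrative assumption.
job "example" {
  datacenters = ["dc1"]

  group "web" {
    count = 5

    # Hard requirement: nodes that do not match are never considered.
    # An overly narrow constraint can funnel all work onto a few clients.
    constraint {
      attribute = "${attr.kernel.name}"
      value     = "linux"
    }

    # Soft preference: matching nodes score higher but others stay eligible.
    affinity {
      attribute = "${node.class}"
      value     = "general"
      weight    = 50
    }

    # Ask the scheduler to distribute this group's allocations across nodes
    # instead of packing them together.
    spread {
      attribute = "${node.unique.id}"
      weight    = 100
    }

    task "app" {
      driver = "docker"

      config {
        image = "nginx:1.25"
      }

      # Requests that are too large can rule out the smaller clients;
      # requests that are too small let too many tasks pile onto one node.
      resources {
        cpu    = 200 # MHz
        memory = 256 # MB
      }
    }
  }
}
```

The `spread` block only affects the job it appears in. As far as I know, the cluster-wide algorithm can also be switched with `nomad operator scheduler set-config -scheduler-algorithm=spread` on recent Nomad versions, but please check the documentation for the version you run before relying on that.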
I hope this helps.