Nomad not rescheduling allocations due to high usage on one node

Hi everyone. I have three Nomad nodes, each acting as both client and server, running Nomad 0.12.3 and 0.11.2.
Everything is running fine, however Nomad does not seem to balance the allocations across the clients. Two clients have their RAM usage at about 70% while the third always sits at 23%. Not all of the jobs are system jobs or bound to a client, so they can be shifted freely.
I have now added another job of the system type. Nomad said that one allocation cannot be placed due to:

  • Resources exhausted on 1 node
  • Dimension memory exhausted on 1 node

The node in question is one of the clients at 70% RAM usage. First of all, I don’t understand what exactly is meant by “resources exhausted” and “dimension memory exhausted”. Which resource specifically? RAM? Disk space? I have 50GB free and my job needs about 300MB, which should not cause problems.

The other thing is, why does Nomad not reschedule existing jobs so the job can be started on this client? There are plenty of possibilities.
This is the reschedule policy on all jobs:

"ReschedulePolicy": {
  "Attempts": 0,
  "Interval": 0,
  "Delay": 30000000000,
  "DelayFunction": "exponential",
  "MaxDelay": 3600000000000,
  "Unlimited": true
}

Hi @sesfre and thanks for the questions, I’ll try and answer each one of them below.

Nomad does not seem to balance the allocations across the clients

The Nomad service scheduler uses a binpacking algorithm by default, which fills up already-used clients before placing work on emptier ones. That would explain the resource differences you are seeing between the clients. You can modify this behaviour to use spread instead, as documented in the scheduler config section of the agent configuration.
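
To illustrate the difference between the two approaches, here is a minimal Python sketch. It is a toy model, not Nomad's actual scoring function: the `pick_node` helper, the node names, and the memory figures are all made up for illustration. It only shows the general idea that binpacking prefers the fullest node that still fits, while spread prefers the emptiest.

```python
def pick_node(nodes, request_mb, algorithm="binpack"):
    """Toy placement: nodes maps a name to (allocated_mb, total_mb).

    This is a simplified illustration, not Nomad's real ranking logic.
    """
    # Utilisation each node would have after placing the request,
    # considering only nodes where the request still fits.
    candidates = {
        name: (allocated + request_mb) / total
        for name, (allocated, total) in nodes.items()
        if allocated + request_mb <= total
    }
    if not candidates:
        return None  # nothing fits on any node
    if algorithm == "binpack":
        # Prefer the node that ends up most utilised.
        return max(candidates, key=candidates.get)
    # "spread": prefer the node that ends up least utilised.
    return min(candidates, key=candidates.get)


# Hypothetical cluster loosely modelled on the situation described above.
nodes = {
    "client-a": (700, 1000),
    "client-b": (690, 1000),
    "client-c": (230, 1000),
}

binpack_choice = pick_node(nodes, 100, "binpack")
spread_choice = pick_node(nodes, 100, "spread")
print(binpack_choice, spread_choice)
```

With binpacking the busy clients keep receiving work for as long as it fits, which is why one client can stay mostly idle.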

exactly is meant by “resources exhausted” and “dimension memory exhausted”, which resource specifically

“Dimension memory exhausted” refers to memory (RAM), not disk space, and specifically to the memory requested in the task resources stanza of the job specification. Nomad fingerprints clients to understand the available resources, CPU and memory most importantly, and maintains state of what resources have been allocated to workloads. This allocated resource value differs from the actual resource usage of the underlying host.

To put this into an example:

  • The Nomad cluster contains a single client that has 100 MHz CPU and 100 MB memory available
  • A user runs a job that requests 60 MHz CPU and 70 MB memory
  • The Nomad client now has 40 MHz CPU and 30 MB memory available for scheduling of new jobs
  • If the user attempts to run another job, requesting 40 MHz CPU and 40 MB memory, the same error you have seen will be returned, because the 40 MB requested exceeds the 30 MB still available

As I said previously, it’s important to note that the allocation of resources is separate from the actual resource usage on the underlying host.
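
The arithmetic in the example above can be sketched in a few lines of Python. This is only a simplified model of allocation accounting under the numbers given in the example; the `Node` class and its method names are hypothetical and do not correspond to Nomad's internals.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical client tracking allocated (not actually used) resources."""
    cpu_mhz: int
    memory_mb: int
    allocated_cpu_mhz: int = 0
    allocated_memory_mb: int = 0

    def exhausted_dimensions(self, cpu_mhz, memory_mb):
        """Return the dimensions where the request exceeds unallocated capacity."""
        exhausted = []
        if self.allocated_cpu_mhz + cpu_mhz > self.cpu_mhz:
            exhausted.append("cpu")
        if self.allocated_memory_mb + memory_mb > self.memory_mb:
            exhausted.append("memory")
        return exhausted

    def place(self, cpu_mhz, memory_mb):
        """Reserve resources if the request fits; return exhausted dimensions otherwise."""
        exhausted = self.exhausted_dimensions(cpu_mhz, memory_mb)
        if not exhausted:
            self.allocated_cpu_mhz += cpu_mhz
            self.allocated_memory_mb += memory_mb
        return exhausted


node = Node(cpu_mhz=100, memory_mb=100)
first = node.place(60, 70)   # fits: leaves 40 MHz and 30 MB unallocated
second = node.place(40, 40)  # CPU fits exactly, but 40 MB > 30 MB remaining
print(first, second)
```

Note that the check never looks at how much memory the first job really uses on the host. Only the requested amounts count.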

why does Nomad not reschedule existing jobs

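If I understand this correctly, I believe this would relate to preemption rather than the rescheduling functionality. The reschedule policy controls how Nomad replaces allocations that have failed; it does not move healthy allocations to make room for new ones.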

I hope this helps. Please let me know if you have any follow-up questions.

jrasell and the Nomad team