Why Nomad ignores memory utilisation

rbastiaans-tc · August 9, 2023, 12:32pm

Why does the Nomad agent ignore memory used outside of Nomad’s control on the hosts where it schedules jobs?

I understand the intended approach is to reserve memory for the OS/host with the reserved memory stanza in the agent configuration. So the Nomad agent schedules jobs based on the memory allocated through Nomad, but ignores whether processes outside of Nomad are using (too) much memory.
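To make the point concrete, here is a deliberately simplified model of reservation-based bin-packing. It is not Nomad’s scheduler code: `Node`, `fits` and `place` are hypothetical names, and the real feasibility check also covers CPU, disk, ports and devices. It only illustrates that the decision compares allocation asks against total minus reserved memory.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical, simplified view of a client as a reservation-based scheduler sees it."""
    total_mb: int          # memory fingerprinted on the client
    reserved_mb: int       # value from the client's reserved memory stanza
    allocated_mb: int = 0  # sum of memory asks of allocations already placed here

    def allocatable_mb(self) -> int:
        # Capacity offered to jobs: total minus the operator's reservation.
        return self.total_mb - self.reserved_mb

    def fits(self, ask_mb: int) -> bool:
        # Only reservations are compared; actual host usage never enters the check.
        return self.allocated_mb + ask_mb <= self.allocatable_mb()

    def place(self, ask_mb: int) -> bool:
        # Record the allocation's ask if it fits; otherwise reject the placement.
        if not self.fits(ask_mb):
            return False
        self.allocated_mb += ask_mb
        return True


node = Node(total_mb=16384, reserved_mb=1024)
print(node.place(8192))  # True
print(node.place(7168))  # True: 8192 + 7168 == 16384 - 1024
print(node.place(1))     # False: the node is "full" on paper
```

If some daemon outside Nomad starts using several GiB on this host, nothing in `fits` changes, which is exactly the gap being asked about.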

Judging by the Prometheus metrics the Nomad agent exposes, the agent actually knows how much memory is really available on the client it schedules to.
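As a sketch of reading that number yourself: the agent can serve metrics in Prometheus text format from `/v1/metrics?format=prometheus`, and host memory metrics typically require `publish_node_metrics` in the telemetry block. The metric name `nomad_client_host_memory_available` is an assumption here (the Prometheus form of `nomad.client.host.memory.available`), so check it against your agent’s actual output. The parser below is a minimal, hypothetical helper that works on text you have already fetched.

```python
import re

# One sample line: metric name, optional {labels}, then the value (an optional timestamp may follow).
SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{(?:[^"}]|"(?:[^"\\]|\\.)*")*\})?\s+(\S+)')
# One label pair inside the braces; values are returned still escaped.
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def read_metric(exposition: str, metric: str) -> list[tuple[dict[str, str], float]]:
    """Return (labels, value) for every sample of `metric` in Prometheus text format."""
    results = []
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        m = SAMPLE.match(line)
        if m is None or m.group(1) != metric:
            continue
        labels = dict(LABEL.findall(m.group(2) or ""))
        results.append((labels, float(m.group(3))))
    return results


sample = '''# TYPE nomad_client_host_memory_available gauge
nomad_client_host_memory_available{host="node-1",node_id="abc"} 6.442450944e+09
'''
for labels, value in read_metric(sample, "nomad_client_host_memory_available"):
    print(labels["host"], value / 2**20, "MiB available")  # node-1 6144.0 MiB available
```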

So why not schedule jobs based on the actual memory usage and free memory on a client?

By ignoring actual usage outside of Nomad, it creates a whack-a-mole situation where cluster operators have to keep an eye on OS/host process memory usage and decide whether the client’s reserved memory needs adjusting. I find that very silly when it could be detected automatically.
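The "detected automatically" part can at least be approximated on the operator side without changing how Nomad schedules. The sketch below is hypothetical: `HostSample` and `check_reservation` are illustrative names, the inputs are assumed to come from agent metrics, and attributing everything beyond allocation usage to "the OS" is a rough approximation (page cache, kernel memory and the meaning of "available" vary by platform).

```python
import math
from dataclasses import dataclass


@dataclass
class HostSample:
    """Hypothetical per-client snapshot an operator might assemble from agent metrics."""
    total_mb: float       # host memory total
    available_mb: float   # host memory available, as reported by the OS
    alloc_used_mb: float  # memory actually used by Nomad allocations on this client
    reserved_mb: float    # current reserved memory value in the client config


def outside_usage_mb(s: HostSample) -> float:
    # Whatever the host uses beyond Nomad's allocations is attributed to the OS/other processes.
    host_used = s.total_mb - s.available_mb
    return max(0.0, host_used - s.alloc_used_mb)


def check_reservation(s: HostSample, headroom: float = 0.10) -> tuple[bool, int]:
    """Return (reservation_too_small, suggested_reserved_mb) with some headroom."""
    outside = outside_usage_mb(s)
    suggested = math.ceil(outside * (1 + headroom))
    return outside > s.reserved_mb, suggested


sample = HostSample(total_mb=16384, available_mb=4096, alloc_used_mb=9216, reserved_mb=1024)
print(check_reservation(sample))  # (True, 3380)
```

Run periodically per client, this turns the whack-a-mole into an alert, but the reservation itself still has to be changed in the client configuration.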

jrasell · August 11, 2023, 7:37am

Hi @rbastiaans-tc,

You’re right that Nomad could detect real-time host memory usage; the problem, however, is what to do with this data and how frequently it would need to be collected.

In a distributed cluster running 3 servers and a number of clients, each client would be responsible for monitoring usage on a sub-second interval. Because Nomad’s scheduling algorithm is optimistically concurrent, every agent running in server mode would need this usage information. That means each client must send an RPC with its usage data, which is then replicated via Raft to every server. This architecture drives network IO, disk IO, and CPU toward saturation, to the point where collecting and sharing the data would consume far more resources than the scheduling itself, even on small clusters. This is why Nomad relies on resource reservations rather than real-time OS metrics.
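A rough back-of-envelope model of that argument, assuming every usage report becomes one Raft log entry. All inputs are made-up illustrative numbers, not Nomad measurements, and the model ignores batching, compression, RPC overhead, fsync cost and snapshotting; `usage_reporting_load` is a hypothetical helper.

```python
def usage_reporting_load(clients: int, reports_per_sec: float,
                         bytes_per_report: int, servers: int) -> dict[str, float]:
    """Rough cost of pushing per-client usage reports through Raft (illustrative model)."""
    # Every report becomes one Raft log entry committed by the leader.
    entries_per_sec = clients * reports_per_sec
    # The leader sends each entry to every follower.
    replication_bytes_per_sec = entries_per_sec * bytes_per_report * (servers - 1)
    # Each server appends each entry to its own on-disk log and applies it to its state.
    disk_bytes_per_sec_per_server = entries_per_sec * bytes_per_report
    return {
        "raft_entries_per_sec": entries_per_sec,
        "replication_bytes_per_sec": replication_bytes_per_sec,
        "disk_bytes_per_sec_per_server": disk_bytes_per_sec_per_server,
    }


# Hypothetical inputs: 500 clients reporting every 500 ms, 1 KiB per report, 3 servers.
print(usage_reporting_load(clients=500, reports_per_sec=2.0, bytes_per_report=1024, servers=3))
```

With those inputs the servers would commit about 1,000 Raft entries per second just for usage data, growing linearly with client count and reporting frequency, whereas a reservation only needs a Raft write when allocations actually change.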

Thanks,
jrasell and the Nomad team
