Why Nomad ignores memory utilisation

Why does the Nomad agent ignore memory usage outside of Nomad's control on the hosts where it schedules jobs?

I understand that the reserved memory stanza in the agent configuration is intended to reserve memory for the OS/host. So the Nomad agent schedules jobs based on the memory allocated by Nomad, but ignores whether processes outside of Nomad are using (too) much memory.
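To make the behaviour described above concrete, here is a minimal sketch of reservation-based accounting. It is not Nomad's actual code: the `Client` class, its field names, and the numbers are all hypothetical, and it only models the idea that placement is decided from total, reserved, and allocated memory.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Client:
    """Hypothetical model of a client node; not a Nomad data structure."""
    total_mb: int                 # physical memory on the host
    reserved_mb: int              # memory held back for the OS/host
    allocated_mb: List[int] = field(default_factory=list)  # memory requested by placed jobs

    def schedulable_mb(self) -> int:
        # Only reservations and allocations are subtracted; live usage is not consulted.
        return self.total_mb - self.reserved_mb - sum(self.allocated_mb)

    def fits(self, ask_mb: int) -> bool:
        return ask_mb <= self.schedulable_mb()


client = Client(total_mb=8192, reserved_mb=1024, allocated_mb=[2048, 1024])

# Hypothetical memory used by processes outside Nomad. It exceeds the
# 1024 MB reservation, but nothing in the accounting above looks at it.
host_processes_mb = 3000

print(client.schedulable_mb())
print(client.fits(4096))
```

In this sketch a 4096 MB job is still considered placeable even though the host processes use far more than the reserved amount, which is the situation the question is about.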

When looking at Prometheus metrics supplied by the Nomad Agent, the agent actually knows the real amount of memory that is available on the client it’s scheduling to.

So why not schedule jobs based on the actual free memory on a client?

Ignoring actual usage (outside of Nomad) creates a whack-a-mole situation where cluster operators have to keep an eye on OS/host process memory usage and on whether the reserved memory on the client needs adjusting, which I find very silly if that can be detected automatically.

Hi @rbastiaans-tc,

You’re right in suggesting that Nomad could detect real-time host memory usage; however, the problem is what to do with this data and how frequently it would need to be collected.

In a distributed cluster running 3 servers and a number of clients, each client would be responsible for monitoring usage at a sub-second interval. Due to Nomad’s optimistically concurrent scheduling algorithm, all Nomad agents running in server mode would need to have this usage information available. This means each client must send an RPC with the usage data, which is then replicated via Raft to each server. This type of architecture results in network IO, disk IO, and CPU saturation to a level where collecting and sharing this data would consume far more resources than scheduling itself, even on small clusters. This is why Nomad relies on resource reservations rather than real-time OS metrics.
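As a rough back-of-the-envelope illustration of the volume involved, the sketch below counts how many replicated writes per second such a design would generate. The function name, the client count, and the reporting interval are assumptions chosen for illustration, and the model ignores batching, payload size, and any other optimisation.

```python
def replicated_writes_per_second(clients: int,
                                 report_interval_s: float,
                                 servers: int = 3) -> float:
    """Hypothetical estimate: one usage RPC per client per interval,
    with each update replicated to every server."""
    updates_per_second = clients / report_interval_s
    return updates_per_second * servers


# Assumed example: 100 clients reporting every 0.5 seconds to 3 servers.
print(replicated_writes_per_second(clients=100, report_interval_s=0.5))
```

Under these assumptions the load grows linearly with the number of clients and inversely with the reporting interval, regardless of how many jobs are actually being scheduled.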

Thanks,
jrasell and the Nomad team