I have background job tasks that need to exit cleanly when they are asked to stop. These tasks can grow in memory over time and should be restarted once they cross a threshold. However, when a task hits its memory limit, Nomad kills it with SIGKILL rather than the configured kill_signal, under either of these configurations:
resources {
  memory = 1000
}

resources {
  memory     = 500
  memory_max = 1000
}
Any suggestions on how I can get Nomad to kill a task gracefully once it approaches a memory threshold? Thanks
Hi @axsuul, the OOM handling is implemented by the Linux kernel's cgroups rather than by Nomad itself.
Under the outgoing cgroups v1, what you're asking for might have been possible through clever use of memory.oom_control, with Nomad registering a per-task watcher that issues your custom signal.
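For the record, the v1 mechanism looked roughly like this: create an eventfd, register it against memory.oom_control through cgroup.event_control, and signal the task when the notification fires. A minimal sketch, with the cgroup path and PID as placeholder assumptions (os.eventfd needs Python 3.10+ on Linux):

import os
import signal

# Hypothetical path/PID for illustration; the real task cgroup and PID would
# have to be discovered from the Nomad alloc/task on the client.
CGROUP = "/sys/fs/cgroup/memory/nomad/<task-cgroup>"
TASK_PID = 12345
CUSTOM_SIGNAL = signal.SIGTERM

# cgroups v1 OOM notification: register an eventfd for memory.oom_control
# via cgroup.event_control, then block until the kernel reports an OOM event.
efd = os.eventfd(0)
oom_fd = os.open(os.path.join(CGROUP, "memory.oom_control"), os.O_RDONLY)
with open(os.path.join(CGROUP, "cgroup.event_control"), "w") as f:
    f.write(f"{efd} {oom_fd}")

while True:
    os.eventfd_read(efd)              # blocks until an OOM event fires
    os.kill(TASK_PID, CUSTOM_SIGNAL)  # deliver the task's configured signal

To get your signal in ahead of the kernel's SIGKILL you would also need oom_kill_disable set in memory.oom_control, which is where the "clever" part comes in.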
Under cgroups v2, the closest thing I can think of off the top of my head would be to monitor memory.events.local["max"] and send your signal when that counter changes, but that's not the same as actually catching an OOM event.
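A rough sketch of that kind of watcher, with the cgroup v2 path, PID, signal, and poll interval all as placeholder assumptions:

import os
import signal
import time

# Placeholders: the real cgroup path and PID would come from the alloc/task.
CGROUP = "/sys/fs/cgroup/nomad.slice/<alloc-task>.scope"
TASK_PID = 12345
CUSTOM_SIGNAL = signal.SIGTERM

def max_events(cgroup):
    # memory.events.local is "key value" per line; "max" counts how many
    # times usage ran into the memory.max limit.
    with open(os.path.join(cgroup, "memory.events.local")) as f:
        events = dict(line.split() for line in f)
    return int(events.get("max", 0))

last = max_events(CGROUP)
while True:
    time.sleep(5)
    current = max_events(CGROUP)
    if current > last:                    # the limit was hit since last check
        os.kill(TASK_PID, CUSTOM_SIGNAL)  # ask the task to shut down cleanly
        last = current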
Thanks for your reply! Going through cgroups directly sounds like it could introduce conflicts and race conditions. Would it be better, then, to set a very high memory_max on the job and monitor memory usage with a custom script instead?
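Something along these lines is what I had in mind, with the threshold, cgroup path, and PID as made-up values (reading memory.current assumes cgroups v2):

import os
import signal
import time

# Illustrative values only; a real script would discover these per allocation.
CGROUP = "/sys/fs/cgroup/nomad.slice/<alloc-task>.scope"
TASK_PID = 12345
THRESHOLD = 800 * 1024 * 1024   # signal well below the (high) memory_max

while True:
    with open(os.path.join(CGROUP, "memory.current")) as f:
        usage = int(f.read().strip())
    if usage > THRESHOLD:
        # Ask the task to exit cleanly; Nomad's restart behavior brings it back.
        os.kill(TASK_PID, signal.SIGTERM)
    time.sleep(10)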
Is there a good way to get memory usage metrics from Nomad itself? I have tried querying the /metrics and /allocation/<alloc-id> endpoints, but they don't return that information. Or, if I end up going with the sidecar method, how would you recommend getting memory usage?
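For reference, this is roughly how I was querying the HTTP API (the agent address and the alloc ID are placeholders):

import json
import os
import urllib.request

# Placeholders; NOMAD_ADDR defaults to the local agent.
nomad = os.environ.get("NOMAD_ADDR", "http://127.0.0.1:4646")
alloc_id = "<alloc-id>"

for path in ("/v1/metrics", f"/v1/allocation/{alloc_id}"):
    with urllib.request.urlopen(nomad + path) as resp:
        body = json.load(resp)
    # Print the top-level keys; I couldn't spot per-task memory usage in either.
    print(path, sorted(body.keys()))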