Is there a way to specify the kill_signal used when a task is OOM (out of memory)?

I have background job tasks that need to exit cleanly whenever they are stopped. These tasks can sometimes grow in memory and should be restarted once they cross a threshold. However, when a task goes OOM, Nomad kills it with SIGKILL instead of using the configured kill_signal. This happens with either of these configurations:

resources {
  memory = 1000
}

resources {
  memory = 500
  memory_max = 1000
}

Any suggestions on how I can get Nomad to kill a task gracefully once it approaches a memory threshold? Thanks

Hi @axsuul, the OOM handling is implemented by Linux cgroups.

In the outgoing cgroups v1, what you’re asking for may have been plausible through clever use of memory.oom_control, with Nomad registering a per-task watching routine that issues your custom signal.

In the new cgroups v2 world, I don’t think there is equivalent functionality. You can read about the tools we have to work with in the "Control Group v2" page of the Linux kernel documentation.

The closest I can think of off the top of my head would be to monitor memory.events.local["max"] and send a signal if that value changes, but that’s not the same as actually entering an OOM event.
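
A rough sketch of that idea in Python, assuming cgroups v2 and a watcher that can both read the task's cgroup directory and signal its process. The cgroup path, the PID, the choice of SIGTERM, and both function names are hypothetical; this is not something Nomad does for you.

```python
import os
import signal
import time


def parse_memory_events(text):
    """Parse the 'key value' lines of a cgroup v2 memory.events(.local) file."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def watch_max_events(events_path, pid, sig=signal.SIGTERM, interval=1.0):
    """Send `sig` to `pid` the first time the 'max' counter increases."""
    with open(events_path) as f:
        last = parse_memory_events(f.read()).get("max", 0)
    while True:
        time.sleep(interval)
        with open(events_path) as f:
            current = parse_memory_events(f.read()).get("max", 0)
        if current > last:
            # The cgroup hit its memory limit; ask the task to shut down.
            os.kill(pid, sig)
            return current
```

It would be called with something like watch_max_events("<task cgroup dir>/memory.events.local", pid). By the time the "max" counter moves, the kernel is already reclaiming memory for that cgroup, so the task may still be OOM-killed before it finishes a graceful shutdown.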

Thanks for your reply! Doing it the cgroups way sounds like it could cause some conflicts and race conditions. Would it be better then to instead set a super high memory_max on the job and then monitor memory usage with a custom script instead?

If you own the source of the app, I’d probably try to implement some kind of in-process watcher, e.g.

via the runtime package in Go,
or Runtime (Java Platform SE 7) in Java,
etc.
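
To illustrate the shape of such a watcher, here is a minimal sketch in Python (chosen only for brevity; the same pattern applies with Go's runtime package or Java's Runtime). It assumes Linux, and the function names, the polling interval, and the on_exceed callback are all made up for the example.

```python
import os
import threading


def current_rss_bytes():
    """Resident set size of this process, from /proc/self/statm (Linux only)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def start_memory_watcher(limit_bytes, on_exceed, interval=5.0,
                         read_rss=current_rss_bytes):
    """Poll RSS in a daemon thread; call on_exceed(rss) once at the limit."""
    stop = threading.Event()

    def loop():
        # Event.wait returns False on timeout, True once stop.set() is called.
        while not stop.wait(interval):
            rss = read_rss()
            if rss >= limit_bytes:
                on_exceed(rss)
                return

    threading.Thread(target=loop, daemon=True).start()
    return stop  # call stop.set() to cancel the watcher
```

The callback would typically flip the app's own shutdown flag so the job finishes its current work and exits, leaving Nomad to restart it. RSS is not the same number the cgroup charges (which also includes page cache, for example), so the limit should sit comfortably below the task's memory setting.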

But failing that a sidecar that monitors memory usage would probably work too.
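
A sidecar along those lines could look roughly like this, assuming cgroups v2 and that the sidecar can see the main task's cgroup directory and PID. How it gets those depends on the task driver and is not covered here; the names, the threshold, and the SIGTERM default are placeholders.

```python
import os
import signal
import time


def read_memory_current(cgroup_dir):
    """Bytes currently charged to the cgroup (cgroups v2 memory.current)."""
    with open(os.path.join(cgroup_dir, "memory.current")) as f:
        return int(f.read().strip())


def monitor(cgroup_dir, pid, threshold_bytes, sig=signal.SIGTERM, interval=5.0):
    """Poll usage and send `sig` to `pid` once usage reaches the threshold."""
    while True:
        usage = read_memory_current(cgroup_dir)
        if usage >= threshold_bytes:
            # Ask the task to exit cleanly before the kernel steps in.
            os.kill(pid, sig)
            return usage
        time.sleep(interval)
```

Picking a threshold somewhat below the task's memory limit (say 90% of it) leaves the task some headroom to shut down before the kernel OOM killer gets involved.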

Thanks.

Is there a good way to get memory usage metrics from Nomad itself? I have tried querying /metrics and /allocation/<alloc-id> endpoints but they don’t return that info. Or what way would you recommend to get memory usage if I’m going to be doing the sidecar method?