nomad.client.allocs.oom_killed metric is missing

Hello, I have an allocation on a Nomad cluster that terminates with the following logs:

Jan 22 09:01:17 ip-192-xx-xx-115 nomad[406]:     2024-01-22T09:01:17.202Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=50ab3dd9-d2a6-e96d-be53-4a5ed7be6c4a task=frontend type=Terminated msg="Exit Code: 137, Signal: 9" failed=false
Jan 22 09:01:17 ip-192-xx-xx-115 nomad[406]:  client.alloc_runner.task_runner: Task event: alloc_id=50ab3dd9-d2a6-e96d-be53-4a5ed7be6c4a task=frontend type=Terminated msg="Exit Code: 137, Signal: 9" failed=false

Checking the kernel logs on the host, I found that the task was killed by the OOM killer:

Jan 22 09:01:17 ip-192-xx-xx-115 kernel: Memory cgroup out of memory: Killed process 2577 (java) total-vm:8325660kB, anon-rss:4178744kB, file-rss:21016kB, shmem-rss:0kB, UID:65534 pgtables:9180kB oom_score_adj:0
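For readers wondering how the two log lines fit together: "Exit Code: 137" follows the common shell convention of 128 plus the signal number, so 137 corresponds to signal 9 (SIGKILL), which is the signal the kernel OOM killer sends. A minimal sketch of that arithmetic follows; the function name `describe_exit_code` is hypothetical and not part of Nomad.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a shell-style exit code into a readable description.

    Assumes the 128 + signal-number convention used by most shells
    and process supervisors.
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


print(describe_exit_code(137))
```

A SIGKILL on its own does not prove an OOM kill, since anything can send signal 9; the kernel "Memory cgroup out of memory" line is what confirms the cause.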

However, I don't understand why I can't see the metric nomad.client.allocs.oom_killed on the metrics endpoint.
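One way to confirm the metric is really absent is to query the agent's /v1/metrics endpoint and search the response by name. The sketch below rests on assumptions: the agent address is the default local one, ACLs are not enforced, and the JSON response groups entries under "Counters", "Gauges" and "Samples", each carrying a "Name" field. The helper names `fetch_metrics` and `find_metric` are hypothetical.

```python
import json
import urllib.request

# Assumption: a Nomad agent listening on the default local HTTP address.
NOMAD_ADDR = "http://127.0.0.1:4646"


def fetch_metrics(addr: str = NOMAD_ADDR) -> dict:
    """Fetch the agent's metrics summary as a dict (no ACL token sent)."""
    with urllib.request.urlopen(f"{addr}/v1/metrics") as resp:
        return json.load(resp)


def find_metric(summary: dict, name: str) -> list:
    """Return every entry in the summary whose Name matches exactly."""
    matches = []
    for section in ("Counters", "Gauges", "Samples"):
        # A section may be missing or null when it has no entries.
        for entry in summary.get(section) or []:
            if entry.get("Name") == name:
                matches.append(entry)
    return matches


# Usage against a live agent:
#   hits = find_metric(fetch_metrics(), "nomad.client.allocs.oom_killed")
#   print(hits or "metric not reported")
```

An empty result here matches what is described above: the task was OOM killed, but no entry with that name is reported.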

Can someone help me understand the whole process?

How can I prevent this situation?

Hi @Antse,

I just took a quick look at the code, and it seems the Java task driver is not populating the OOM information in the responses to Nomad. This is why the metric is not showing up. I will see if I can raise a quick PR to fix this issue.

Thanks,
jrasell and the Nomad team

Possibly related to this issue: "Get back reason of killing a task into log" (hashicorp/nomad issue #5887 on GitHub)