Detecting Resource Exhaustion / Placement Failure

mikeblum · September 27, 2019, 6:03pm

Is there a Datadog or Prometheus metric emitted by the
Nomad server and/or client for detecting and alerting when a deployment fails due to resource exhaustion?

The StatsD metrics don’t really seem to capture a running job that is failing to deploy:

Datadog - Nomad

**nomad.client.allocations.blocked**
(gauge) Number of allocations blocked for a client
*Shown as job*
**nomad.client.allocations.pending**
(gauge) Number of allocations pending for a client
*Shown as job*
**nomad.client.allocations.running**
(gauge) Number of allocations running for a client
*Shown as job*
**nomad.client.allocations.terminal**
(gauge) Number of allocations terminated for a client
*Shown as job*

danlsgiga · November 4, 2019, 7:21pm

Same here! I have Datadog monitoring any pending jobs and for some reason all I can see is this:

Nomad is not reporting the total at all times… I had 2 jobs in pending mode for several hours during that period and I was expecting the count to be 2, not 0.

mikeblum · January 19, 2020, 10:04pm

We found that we want to track the pending status of the allocation, not the running job. That has proven to be a pretty reliable indicator of job placement failure, as the running job will fail if it has no healthy and running allocations after all the restart attempts or placement failures have been exhausted.
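
A minimal sketch of one way to track pending allocations, not necessarily how it was done above. It assumes a Nomad agent reachable at the default local address, no ACL token, and uses the `GET /v1/allocations` HTTP endpoint, reading the `ClientStatus` and `JobID` fields of each allocation stub. The function names and the `NOMAD_ADDR` constant are made up for illustration. Whether a pending count catches every kind of placement failure is not something this thread confirms.

```python
import json
import urllib.request
from collections import Counter

# Assumption: a Nomad agent listening on the default local address, ACLs disabled.
NOMAD_ADDR = "http://127.0.0.1:4646"


def fetch_allocations(addr=NOMAD_ADDR):
    """Fetch allocation stubs from the Nomad HTTP API (GET /v1/allocations)."""
    with urllib.request.urlopen(f"{addr}/v1/allocations") as resp:
        return json.load(resp)


def pending_by_job(allocations):
    """Count allocations whose ClientStatus is 'pending', grouped by JobID."""
    counts = Counter()
    for alloc in allocations:
        if alloc.get("ClientStatus") == "pending":
            counts[alloc.get("JobID", "unknown")] += 1
    return dict(counts)


if __name__ == "__main__":
    # Print per-job pending counts; a scheduled run could forward these
    # numbers to Datadog or Prometheus as a custom gauge and alert on them.
    for job, count in sorted(pending_by_job(fetch_allocations()).items()):
        print(f"{job}: {count} pending allocation(s)")
```

Run on a schedule, the per-job counts could be alerted on when a job stays above zero for longer than a normal deployment takes.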

ketzacoatl · January 25, 2020, 1:44am

Can you explain a little more about how you are doing that?
