Failed tasks not showing up

ViViDboarder · August 4, 2023, 7:36pm

Every day or two I’m getting a notice from one of my monitoring tools that a system job I host on my 3-node cluster isn’t up. When I check Nomad, it says the job is “running”, but there are no groups or allocations. The previous allocations are not shown either, so I can’t see whether they “failed” or “completed”, or tell what may have caused them to go away.

I suspect I know why they are failing in the first place: I’m having an issue with the Nomad service registry forgetting that something is running until I restart the alloc (there’s a GitHub issue for this). However, I’m unable to confirm because the tasks are just fine.

Is there some setting that causes these to be reaped at some interval?

Running on Nomad 1.6.1.
