I’m trying to configure Prometheus alerts for Nomad jobs. Regular jobs are fine, but I’m stuck on periodic jobs.
I would like an alert that fires when a periodic job runs longer than expected. For example, the job could hang, and someone should be notified to check it.
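To make the condition concrete, here is a rough Python sketch of the check I have in mind. It is only an illustration, not a native solution. It assumes the input is the decoded JSON list from Nomad's job list endpoint (`/v1/jobs`, as far as I understand it), and that each entry carries `ID`, `ParentID`, `Status`, and a `SubmitTime` in nanoseconds. Those field names are my reading of the API and should be verified. The function name `overrunning_children` and the threshold are hypothetical, and submit time is used only as an approximation of when the run started.

```python
import time

NS_PER_SECOND = 1_000_000_000


def overrunning_children(job_stubs, parent_id, max_seconds, now_ns=None):
    """Return IDs of still-running child jobs of `parent_id` that were
    submitted more than `max_seconds` ago.

    `job_stubs` is assumed to be a list of dicts with the keys
    "ID", "ParentID", "Status" and "SubmitTime" (nanoseconds since epoch).
    """
    if now_ns is None:
        now_ns = time.time_ns()

    overdue = []
    for stub in job_stubs:
        # Only look at launches spawned by the periodic parent job.
        if stub.get("ParentID") != parent_id:
            continue
        # Finished or pending launches cannot be "running too long".
        if stub.get("Status") != "running":
            continue
        # Approximate run time by time elapsed since submission.
        age_seconds = (now_ns - stub["SubmitTime"]) / NS_PER_SECOND
        if age_seconds > max_seconds:
            overdue.append(stub["ID"])
    return overdue
```

Something like `overrunning_children(stubs, "backup", 3600)` would then list the launches of a hypothetical `backup` periodic job that have been running for over an hour. Ideally the same condition could be expressed directly as a Prometheus alert rule if a suitable metric existed.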
I’ve seen something similar (https://github.com/sepulworld/deadman-check), but is there a way to configure this natively, without third-party tools?
It would be awesome to have a metric for job run duration. Would it be a good idea to open an enhancement proposal at https://github.com/hashicorp/nomad/issues?