Nomad ghost job

ibayer · April 30, 2020, 9:37am

Hi,

I’m running a cluster with Nomad 0.11.0 and Consul 1.7.2, and I have an old job that I can’t completely remove:

The Nomad job doesn’t show up in the Nomad UI.
The associated services are still visible in the Consul UI.
Services removed with consul services deregister -id=.. reappear.
The Docker container associated with the old Nomad job is automatically restarted if I stop it manually.
If I submit the Nomad job again, the Docker part of the job is shown as failing in the Nomad UI.

I would greatly appreciate help with troubleshooting this strange situation.

pxsloot · June 10, 2020, 9:35am

This happened to me after rebooting the whole cluster at once: after I kill the containers, Nomad schedules them again, but no job or status can be found.
My workaround/fix is to deep clean the Nomad nodes where the containers appear and reattach them to the cluster:

drain the node
(nothing should be running there, except for the rogue job)
stop nomad and docker
empty the nomad and docker working dirs (often /var/lib/nomad and /var/lib/docker)
trigger the garbage collector (on a cluster server: curl -XPUT http://127.0.0.1:4646/v1/system/gc)
reboot the node, and add it to the cluster again if it doesn’t rejoin automatically

I had to clean up 2 out of 4 nodes this way before the ghost jobs were exorcised.
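
For anyone who wants to script it, the steps above could look roughly like the sketch below. It is not a tested procedure: it assumes systemd units named nomad and docker and the default data dirs, and GC_SERVER_ADDR, DRY_RUN and the function names are made up for illustration. By default it only prints the commands. Leftover mounts under the Nomad data dir may need unmounting before the delete succeeds.

```shell
#!/usr/bin/env bash
# Sketch of the deep-clean steps above. Prints commands by default;
# set DRY_RUN=0 to really run them (this wipes the node's Nomad and Docker state).
set -euo pipefail

NOMAD_DIR="${NOMAD_DIR:-/var/lib/nomad}"     # working dir named in the post
DOCKER_DIR="${DOCKER_DIR:-/var/lib/docker}"  # working dir named in the post
# Assumption: HTTP address of a Nomad *server*; the post runs curl on the server itself.
GC_SERVER_ADDR="${GC_SERVER_ADDR:-http://127.0.0.1:4646}"

# Echo the command in dry-run mode, execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Run on the affected client node: drain it, stop the daemons, empty both dirs.
clean_node() {
  run nomad node drain -enable -self -yes
  run systemctl stop nomad docker   # assumption: systemd units with these names
  run find "$NOMAD_DIR" "$DOCKER_DIR" -mindepth 1 -delete   # keeps the dirs themselves
}

# Ask a server to run garbage collection (same endpoint as the curl above).
trigger_gc() {
  run curl -fsS -X PUT "$GC_SERVER_ADDR/v1/system/gc"
}

# Reboot the cleaned node; rejoining the cluster is left to the agent config.
reboot_node() {
  run systemctl reboot
}

# Run only the steps named on the command line; with no arguments nothing runs.
for step in "$@"; do
  case "$step" in
    clean_node|trigger_gc|reboot_node) "$step" ;;
    *) echo "unknown step: $step" >&2; exit 1 ;;
  esac
done
```

Saved as, say, clean-node.sh (hypothetical name), running `./clean-node.sh clean_node trigger_gc reboot_node` prints the plan, and prefixing it with `DRY_RUN=0` executes it. If the script runs on the client node, GC_SERVER_ADDR has to point at a server, because the local agent is stopped by then.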
