I have a strange job in my Nomad cluster that I am unable to get rid of permanently.
I run
nomad job stop --purge
and after a few seconds I follow it up with
nomad system gc
Shortly afterwards, however, the job reappears in both the UI and the CLI.
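For reproducibility, the stop-and-collect sequence can be wrapped in a small shell function so it runs identically every time. This is a minimal sketch, not an official tool: the function name `purge_job`, the job-name argument, and the `NOMAD` and `PURGE_DELAY` environment variables are hypothetical additions for illustration. It uses the single-dash `-purge` spelling from the Nomad docs; `--purge` is accepted as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop and purge a job, then force a GC pass.
# NOMAD (default: the real `nomad` binary) and PURGE_DELAY (default: 5s)
# are illustrative knobs, not Nomad settings.
set -euo pipefail

purge_job() {
  local job="${1:?usage: purge_job <job-name>}"
  local nomad="${NOMAD:-nomad}"
  local delay="${PURGE_DELAY:-5}"

  # Stop the job and purge it right away instead of leaving it for the GC.
  "$nomad" job stop -purge "$job"

  # Mirror the manual "wait a few seconds" step.
  sleep "$delay"

  # Ask the servers to garbage-collect jobs, evals, allocs and nodes now.
  "$nomad" system gc
}
```

Usage, with a hypothetical job name: `purge_job my-system-job`. Setting `NOMAD=echo` prints the commands instead of running them, which is a cheap way to check the sequence before pointing it at a real cluster.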
I have tried reducing job_eval_threshhold to 30s, thinking that might help, but no luck. I am out of ideas; any help is appreciated.
My Nomad cluster uses Consul for service discovery.
Additional information: this is a system job. When I try to edit the job in the UI (for example, by simply changing the Docker image version), it deploys, and after a few minutes it reverts to the old version. I am currently running Nomad 1.8.4 (clients and servers) and Consul 1.19.2.
Did the servers reboot unexpectedly or something?
Can you submit an entirely different job (with very low CPU/RAM) under the same name? Then run
nomad system gc
then
nomad system reconcile summaries
then stop and purge this new job with
nomad job stop --purge
and finally run
nomad system gc
nomad system reconcile summaries
again.
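The suggested "replace, then purge" reset can be sketched as a shell function. This is an illustration under assumptions, not a verified fix: the job file (for example a hypothetical `decoy.nomad.hcl` declaring a tiny job with the same name as the stuck one), the function names, and the `NOMAD` variable are all made up for this sketch; only the `nomad` subcommands come from the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the "decoy job" reset described above.
# NOMAD defaults to the real `nomad` binary; set NOMAD=echo for a dry run.
set -euo pipefail

gc_and_reconcile() {
  local nomad="${NOMAD:-nomad}"
  "$nomad" system gc                   # force a GC pass on the servers
  "$nomad" system reconcile summaries  # recompute summaries of all registered jobs
}

decoy_and_purge() {
  local jobfile="${1:?usage: decoy_and_purge <jobfile> <job-name>}"
  local job="${2:?usage: decoy_and_purge <jobfile> <job-name>}"
  local nomad="${NOMAD:-nomad}"

  # 1. Register a minimal (very low CPU/RAM) job that reuses the stuck job's name.
  "$nomad" job run "$jobfile"
  gc_and_reconcile

  # 2. Stop and purge that replacement, then clean up once more.
  "$nomad" job stop -purge "$job"
  gc_and_reconcile
}
```

A dry run such as `NOMAD=echo decoy_and_purge decoy.nomad.hcl my-system-job` prints the six commands in order without touching the cluster.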
Can you afford to update the Nomad server binaries and cleanly reboot the servers?
I deployed a new system job with the same name, but it failed and reverted to the old version.
I will try the binary update.