Job constantly restarted by SIGTERM and no clue why

I run a Tor relay as a Job in my Nomad cluster. The node itself has plenty of RAM and CPU, and I don’t see an obvious reason why the job needs to be terminated regularly.

Sometimes it runs for a week without any issue, but sometimes it restarts multiple times per day. In Tor’s logs I can see
Catching signal TERM, exiting cleanly.

In the Nomad Job events overview I only see
Terminated Exit Code: 0

I can’t find any errors whatsoever and wonder what the reason for this behaviour could be.
Any ideas?

You could try looking at the logs of the allocation attached to the job.

nomad status <job-name>

Scroll to the bottom and you will find an allocation ID (short hash). Copy the ID:

nomad alloc status <Alloc-ID>
nomad alloc logs <Alloc-ID>

If the job has more than one task, you will be asked to specify the task name as well:

nomad alloc logs <Alloc-ID> <Task-name>

You can find more info related to this in the docs: Commands: alloc logs | Nomad | HashiCorp Developer
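If the task logs themselves don’t explain the kill, the allocation’s event history often does. A sketch of what I would run (the alloc ID and task name are placeholders for your own values):

```shell
# Show the allocation's full event history. The "Recent Events" table
# will indicate whether Nomad itself killed the task (e.g. during a
# deployment, node drain, or after a failed health check) or whether
# the task exited on its own.
nomad alloc status -verbose <Alloc-ID>

# stderr often carries the actual error when using the docker driver
nomad alloc logs -stderr <Alloc-ID> <Task-name>
```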

My guess is that you will see the logs there and that the job is exiting because of an error in the container (if you’re using Docker as the driver). Let us know if this solved the issue!

That’s basically what I did to get the logs of the tor process, which only show Catching signal TERM, exiting cleanly. and no error.
Since Nomad is managing this job, I assumed Nomad was sending the SIGTERM, but I have no idea why. That was my original question.

Hmmmmm, after doing some research on Google it seems to be an issue with tor, not with Nomad itself.

I would try increasing the log verbosity of the tor service (if possible), so next time it happens you will have more info.
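For example, a minimal torrc sketch (the log file path here is just an example) that raises the log level from the default notice to info:

```
# in torrc — log at info severity to a file for more detail;
# "debug" is even more verbose if info isn't enough
Log info file /var/log/tor/info.log
```

With that in place, the lines right before the next Catching signal TERM should show what happened beforehand.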

Have a look at this too

Uh ok… I actually hadn’t googled it, to be honest :smiley:
It looked quite clean, and it seemed to me that Nomad was doing this for some reason.