I run a Tor relay as a job in my Nomad cluster. The node itself has plenty of RAM and CPU, and I don’t see an obvious reason why the job needs to be terminated regularly.
Sometimes it runs for a week without any issue but sometimes it restarts multiple times per day. In the logs of Tor I can see
Catching signal TERM, exiting cleanly.
In the Nomad Job events overview I only see
Terminated Exit Code: 0
I can’t find any errors whatsoever and wonder what could be the reason for this behaviour?
You could try checking the logs of the allocation attached to the job.
nomad status <job-name>
Scroll to the bottom and you will find an allocation ID (a short hash). Copy the ID and run:
nomad alloc status <Alloc-ID>
nomad alloc logs <Alloc-ID>
If the job has more than one task, you will also need to specify the task name, like this:
nomad alloc logs <Alloc-ID> <Task-name>
You can find more info on this here: Commands: alloc logs | Nomad | HashiCorp Developer
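As a rough sketch, the steps above can be wrapped in a small shell function that also pulls the stderr stream and the verbose allocation status. The function name `collect_alloc_info` is made up for illustration and is not part of Nomad; the allocation ID and task name are whatever `nomad status <job-name>` shows for your job.

```shell
#!/bin/sh
# Hypothetical helper (not a Nomad command): gathers status and logs for one allocation.
# Usage: collect_alloc_info <alloc-id> [task-name]
collect_alloc_info() {
    alloc_id="$1"
    task="$2"   # optional; only needed when the allocation has several tasks

    # Detailed allocation status, including task states and recent events
    nomad alloc status -verbose "$alloc_id"

    # stdout of the task ($task is left unquoted on purpose so it vanishes when empty)
    nomad alloc logs "$alloc_id" $task

    # stderr of the task; -stderr switches the stream
    nomad alloc logs -stderr "$alloc_id" $task
}
```

Call it as `collect_alloc_info <Alloc-ID>` or, for a multi-task job, `collect_alloc_info <Alloc-ID> <Task-name>`.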
My guess is that you will see the logs there and that the job is exiting because of an error in the container (if you’re using Docker as the driver). Let us know if this solved the issue!
That’s basically what I did to get the log of the Tor process, which shows only the following, without any error:
Catching signal TERM, exiting cleanly.
Since Nomad is managing this job, I assumed Nomad was sending the SIGTERM, but I have no idea why. That was my original question.
Hmm. After doing some research on Google, it seems to be an issue related to Tor, not to Nomad itself.
I would try to increase the verbosity of the logs in the Tor service (if this is possible) so that next time it happens you will have more info.
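For reference, Tor's log level is set with the `Log` option in torrc. A minimal sketch, assuming your relay logs to stdout so that Nomad captures the output (your setup may log to a file or syslog instead):

```
# torrc fragment (assumption: the relay logs to stdout, which Nomad captures)
# The default severity is "notice"; "info" is more verbose, "debug" is very noisy.
Log info stdout
```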
Have a look at this too
Uh ok… I actually haven’t googled it tbh
The exit looked quite clean, and it seemed to me that Nomad was doing this for some reason.