Job constantly restarted by SIGTERM and no clue why

soupdiver · April 25, 2023, 8:25am

I run a Tor relay as a job in my Nomad cluster. The node itself has plenty of RAM and CPU, and I don’t see an obvious reason why the job needs to be terminated regularly.

Sometimes it runs for a week without any issue, but sometimes it restarts multiple times per day. In the Tor logs I can see:
Catching signal TERM, exiting cleanly.

In the Nomad Job events overview I only see
Terminated Exit Code: 0

I can’t find any errors whatsoever. What could be the reason for this behaviour?
Any ideas?

hector.medina.cabane · April 25, 2023, 10:19am

You could try looking at the logs of the allocation attached to the job.

nomad status <job-name>

Scroll to the bottom and you will find an allocation ID (short hash). Copy the ID and run:

nomad alloc status <Alloc-ID>

nomad alloc logs <Alloc-ID>

If the job has more than one task, you will need to specify the task name too, like:

nomad alloc logs <Alloc-ID> <Task-name>
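
Put together, the steps above could look roughly like this small shell helper. It is only a sketch: `dump_alloc` is a made-up name, and the alloc ID and task name are whatever `nomad status` reports for your job. The `-stderr` flag is included because a task's stdout and stderr are kept as separate log streams.

```shell
# Hypothetical helper (not part of Nomad): print the status and both log
# streams of one allocation.
# Usage: dump_alloc <Alloc-ID> <Task-name>
dump_alloc() {
  alloc_id="$1"
  task="$2"

  # Allocation status, including the recent task events
  nomad alloc status "$alloc_id"

  # stdout of the task
  nomad alloc logs "$alloc_id" "$task"

  # stderr of the task
  nomad alloc logs -stderr "$alloc_id" "$task"
}
```

The events listed by `nomad alloc status` are probably the most useful part here, since they may show whether Nomad itself decided to restart or kill the task.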

You can find more info about this in the Nomad documentation: Commands: alloc logs | Nomad | HashiCorp Developer

My guess is that you will see the logs there and that the job is exiting because of an error in the container (if you’re using Docker as a driver). Let us know if this solved the issue!

soupdiver · April 25, 2023, 10:34am

That’s basically what I did to get the log of the Tor process, which only shows "Catching signal TERM, exiting cleanly." without any error.
Since Nomad is managing this job, I assumed Nomad is the one sending the SIGTERM, but I have no idea why. That was my original question.

hector.medina.cabane · April 25, 2023, 11:46am

Hmm, after doing some research on Google, it seems to be an issue related to Tor, not Nomad itself.

I would try to increase the verbosity of the logs in the Tor service (if this is possible), so that next time it happens you will have more info.
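
As a rough sketch of what that could look like, assuming the relay reads a torrc file you can edit: Tor's `Log` option takes a minimum severity and a destination. The `./torrc` path below is only a placeholder, and I don't know how your container is set up, so adjust it to wherever your config actually lives.

```shell
# Assumption: TORRC is the path of the torrc your relay actually loads;
# ./torrc is only a placeholder for this sketch.
TORRC="${TORRC:-./torrc}"

# "Log <severity> stdout" sends messages of that severity and above to stdout,
# which Nomad captures as part of the task's logs.
# Severities, most to least verbose: debug, info, notice, warn, err.
printf 'Log info stdout\n' >> "$TORRC"
```

This will not tell you who sent the signal, but it may at least show what Tor was doing right before it arrived.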

Have a look at this too

Tor Browser crashing on startup · Issue #148745 · NixOS/nixpkgs · GitHub

soupdiver · April 25, 2023, 3:43pm

Uh ok… I actually haven’t googled it, tbh.
It looked quite clean, and it seemed to me that Nomad is doing this for some reason.
