Nomad task restarting but never restarts

SunSparc · May 13, 2022, 5:29pm

I just spun up a new Nomad cluster. I have several tasks running. The latest deployment for all of them is marked as “successful”. However, I checked back and found that a couple of them had been moved into an allocation status of “pending”. This would usually be fine, except that they have been pending for 14 hours. I would not call that pending; I would call that hanging or frozen.

Here are the recent events:

| Time                 | Type                  | Description                                    |
|----------------------|-----------------------|------------------------------------------------|
| May 12, '22 20:50:01 | Restarting            | Task restarting in 0s                          |
| May 12, '22 20:50:01 | Terminated            | Exit Code: 0                                   |
| May 12, '22 20:50:01 | Restart Signaled      | Template with change_mode restart re-rendered  |
| May 12, '22 17:17:53 | Started               | Task started by client                         |
| May 12, '22 17:17:49 | Downloading Artifacts | Client is downloading artifacts                |
| May 12, '22 17:17:49 | Task Setup            | Building Task Directory                        |
| May 12, '22 17:17:49 | Received              | Task received by client                        |

I understand these events to mean that one of the rendered templates changed, which triggered a restart. restart is the default change_mode for templates, and none of the templates that I defined in this particular job set change_mode or change_signal, so the default is in effect.
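
For anyone following along, here is a minimal sketch of what a template block with the restart behavior spelled out might look like. The key path and destination are placeholders I made up for illustration, not values from the job in question; only the change_mode and change_signal settings are the point.

```hcl
# Hypothetical template block; the Consul key and destination are placeholders.
template {
  data = <<EOH
value = {{ key "service/example/config" }}
EOH

  destination = "local/app.conf"

  # "restart" is the default, so leaving change_mode out behaves the same way:
  # the task is restarted whenever the rendered output changes.
  change_mode = "restart"

  # Alternatives, if a restart on every re-render is not wanted:
  # change_mode   = "signal"   # send a signal instead of restarting
  # change_signal = "SIGHUP"   # required when change_mode is "signal"
  # change_mode   = "noop"     # re-render the file and do nothing else
}
```

Whether signal or noop is appropriate depends on whether the application can reload its configuration without being restarted.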

This does not explain why the task is not actually restarting though.

If I check the status on the command line, I see:

$ nomad eval status -verbose 3cae05d2
No evaluation(s) with prefix or id "3cae05d2" found

I opened the client monitor, changed the level to “Trace”, and watched it for a few minutes, but saw nothing that relates to the task in question.

What would be my next step in troubleshooting why this task is not being restarted properly?

SunSparc · May 13, 2022, 6:03pm

Looks like someone else had a similar problem: Nomad task pending for few minutes

SunSparc · May 18, 2022, 11:20pm

I think I figured it out. I remembered having run into a similar situation before.

On the job that was having the problem we were specifying constraint { distinct_hosts = true }, while at the same time also constraining it to a single node. This caused a deadlock that would not allow it to restart in place. I wonder if Nomad could indicate somehow when a deadlock occurs due to a constraint. Either way, I just need to be better about evaluating the constraints that I specify.
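
To make the combination concrete, here is a rough sketch of a job that pairs distinct_hosts with a constraint pinning it to one node. The job, group, task, image, and node names are all hypothetical, and pinning by node name is only an assumption about how the single-node constraint was written.

```hcl
# Hypothetical job; every name below is a placeholder.
job "example" {
  datacenters = ["dc1"]

  # Require each allocation of this job to land on a different host.
  constraint {
    distinct_hosts = true
  }

  # Pin the job to a single node (assumed here to be done by node name).
  constraint {
    attribute = "${node.unique.name}"
    value     = "node-1"
  }

  group "app" {
    count = 1

    task "server" {
      driver = "docker"

      config {
        image = "example/app:latest"
      }
    }
  }
}
```

With both constraints in place, only one node is ever eligible and it may hold at most one allocation of the job, which is the situation I suspected of blocking the restart.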

SunSparc · May 19, 2022, 10:24pm

Well, I hoped that was the problem. However, I checked on the jobs today, and once again they were stuck in Restarting mode, forever and ever. Like marble statues chiseled to perfection.
