Nomad task restarting but never restarts

I just spun up a new Nomad cluster. I have several tasks running. The latest deployment for all of them is marked as “successful”. However, I checked back and found that a couple of them have been moved into an allocation status of “pending”. This would usually be fine, except that they have been pending for 14 hours. I would not call that pending, I would call that hanging or frozen.

Here are the recent events:

|May 12, '22 20:50:01 | Restarting           | Task restarting in 0s
|May 12, '22 20:50:01 | Terminated           | Exit Code: 0
|May 12, '22 20:50:01 | Restart Signaled     | Template with change_mode restart re-rendered
|May 12, '22 17:17:53 | Started              | Task started by client
|May 12, '22 17:17:49 | Downloading Artifacts| Client is downloading artifacts
|May 12, '22 17:17:49 | Task Setup           | Building Task Directory
|May 12, '22 17:17:49 | Received             | Task received by client

I understand these events to mean that one of the rendered templates changed, which triggered a restart. I see that restart is the default for change_mode on templates, and none of the templates that I defined in this particular job set change_mode or change_signal explicitly, so the default is being used.
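
For anyone following along, here is a minimal sketch of a template block with the change behavior written out explicitly. The template body, the key path, and the destination file are placeholders for illustration, not taken from the job in question:

```hcl
template {
  # Hypothetical template body and destination, for illustration only.
  data = <<EOT
setting = {{ key "example/app/setting" }}
EOT
  destination = "local/app.conf"

  # "restart" is the default, so omitting this line behaves the same way.
  change_mode = "restart"

  # Alternatives, if a re-render should not restart the task:
  # change_mode   = "noop"     # re-render the file, leave the task running
  # change_mode   = "signal"   # send a signal instead of restarting
  # change_signal = "SIGHUP"   # required when change_mode = "signal"
}
```

Setting change_mode to "noop" on a template would at least show whether the re-render is what sends the task into this state, assuming the task can tolerate a stale file for the duration of the test.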

This does not explain why the task is not actually restarting though.

If I check the status on the command line, I see:

$ nomad eval status -verbose 3cae05d2
No evaluation(s) with prefix or id "3cae05d2" found

I opened the client monitor, changed the level to “Trace”, and watched it for a few minutes, but saw nothing that relates to the task in question.

What would be my next step in troubleshooting why this task is not being restarted properly?

Looks like someone else had a similar problem: Nomad task pending for few minutes

I think I figured it out. I remembered having run into a similar situation before.

On the job that was having the problem we were specifying constraint { distinct_hosts = true }, while at the same time also constraining it to a single node. This caused a deadlock that would not allow it to restart in place. I wonder if Nomad could indicate somehow when a deadlock occurs due to a constraint. Either way, I just need to be better about evaluating the constraints that I specify.
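
A sketch of the kind of combination described above, with placeholder job, node, and image names rather than the actual job file. Whether this pairing really blocks an in-place restart is my hypothesis, not something confirmed from the Nomad documentation:

```hcl
job "example" {
  datacenters = ["dc1"]

  # Pin every allocation of the job to one node (placeholder node name).
  constraint {
    attribute = "${node.unique.name}"
    value     = "node-1"
  }

  # Also require that allocations of this job never share a host.
  constraint {
    distinct_hosts = true
  }

  group "app" {
    task "server" {
      driver = "docker"

      config {
        image = "example/app:latest"
      }
    }
  }
}
```

With only one eligible node, the second constraint leaves no room for a replacement allocation to be placed next to the existing one, which is why I suspected it.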

Well, I hoped that was the problem. However, I checked on the jobs today, and once again they were stuck in Restarting mode, forever and ever. Like marble statues chiseled to perfection.