I just spun up a new Nomad cluster. I have several tasks running. The latest deployment for all of them is marked as “successful”. However, I checked back and found that a couple of them have been moved into an allocation status of “pending”. This would usually be fine, except that they have been pending for 14 hours. I would not call that pending; I would call that hanging or frozen.
Here are the recent events:
|May 12, '22 20:50:01 | Restarting | Task restarting in 0s
|May 12, '22 20:50:01 | Terminated | Exit Code: 0
|May 12, '22 20:50:01 | Restart Signaled | Template with change_mode restart re-rendered
|May 12, '22 17:17:53 | Started | Task started by client
|May 12, '22 17:17:49 | Downloading Artifacts| Client is downloading artifacts
|May 12, '22 17:17:49 | Task Setup | Building Task Directory
|May 12, '22 17:17:49 | Received | Task received by client
I understand these events to mean that one of the rendered templates was changed, which triggered a restart. I see that restart is the default for change_mode on templates. However, none of the templates that I defined in this particular job had any specific change_mode or change_signal, which means the default is being used.
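To make the default concrete, here is a minimal sketch of a template block with the default written out explicitly. The Consul key, the destination path, and the file contents are hypothetical placeholders, not taken from my actual job; only change_mode and change_signal are the settings discussed above.

```hcl
template {
  # Hypothetical content: re-rendered whenever the referenced Consul key changes.
  data = <<EOT
setting = {{ key "service/app/setting" }}
EOT

  # Hypothetical path inside the task directory.
  destination = "local/app.conf"

  # Omitting change_mode is equivalent to this line, since "restart" is the default.
  change_mode = "restart"

  # change_signal only applies when change_mode = "signal", so it is left unset here.
  # change_signal = "SIGHUP"
}
```

With a block like this, any re-render of the template signals a restart of the task, which matches the "Restart Signaled" event above.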
This does not explain why the task is not actually restarting though.
If I check the status on the command line, I see:
$ nomad eval status -verbose 3cae05d2
No evaluation(s) with prefix or id "3cae05d2" found
I open the client monitor. I change the level to “Trace”. I watch it for a few minutes and see nothing that relates to the task in question.
What would be my next step in troubleshooting why this task is not being restarted properly?