Inconsistent Task Restart Behavior in a Custom Nomad Plugin

I’ve developed a custom Nomad task driver plugin and encountered some unexpected behavior:

  • In my plugin’s RunTask function, I start a goroutine that runs the task. The design expectation is that if this goroutine finishes and reports a non-zero exit code, Nomad restarts the task.
  • However, in some cases Nomad calls DestroyTask before attempting a restart, and then the task is never restarted. The task logs show the exiting, restarting, and started states, but the task does not actually run again; it is only cleaned out of the state, which leaves it inconsistent. After that, even stopping the task from the UI does not help: it stays stuck with the status “waiting 30 sec before killing.”
  • Interestingly, this behavior is not consistent: in other scenarios, even though DestroyTask is invoked, the task does restart properly.
  • I also haven’t observed any calls to StopTask in these cases, which adds to my confusion about the lifecycle. I cannot pinpoint when Nomad decides to call DestroyTask instead of the other task lifecycle methods.

Has anyone experienced similar behavior with custom plugins? Specifically:

  • When does Nomad choose to call DestroyTask rather than StopTask?
  • What might cause the task to become “stuck” after DestroyTask is called in certain scenarios?
  • Are there any recommended approaches or workarounds to ensure that tasks are restarted correctly after an unexpected goroutine exit?

Any insights or pointers to related documentation would be greatly appreciated.