How to not kill all tasks of an allocation when one task is failing

scyd · May 6, 2021, 9:09pm

In our use case, we do not want failing tasks to be restarted automatically, so we set:

restart {
  attempts = 0
  delay    = "15s"
  interval = "24h"
  mode     = "fail"
}

However, when one task fails, Nomad seems to kill all the other running tasks belonging to the same allocation.

Is there a way to prevent that?

Thanks.

jrasell · May 10, 2021, 9:04am

Hi @scyd, and thanks for asking this question. Nomad treats an allocation as an immutable object; therefore, when a single task within a group fails, as you describe, the remaining tasks are marked as failed. There is no way around this behaviour. If the tasks within the group are as independent as you describe, it might be advisable to run some (or all) of them in separate groups, or even separate jobs.

Thanks,
jrasell and the Nomad team
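
As a rough illustration of the "separate groups" suggestion above, here is a minimal sketch of a job file in which each task sits in its own group and therefore gets its own allocation. The job name, group and task names, datacenter, driver, and images are all placeholders assumed for the example, not taken from the original question; only the restart settings come from the post.

```hcl
# Hypothetical job: every name, the datacenter, and the images are placeholders.
job "example" {
  datacenters = ["dc1"]

  # One task per group: each group is placed as its own allocation,
  # so a failure of "api" should not take "worker" down with it.
  group "api" {
    # Restart settings from the question: never restart, fail instead.
    restart {
      attempts = 0
      delay    = "15s"
      interval = "24h"
      mode     = "fail"
    }

    task "api" {
      driver = "docker"

      config {
        image = "example/api:1.0" # placeholder image
      }
    }
  }

  group "worker" {
    restart {
      attempts = 0
      delay    = "15s"
      interval = "24h"
      mode     = "fail"
    }

    task "worker" {
      driver = "docker"

      config {
        image = "example/worker:1.0" # placeholder image
      }
    }
  }
}
```

The trade-off is that tasks in different groups are no longer guaranteed to be placed on the same node and do not share the allocation's directory or network namespace, so this only fits tasks that are genuinely independent. Depending on the job type, Nomad may also reschedule a failed allocation; the `reschedule` block controls that separately from `restart`.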
