How to start lost allocations on a specific node without restarting the entire cluster?

Hi Nomad community,

We have a Nomad cluster of 7 nodes. Each node is supposed to have two allocations, but for some reason one node has only one. The other allocation has three container services (APM, Kibana, and Elasticsearch). How can I start those containers without restarting the entire cluster? I tried Nomad CLI commands such as
nomad alloc restart or nomad alloc exec
but these commands only work on running tasks.
Can we use the Nomad CLI to find failed or lost containers and restart them?

Thank you!
Tony

Hi @ztony,

Could you provide the job specification which you’re using to ensure each node has two allocations, as this would impact what could be done to resolve the problem you have?

I would try using the nomad job eval command to force an evaluation of the job in question. If the scheduler determines that the failed/lost allocation needs to be replaced, it will create a replacement. Beyond that, assuming all servers and nodes in the cluster are communicating properly, a failed or lost allocation is terminal and cannot be restarted.
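
A minimal sketch of that suggestion, for illustration only. The job name elastic-stack is a placeholder (the real job name is not given in this thread), and the run helper with its DRY_RUN switch is a hypothetical wrapper added here so the commands can be reviewed before anything touches the cluster. The -force-reschedule flag is optional; it asks the scheduler to reschedule failed allocations even if they are not currently eligible for rescheduling.

```shell
#!/bin/sh
# Placeholder job name (an assumption); substitute the job that owns
# the missing allocation.
JOB="${JOB:-elastic-stack}"

# Hypothetical helper: print each command, and only execute it when
# DRY_RUN is set to something other than 1 (dry run is the default).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" != "1" ]; then
    "$@"
  fi
}

# 1. List the job's allocations and their status (running, failed, lost).
run nomad job status "$JOB"

# 2. Force a new evaluation of the job so the scheduler can decide
#    whether a replacement allocation is needed.
run nomad job eval -force-reschedule "$JOB"
```

Running it as-is only prints the two commands. Setting DRY_RUN=0 and JOB to the real job name executes them against the cluster. Whether a replacement is actually placed still depends on the job specification and on the node having enough free resources.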

Thanks,
jrasell and the Nomad team

Hi Jrasell,

Thank you for your reply. We will probably rerun the job, since neither the Nomad UI nor the Nomad CLI can show the details of the failed or lost allocations.

Tony