Restart policy of *successful* batch jobs

Hi, not using Nomad yet but it sounds highly promising. Our application consists of many individual batch jobs that should be run continuously. The manual deployment system, prior to Nomad, is to start a tmux session on a server and run while :; do ./batch-job.sh ; sleep 5m ; done in bash. Regular cronjobs are possible, although we only want to run 1 instance of a job at a time. Docker-compose implements this with the restart: always directive.

Can we use Nomad in this “infinite-loop of short-lived batch jobs” fashion using the reschedule stanza? Docs only mention “task failure”, not just task completion.

Hi, in my opinion the comparison is not equivalent (and I think there is a way to achieve what you want).

  1. Nomad's cron-style (i.e. periodic) jobs have an option to avoid overlap, so that might help.

  2. In docker-compose, I think (I could be wrong) restart: always restarts a failed container, i.e. the expectation is that the container is of type service (in Nomad speak): unless external forces cause the container to exit, it would run forever.

But I think you don’t want multiple invocations (i.e. allocations) from the periodic job, as the “name” of each allocation is different and wrapper code would be needed to track the “current allocation”.

That said, could you try making the job of type service and setting the restart and reschedule stanzas appropriately to get effectively the same result?

The default strategy is to back off, which can sometimes give the impression that the service job is not restarting.
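
A rough sketch of what the service-type variant might look like, in case it helps. The job name, datacenter, driver, and script path are all placeholders I made up, and I'm assuming that a service task which exits (even with code 0) is restarted according to the restart stanza, so please verify that on a test cluster first:

```hcl
# Hypothetical job spec: names, paths, and the driver are assumptions.
job "batch-loop" {
  datacenters = ["dc1"]
  type        = "service"

  group "batch" {
    restart {
      # Wait roughly 5 minutes between runs (Nomad adds some jitter).
      delay = "5m"

      # Allow enough restarts per interval that the limit is
      # not hit by normal back-to-back runs.
      attempts = 12
      interval = "1h"

      # If the limit is ever reached, wait for the next interval
      # rather than marking the allocation as failed.
      mode = "delay"
    }

    task "run" {
      driver = "exec"

      config {
        # Placeholder path to the batch script.
        command = "/usr/local/bin/batch-job.sh"
      }
    }
  }
}
```

With mode = "delay" the task should stay on the same client and never hand over to the reschedule logic, which is where the back-off behaviour mentioned above tends to show up.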

Periodic batch jobs with prohibit_overlap = true should work for our use case. The timing is not exactly the same as with the docker-compose or bash-loop solutions, but I’m not fussed about it; our batch jobs aren’t time-sensitive.

Thanks for the quick response!
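
For anyone finding this later, a minimal sketch of the kind of periodic job this refers to. The job name, datacenter, driver, script path, and the 5-minute schedule are illustrative assumptions, not a tested configuration:

```hcl
# Hypothetical job spec: names, paths, and the driver are assumptions.
job "batch-job" {
  datacenters = ["dc1"]
  type        = "batch"

  periodic {
    # Launch on a cron schedule, here every 5 minutes.
    cron = "*/5 * * * *"

    # Do not start a new instance while the previous one is still running.
    prohibit_overlap = true
  }

  group "batch" {
    task "run" {
      driver = "exec"

      config {
        # Placeholder path to the batch script.
        command = "/usr/local/bin/batch-job.sh"
      }
    }
  }
}
```

The timing difference is that the cron schedule fires at fixed wall-clock times, whereas the bash loop sleeps for 5 minutes after each run finishes.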
