We are currently evaluating Nomad for our application scheduling needs. Of most interest is the “distributed cron” feature, which we have implemented with periodic jobs. Our current setup runs on both Windows and Linux, with a mixture of persistent services and what I would call “daily services”: jobs that we restart on a daily basis, with precise downtime windows whose timings may vary per application. This is currently managed in crontab (or Task Scheduler on Windows), where each application has a start job and a stop job (typically just a kill signal).
I am trying to work out how best to migrate this style of job to Nomad. I have been using the sysbatch scheduler with the periodic stanza to replicate this behaviour. This works well for starting a job, but I don’t have a clear solution for stopping it. The potential solutions I have in mind are:
Create separate “start” and “stop” periodic jobs in Nomad, with the stop job sending a kill signal at a given time. The downside is that I would also need to make sure that the restart policy of the “start” job does not cause Nomad to restart it.
Use the kill_timeout parameter in the task stanza to have the task end at a given time. We run these kinds of tasks over the course of a day, so the timeout would be quite long, and this doesn’t seem to be the intended use of the feature. It would also require us to always convert a fixed end time to a duration, which is a potential source of error.
Make changes to our applications to fit more of a “service” style, without the need for daily restarts. This is feasible for some but not all of our stack, as there are legacy elements.
Run the jobs as a service but still have a daily (periodic) stop job, and use the delay parameter in the restart stanza to emulate downtime.
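To illustrate the conversion mentioned in the kill_timeout option, here is a minimal bash sketch that turns a fixed end time into a number of seconds. The function name seconds_until is hypothetical, the code assumes GNU date (the -d option), and it says nothing about whether any Nomad parameter is meant to be fed such a value.

```bash
#!/usr/bin/env bash
# seconds_until END_TIME [NOW_EPOCH]
# Prints the number of seconds from NOW_EPOCH (default: the current time)
# until the next occurrence of END_TIME (HH:MM, local time zone).
# Hypothetical helper; assumes GNU date.
seconds_until() {
    local end_time="$1"
    local now="${2:-$(date +%s)}"
    local today end
    # Calendar date (YYYY-MM-DD) that NOW_EPOCH falls on.
    today=$(date -d "@${now}" +%F)
    end=$(date -d "${today} ${end_time}" +%s)
    # If the end time has already passed today, use tomorrow's occurrence.
    if (( end <= now )); then
        end=$(date -d "${today} ${end_time} tomorrow" +%s)
    fi
    echo $(( end - now ))
}
```

For example, seconds_until 18:00 prints how long remains until the next 18:00. Because the result depends on the local time zone and on when the script runs, this kind of conversion is exactly where off-by-one-day and daylight-saving mistakes tend to creep in.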
Ideally I am looking for something like an endtime parameter, where the job terminates at a specific time. However, this doesn’t appear to exist, and I fear I am either misunderstanding something or missing some documentation.
Does anyone have experience with similar kinds of jobs, or any advice on the best approach here?
We have found nothing outside of what has been presented already. Here are the possible solutions that I’ve brainstormed - none of them are that great as they don’t address things like holidays or unexpected maintenance.
A cron job that starts and stops jobs configured as Nomad batch jobs. The stop timings could be kept in Nomad meta tags to maintain a single source of truth.
An external service, which we would need to build, that keeps track of what should be running and when, and uses API calls to stop and start jobs.
Re-writing the apps to stop at a certain time.
Using the autoscaler cron plugin to write something that scales to 0 outside of the window.
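As a rough illustration of the first idea (a cron-driven script that reads stop timings from job meta), here is a minimal bash sketch. It is only a sketch under assumptions: the meta key stop_at and both function names are hypothetical, the stop time is assumed to be a zero-padded 24-hour HH:MM string, and jq is assumed to be installed alongside the nomad CLI. It does not handle holidays or maintenance windows either.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; "stop_at" is an assumed meta key, not a Nomad built-in.

# past_stop_time STOP_AT [NOW]
# Succeeds when NOW (HH:MM, default: current local time) is at or after
# STOP_AT. Zero-padded 24-hour strings compare correctly as plain text.
past_stop_time() {
    local stop_at="$1"
    local now="${2:-$(date +%H:%M)}"
    [[ ! "${now}" < "${stop_at}" ]]
}

# stop_if_due JOB_ID
# Reads the assumed "stop_at" meta value from the job and stops it when due.
stop_if_due() {
    local job="$1" spec stop_at
    # Fail early if the job cannot be inspected at all.
    spec=$(nomad job inspect "${job}") || return 1
    stop_at=$(jq -r '.Job.Meta.stop_at // empty' <<< "${spec}")
    # Jobs without the meta key are left alone.
    [[ -z "${stop_at}" ]] && return 0
    if past_stop_time "${stop_at}"; then
        nomad job stop -detach "${job}"
    fi
}
```

A crontab entry could then call stop_if_due for each job of interest every few minutes. The same decision logic could sit inside an external service that talks to the HTTP API instead of the CLI.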
Although I am not super far along in my Nomad journey, I totally agree that this would be a great thing to have!
Hi @stanislav.kobylansky, I have settled on a similar solution, although I have yet to deploy it in a production environment. My solution is:
Create a periodic job that starts the job on a schedule.
Create a second periodic job that stops all “child” jobs for a given periodic job.
This seems to work fairly well in our test environment but adds the overhead of having to create a separate periodic job for stopping and also having to configure start and end times in different places.
I created a crude bash script to make it easier to stop a given periodic job, since doing so requires you to find all running “child” jobs and stop each of them. It is susceptible to changes in the output of the nomad CLI, so it’s somewhat fragile long-term, but it has so far worked OK for us in our test environment.
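#!/usr/bin/env bash
# Stop all running child jobs of the periodic job named in $1.
PARENT="${1:?usage: $0 <periodic-job>}"

# Fetch the status once so a failure of the nomad command itself is detected
# (checking $? after a grep | awk pipeline only reports awk's exit status).
if STATUS=$(nomad job status "${PARENT}"); then
    echo OK
else
    # curl -H - send alert
    echo FAIL
    exit 1
fi

# Split on newlines only, then keep the IDs of running children.
IFS=$'\n'
RUNNING_JOBS=( $(grep "periodic-.* running" <<< "${STATUS}" | awk '{print $1}') )
echo "${RUNNING_JOBS[@]}"

RET=0
for JOB in "${RUNNING_JOBS[@]}"
do
    echo "${JOB}"
    if nomad job stop "${JOB}"; then
        TEXT="Nomad: Stopped ${JOB} (${PARENT})"
        # curl -H - send alert
        echo OK
    else
        TEXT="Nomad: Failed to stop ${JOB} (${PARENT})"
        # curl -H - send alert
        RET=1
        echo FAIL
    fi
done

# Report any failed stop to the caller (for example cron or a Nomad task).
exit "${RET}"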
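One way to reduce the dependence on human-readable CLI output might be to query the HTTP API and filter the JSON instead. The following is a minimal sketch, not a tested replacement: the function names are hypothetical, curl and jq are assumed to be installed, NOMAD_ADDR is assumed to point at the cluster (defaulting to a local agent), and a cluster with ACLs enabled would additionally need a token header.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; assume curl and jq are available.

# running_children PARENT
# Reads the JSON array returned by GET /v1/jobs on stdin and prints the IDs
# of running jobs whose ParentID equals PARENT, one per line.
running_children() {
    jq -r --arg parent "$1" \
        '.[] | select(.ParentID == $parent and .Status == "running") | .ID'
}

# stop_children PARENT
# Stops every running child of PARENT; returns non-zero if any stop fails.
stop_children() {
    local parent="$1" addr="${NOMAD_ADDR:-http://127.0.0.1:4646}"
    local jobs job encoded ret=0
    # Only fetch jobs whose ID starts with "<parent>/periodic-".
    jobs=$(curl -fsS --get --data-urlencode "prefix=${parent}/periodic-" \
        "${addr}/v1/jobs") || return 1
    while IFS= read -r job; do
        [[ -z "${job}" ]] && continue
        # Child IDs contain "/", which must be percent-encoded in the URL path.
        encoded=$(jq -rn --arg id "${job}" '$id | @uri')
        curl -fsS -X DELETE "${addr}/v1/job/${encoded}" > /dev/null || ret=1
    done < <(running_children "${parent}" <<< "${jobs}")
    return "${ret}"
}
```

Calling stop_children with the parent job name would then replace the grep and awk step, since the filter matches on the ParentID and Status fields rather than on column layout.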