Migrating daily jobs to Nomad

Hi everyone,

We are currently evaluating Nomad for our application scheduling needs. Of most interest is the “distributed cron” feature, which we have implemented with periodic jobs. Our current setup spans both Windows and Linux, with a mixture of persistent services and what I would call “daily services”: jobs that we restart on a daily basis, with precise downtime windows whose timings may vary per application. This is currently managed in crontab (or Task Scheduler on Windows), where each application has a start job and a stop job (typically just a kill signal).

I am trying to work out how best to migrate this style of job to Nomad. I have been using the sysbatch scheduler with the periodic stanza to replicate this behaviour. This works well for starting a job, but I don’t have a clear solution for stopping it. The potential solutions I have in mind are:

  1. Create separate “start” and “stop” periodic jobs in Nomad, with the stop job sending a kill signal at a given time. The downside is that I would also need to make sure that the restart policy of the “start” job does not cause Nomad to restart it.

  2. Use the kill_timeout parameter in the task stanza to have the task end at a given time. We run these kinds of tasks over the course of a day, so the timeout would be quite long, and this doesn’t seem to be the intended use of the feature. It would also require us to always convert a fixed end time to a duration, which is a potential source of error.

  3. Make changes to our applications to fit more of a “service” style without the need for daily restarts. This is feasible for some but not all of our stack, as there are legacy elements.

  4. Run the jobs as a service but still have a daily (periodic) stop job, and use the delay parameter in the restart stanza to emulate downtime.
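
To illustrate the conversion that option 2 depends on, here is a minimal sketch in bash. The helper name `seconds_until` is hypothetical, and it assumes both times are HH:MM in the same local day-cycle and ignores daylight-saving changes, which is exactly the kind of edge case that makes this approach error-prone.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Nomad): seconds from a "now" clock time
# until a fixed end time. Both arguments are HH:MM in 24-hour time. If the
# end time is not later than "now", it is treated as falling on the next day.
# Assumption: no daylight-saving shift happens in between.
seconds_until() {
	local now="$1" end="$2"
	# Force base 10 so values such as "08" are not parsed as octal.
	local now_s=$(( 10#${now%%:*} * 3600 + 10#${now##*:} * 60 ))
	local end_s=$(( 10#${end%%:*} * 3600 + 10#${end##*:} * 60 ))
	local diff=$(( end_s - now_s ))
	if (( diff <= 0 )); then
		diff=$(( diff + 86400 ))
	fi
	echo "${diff}"
}

# Example: a job started at 07:30 that should end at 18:00.
echo "$(seconds_until 07:30 18:00)s"   # prints 37800s
```

The result would still have to be written into the job specification as a duration each time the job is submitted, so the end time is never stated directly.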

Ideally I am looking for something like an endtime parameter, where the job terminates at a specific time. However, this doesn’t appear to exist; I fear I am either misunderstanding something or missing some documentation.

Does anyone have experience with similar kinds of jobs, or any advice on the best approach here?

Thanks!

For anyone else who stumbles across this, I found the following open issue that covers this use case: Scheduled *Service* jobs · Issue #2395 · hashicorp/nomad · GitHub

This is a pretty big sticking point for us to be able to fully adopt Nomad.

Has anyone had any luck with workarounds, or does anyone have thoughts on how best to manage this?

We have found nothing outside of what has been presented already. Here are the possible solutions that I’ve brainstormed - none of them are that great as they don’t address things like holidays or unexpected maintenance.

  1. A cron job, configured as a Nomad batch job, that runs and stops jobs. The stop timings could possibly live in Nomad meta tags to keep a single source of truth.
  2. An external service, which we would need to build, that keeps track of what should be running and when, and uses API calls to stop / start jobs.
  3. Rewriting the apps to stop at a certain time.
  4. Using the autoscaler cron plugin to write something that scales to 0 outside of the window.
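
Whichever of these does the stopping and starting, it needs to decide whether a job belongs inside its window at a given moment. A rough sketch of that check in bash follows; the function names `to_minutes` and `should_run` and the HH:MM window format are assumptions made for illustration, not anything Nomad provides, and holidays or maintenance days are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for deciding whether a job should be running.

# Convert HH:MM (24-hour) to minutes since midnight.
# Base 10 is forced so values such as "08" are not parsed as octal.
to_minutes() {
	echo $(( 10#${1%%:*} * 60 + 10#${1##*:} ))
}

# should_run NOW START STOP
# Succeeds (exit 0) when NOW lies in the window [START, STOP).
# A window whose STOP is earlier than its START is treated as
# spanning midnight, e.g. 22:00 to 06:00.
should_run() {
	local now start stop
	now=$(to_minutes "$1")
	start=$(to_minutes "$2")
	stop=$(to_minutes "$3")
	if (( start <= stop )); then
		(( now >= start && now < stop ))
	else
		(( now >= start || now < stop ))
	fi
}

# Example with fixed values: 12:00 falls inside 07:30 to 18:00.
if should_run 12:00 07:30 18:00; then
	echo "should be running"
fi
```

In practice the first argument would come from something like `date +%H:%M`, and the result would drive a stop or start call against Nomad.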

Although I am not super far along in my Nomad journey, I totally agree that this would be a great thing to have!

We have the same problem: we need to restart processes daily/weekly. All in all, what is required:

  1. Start the job as soon as it is created.
  2. Stop it at a specific time.
  3. Start it at a specific time.

I have to admit, it is odd that such functionality is missing.

What I ended up doing is creating these 3 jobs:

  1. A batch job which runs the CLI to start job 3.
  2. A batch job which runs the CLI to stop job 3.
  3. The actual process job.

It looks a bit messy, but I can’t think of anything better.
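
For readers who want to try the same layout, the two batch jobs could share one small wrapper around the CLI. This is only a sketch under assumptions: the function name `control_job`, the job ID, and the spec file name are made up for illustration, and it relies on `nomad job run <file>` registering (and so restarting) the job and `nomad job stop <job>` stopping it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper that the "start" and "stop" batch jobs could both run.
# control_job start <spec-file>   registers the job from its spec file
# control_job stop  <job-id>      stops the running job
control_job() {
	local action="$1" target="$2"
	case "${action}" in
		start)
			# Submitting the spec again runs a previously stopped job.
			nomad job run "${target}"
			;;
		stop)
			nomad job stop "${target}"
			;;
		*)
			echo "usage: control_job start <spec-file> | stop <job-id>" >&2
			return 2
			;;
	esac
}

# Example calls (names are placeholders):
#   control_job start daily-app.nomad.hcl
#   control_job stop daily-app
```

Each batch job would then be a periodic job whose task calls the wrapper with the appropriate action, which keeps the CLI handling in one place even though the schedules still live in two job files.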

Hi @stanislav.kobylansky, I have settled on a similar solution, although I have yet to deploy it in a production environment. My solution is:

  1. Create a periodic job that starts the job on a schedule.
  2. Create a second periodic job that stops all “child” jobs for a given periodic job.

This seems to work fairly well in our test environment but adds the overhead of having to create a separate periodic job for stopping and also having to configure start and end times in different places.

I created a crude bash script to make it easier to stop a given periodic job, as doing so requires finding all running “child” jobs and stopping them. It is susceptible to changes in the output of the nomad client, so it’s somewhat fragile long-term. It has so far worked OK for us in our test environment.

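#!/usr/bin/env bash
# Stop every running "child" of a periodic Nomad job.
# Usage: pass the parent (periodic) job ID as the first argument.

PARENT="${1:?usage: $0 <parent-job-id>}"

# Capture the status output on its own so a failing nomad call is detected;
# checking $? after the original pipeline only reported the exit status of awk.
if ! STATUS="$(nomad job status "${PARENT}")"; then
	# curl -H - send alert
	echo FAIL
	exit 1
fi
echo OK

# Keep the IDs of child jobs (named "<parent>/periodic-...") that are running.
# mapfile needs bash 4 or newer.
mapfile -t RUNNING_JOBS < <(grep "periodic-.* running" <<<"${STATUS}" | awk '{print $1}')

echo "${RUNNING_JOBS[*]}"
RET=0
for JOB in "${RUNNING_JOBS[@]}"
do
	echo "${JOB}"
	if nomad job stop "${JOB}"; then
		TEXT="Nomad: Stopped ${JOB} (${PARENT})"
		# curl -H - send alert
		echo OK
	else
		TEXT="Nomad: Failed to stop ${JOB} (${PARENT})"
		# curl -H - send alert
		RET=1
		echo FAIL
	fi
done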