I’ve seen a lot of situations where a job has a failed task. My latest example: an app tried to acquire a Liquibase lock on a DB table, but the lock was held by another app, so the task failed. Re-submitting the job with no modification does nothing, which is expected. But the command
nomad job eval -force-reschedule my-job
also does nothing, which is not expected. According to the docs, this should force the rescheduling of failed allocations. I haven’t defined any reschedule block in my job file, which should mean the job uses the default for service jobs, according to the "reschedule Block - Job Specification" page on HashiCorp Developer.
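For reference, this is what I understand the implicit default for service jobs to be, written out as an explicit block. It is my reading of the reschedule documentation, not something copied from my job file, so the values should be checked against the docs for the Nomad version in use:

```hcl
# Assumed equivalent of the implicit reschedule default for service jobs.
# This is a sketch based on the documentation, not my actual job file.
# The block can be placed at the job level or inside a group.
reschedule {
  delay          = "30s"         # wait before the first reschedule attempt
  delay_function = "exponential" # delay grows with each subsequent attempt
  max_delay      = "1h"          # upper bound on the delay
  unlimited      = true          # no cap on the number of attempts
}
```

If that reading is right, failed allocations of a service job should be eligible for rescheduling even without an explicit block, which is why I expected -force-reschedule to do something.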
So far, the only ways I have found to really force a reschedule of the failed allocation are to either stop the job with -purge and re-submit it, or scale the affected task group with
nomad job scale my-job my-group <number>
Am I the only one having this problem?