Avoid rescheduling due to resource constraints with batch jobs

I’m creating batch jobs through the Python API and trying to force the job to fail if there aren’t resources available when the job is first registered.

When I inspect the job I have stanzas that I think should prevent retries of any sort:

### No rescheduling
$ nomad job inspect b55ec362_3 | grep -A 7 ReschedulePolicy
                "ReschedulePolicy": {
                    "Attempts": 0,
                    "Delay": 5000000000,
                    "DelayFunction": "constant",
                    "Interval": 86400000000000,
                    "MaxDelay": 0,
                    "Unlimited": false
                },

### No restarting
$ nomad job inspect b55ec362_3 | grep -A 7 Restart
                "RestartPolicy": {
                    "Attempts": 0,
                    "Delay": 0,
                    "Interval": 5000000000,
                    "Mode": "fail"
                },
                "Scaling": null,
                "Services": null,
--
                        "RestartPolicy": {
                            "Attempts": 0,
                            "Delay": 0,
                            "Interval": 5000000000,
                            "Mode": "fail"
                        },
                        "ScalingPolicies": null,
                        "Services": null,

Yet when I look at the evals for the job there are two … one for registration and another for “queued-allocs” … which I assume is rescheduling of some sort … ?

$ nomad job status -evals b55ec362_3
ID            = b55ec362_3
Name          = b55ec362_3
Submit Date   = 2021-04-12T14:32:31Z
Type          = batch
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = dead
Periodic      = false
Parameterized = false

Summary
Task Group      Queued  Starting  Running  Failed  Complete  Lost
python_command  0       0         0        0       1         0

Evaluations
ID        Priority  Triggered By   Status    Placement Failures
902f2257  50        queued-allocs  complete  false
7774b66c  50        job-register   complete  true

Allocations
ID        Node ID   Task Group      Version  Desired  Status    Created    Modified
98e02392  4b543a2d  python_command  0        run      complete  2h58m ago  2h57m ago

My questions are:

  1. Is “queued-allocs” an indication of hitting a resource constraint?
  2. Is ReschedulePolicy the stanza that would prevent restarting due to resource constraints?
  3. Is there something wrong with the ReschedulePolicy that I’m using?

Hi @jaburi :wave:

Your reschedule and restart configurations seem to be correct. I think what’s happening here is that these configurations are only used when an evaluation results in a placement, meaning your job actually runs.

Since the initial evaluation couldn’t place the allocation (notice the true under the Placement Failures column for the job-register evaluation), Nomad created a blocked follow-up evaluation (that’s the queued-allocs one).

Eventually, the evaluation was unblocked (for example, some other job finished and released resources) and the allocation could be placed.

If you were to kill the job once it starts running, the reschedule and restart configuration would be applied and there wouldn’t be a follow-up allocation.

Just to make sure I understood what you are trying to do, you would like for the job to not run at all if there are not enough resources available?

Hello @lgfa29,

Thanks for the response!

Ah, I see. The reason reschedule/restart aren’t doing what I want makes sense … thanks for pointing out that distinction between placed and blocked.

Just to make sure I understood what you are trying to do, you would like for the job to not run at all if there are not enough resources available?

Correct … if there aren’t enough resources for all of the jobs to run concurrently, I want to know that I’ve hit that limit and get a failure back. Am I trying to do something that’s not possible?

Got it.

Nomad will try its best to run your jobs. If there are not enough resources, it will keep the request as a blocked evaluation and periodically re-check your cluster to see if it’s possible to run the job now. So there’s nothing built-in that would do what you are looking for.

When you register a job, the command output will include the evaluation ID:

$ nomad run example.nomad
==> Monitoring evaluation "2692f487"
    Evaluation triggered by job "example"
==> Monitoring evaluation "2692f487"
    Evaluation within deployment: "8faa1490"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "2692f487" finished with status "complete" but failed to place all allocations:
    Task Group "cache" (failed to place 1 allocation):
      * Resources exhausted on 1 nodes
      * Dimension "memory" exhausted on 1 nodes
    Evaluation "2bb077fd" waiting for additional capacity to place remainder

$ nomad eval status 2bb077fd
ID                 = 2bb077fd
Create Time        = 16s ago
Modify Time        = 16s ago
Status             = blocked
Status Description = created to place remaining allocations
Type               = service
TriggeredBy        = queued-allocs
Priority           = 50
Placement Failures = N/A - In Progress

What you could do is create a script that runs a job, extracts its eval ID, and checks if its status is blocked. If it is, you can stop the job to cancel it.

$ nomad stop example
==> Monitoring evaluation "207e4df4"
    Evaluation triggered by job "example"
==> Monitoring evaluation "207e4df4"
    Evaluation within deployment: "8faa1490"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "207e4df4" finished with status "complete"

If parsing outputs gets too complicated, you can use Nomad’s HTTP API. These are the endpoints you would need:

  * Register Job: POST /v1/jobs (the response includes the EvalID)
  * Read Evaluation: GET /v1/evaluation/:eval_id
  * Stop Job: DELETE /v1/job/:job_id
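
For example, here is a rough sketch in Python with requests against those endpoints. The Nomad address, run_or_fail helper, job_payload, and job_id are placeholders to adapt to however you build your jobs (and add your ACL token if you use one); it assumes the evaluation’s BlockedEval field is set when a follow-up blocked evaluation was created, like evaluation 2bb077fd in the output above.

import time
import requests

NOMAD_ADDR = "http://127.0.0.1:4646"  # placeholder: your Nomad API address

def run_or_fail(job_payload, job_id):
    """Register a job and fail fast if its evaluation ends up blocked.

    job_payload is the {"Job": {...}} document you already build through
    the Python API; job_id is the job's ID (for example "b55ec362_3").
    """
    # Register the job; the response contains the evaluation ID.
    resp = requests.post(f"{NOMAD_ADDR}/v1/jobs", json=job_payload)
    resp.raise_for_status()
    eval_id = resp.json()["EvalID"]

    # Wait for the registration evaluation to finish scheduling.
    while True:
        ev = requests.get(f"{NOMAD_ADDR}/v1/evaluation/{eval_id}").json()
        if ev["Status"] in ("complete", "failed", "canceled"):
            break
        time.sleep(1)

    # If not everything could be placed, Nomad creates a follow-up blocked
    # evaluation and records its ID in the BlockedEval field.
    if ev.get("BlockedEval"):
        # Stop (deregister) the job instead of waiting for capacity.
        requests.delete(f"{NOMAD_ADDR}/v1/job/{job_id}").raise_for_status()
        raise RuntimeError(
            f"not enough resources to place {job_id}; "
            f"blocked evaluation {ev['BlockedEval']} - job stopped"
        )
    return ev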

I hope these help :slightly_smiling_face: