I’m creating batch jobs through the Python API and trying to force a job to fail if no resources are available when it is first registered.
When I inspect the job, I see stanzas that I think should prevent retries of any sort:
### No rescheduling
$ nomad job inspect b55ec362_3 | grep -A 7 ReschedulePolicy
"ReschedulePolicy": {
  "Attempts": 0,
  "Delay": 5000000000,
  "DelayFunction": "constant",
  "Interval": 86400000000000,
  "MaxDelay": 0,
  "Unlimited": false
},
### No restarting
$ nomad job inspect b55ec362_3 | grep -A 7 Restart
"RestartPolicy": {
  "Attempts": 0,
  "Delay": 0,
  "Interval": 5000000000,
  "Mode": "fail"
},
"Scaling": null,
"Services": null,
--
"RestartPolicy": {
  "Attempts": 0,
  "Delay": 0,
  "Interval": 5000000000,
  "Mode": "fail"
},
"ScalingPolicies": null,
"Services": null,
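For reference, here is a minimal Python sketch of the two policy blocks as they appear in the inspect output above. It is not the exact code I use to submit the job: the helper names `seconds` and `no_retry_policies` are made up for illustration, and the sketch only builds plain dictionaries without calling any Nomad client library. The durations are nanosecond integers, so 5000000000 is 5 seconds and 86400000000000 is 24 hours.

```python
import json

NANOSECONDS_PER_SECOND = 1_000_000_000


def seconds(n):
    """Convert seconds to the nanosecond integers seen in the inspect output."""
    return int(n * NANOSECONDS_PER_SECOND)


def no_retry_policies():
    """Hypothetical helper: build only the two policy blocks, not a full job."""
    reschedule = {
        "Attempts": 0,
        "Delay": seconds(5),
        "DelayFunction": "constant",
        "Interval": seconds(24 * 60 * 60),  # 24 hours
        "MaxDelay": 0,
        "Unlimited": False,
    }
    restart = {
        "Attempts": 0,
        "Delay": 0,
        "Interval": seconds(5),
        "Mode": "fail",
    }
    return {"ReschedulePolicy": reschedule, "RestartPolicy": restart}


policies = no_retry_policies()
print(json.dumps(policies, indent=2))
```

In the inspected job, `ReschedulePolicy` sits at the task group level and `RestartPolicy` shows up twice, which I assume is once for the group and once for the task.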
Yet when I look at the evaluations for the job, there are two: one triggered by registration and another triggered by “queued-allocs”, which I assume is rescheduling of some sort?
$ nomad job status -evals b55ec362_3
ID            = b55ec362_3
Name          = b55ec362_3
Submit Date   = 2021-04-12T14:32:31Z
Type          = batch
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = dead
Periodic      = false
Parameterized = false

Summary
Task Group      Queued  Starting  Running  Failed  Complete  Lost
python_command  0       0         0        0       1         0

Evaluations
ID        Priority  Triggered By   Status    Placement Failures
902f2257  50        queued-allocs  complete  false
7774b66c  50        job-register   complete  true

Allocations
ID        Node ID   Task Group      Version  Desired  Status    Created    Modified
98e02392  4b543a2d  python_command  0        run      complete  2h58m ago  2h57m ago
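The same check could be done from Python rather than by reading the CLI table. The sketch below assumes each evaluation is available as a dictionary with `ID`, `TriggeredBy`, and `FailedTGAllocs` keys, as in the JSON that Nomad's evaluations endpoint returns. The function names are hypothetical, and the sample records are hand-written to mirror the two evaluations above; the contents of `FailedTGAllocs` are a placeholder, not real output.

```python
def evals_with_placement_failures(evals):
    """Return IDs of evaluations that report at least one failed placement."""
    return [e["ID"] for e in evals if e.get("FailedTGAllocs")]


def evals_triggered_by(evals, trigger):
    """Return IDs of evaluations whose TriggeredBy field equals `trigger`."""
    return [e["ID"] for e in evals if e.get("TriggeredBy") == trigger]


# Hand-written sample mirroring the two evaluations in the table above.
# The FailedTGAllocs value is a placeholder, not real API output.
sample_evals = [
    {"ID": "902f2257", "TriggeredBy": "queued-allocs", "FailedTGAllocs": None},
    {
        "ID": "7774b66c",
        "TriggeredBy": "job-register",
        "FailedTGAllocs": {"python_command": {}},
    },
]

failed = evals_with_placement_failures(sample_evals)
queued = evals_triggered_by(sample_evals, "queued-allocs")
print("placement failures:", failed)
print("queued-allocs evals:", queued)
```

If the registration evaluation is the only one with placement failures, the job could be stopped or flagged as failed at that point instead of waiting for a later evaluation to place it. Whether that is the right approach is part of what I’m asking.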
My questions are:
- Is “queued-allocs” an indication of hitting a resource constraint?
- Is ReschedulePolicy the stanza that would prevent restarting due to resource constraints?
- Is there something wrong with the ReschedulePolicy that I’m using?