Timeout job placement failures

While developing an integration with Nomad, I am running into scenarios where a job cannot be placed due to a lack of resources (memory). Nomad cannot place the job within the cluster immediately, but rather than failing the job it lets it wait until resources become available.

Instead, I’d like Nomad to treat the placement as a failure after a timeout, which could be as short as zero (fail immediately). Is this possible?

Hi @spaulg,

I do not believe this is currently possible. Nomad will put the job into a blocked state with the hope that it will eventually be unblocked due to cluster scaling, preemption or other work finishing.

I’d be curious to understand the use case that requires the job to fail rather than become blocked.

Thanks,
jrasell and the Nomad team

Hi @jrasell

The app I’m developing has an administrative UI used to launch applications. The applications it creates require a number of steps for install, configuration, etc., which run as batch jobs, followed by a service job for the final web server and backend application.

The problem I’m facing is that the batch jobs are passed to Nomad but never start because of resource constraints. Instead of the job simply failing so that the UI can report it, I have to wait for the UI to detect the failure via a timeout.

For my use case, if the cluster has no more resources, that is unlikely to change, as resource usage does not fluctuate all that much.

Therefore, it makes little sense to wait 5 minutes before timing out the job, rather than regarding the job as failed immediately because it never started.

Alternatively, can I detect through the API that a job is blocked due to resource constraints?
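
For illustration, a check like that might look roughly as follows. This is a minimal Python sketch, not something confirmed in this thread: it assumes the job evaluations endpoint (`GET /v1/job/<job_id>/evaluations`) returns evaluations with `Status` and `FailedTGAllocs` fields, that a waiting evaluation has `Status` set to `blocked`, and that each failed task group entry carries a `DimensionExhausted` map. `NOMAD_ADDR`, `fetch_evaluations`, and `placement_status` are hypothetical names, and ACL tokens are not handled.

```python
import json
import urllib.request

# Assumption: a Nomad agent reachable locally on the default HTTP port.
NOMAD_ADDR = "http://127.0.0.1:4646"


def fetch_evaluations(job_id, addr=NOMAD_ADDR):
    """Fetch the evaluations for a job as a list of dicts."""
    url = f"{addr}/v1/job/{job_id}/evaluations"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def placement_status(evaluations):
    """Return (blocked, reasons) for a list of evaluation dicts.

    blocked is True if any evaluation is currently in the "blocked" state.
    reasons maps each task group that failed placement to a sorted list of
    the resource dimensions reported as exhausted (for example "memory").
    """
    blocked = any(ev.get("Status") == "blocked" for ev in evaluations)
    reasons = {}
    for ev in evaluations:
        # FailedTGAllocs may be missing or null when placement succeeded.
        for group, metric in (ev.get("FailedTGAllocs") or {}).items():
            exhausted = (metric or {}).get("DimensionExhausted") or {}
            reasons.setdefault(group, set()).update(exhausted)
    return blocked, {group: sorted(dims) for group, dims in reasons.items()}
```

The UI could then call `placement_status(fetch_evaluations(job_id))` shortly after submitting a batch job and report a failure as soon as `blocked` is true, instead of waiting out the full timeout. Note that `reasons` is collected from every evaluation of the job, so it may include failures from earlier attempts; `blocked` is the signal that the job is waiting right now.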

Thanks
spaulg