The app I’m developing has an administrative UI used to launch applications. Each application created requires a number of steps for installation, configuration, and so on, which run as batch jobs, before a service job runs the final web server and backend application.
The problem I’m facing is that the batch jobs are submitted to Nomad but never start because of resource constraints. Instead of the jobs simply failing and the UI reporting that failure, the UI only detects the failure after a timeout.
In my use case, if the cluster has no more resources, that is unlikely to change soon, since resource usage does not fluctuate much.
It therefore makes little sense to wait five minutes for a timeout; it would be better to regard the job as failed immediately, since it never started.
Alternatively, can I detect through the API that a job is blocked due to resource constraints?
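In case it helps, here is roughly what I had in mind: a sketch that polls the job's evaluations endpoint and looks for a blocked evaluation or recorded placement failures. I'm assuming the `/v1/job/:job_id/evaluations` endpoint and the `Status` / `FailedTGAllocs` fields behave the way I understand them from the docs; the address and function names are just mine.

```python
import json
import urllib.request


def find_placement_failures(evaluations):
    """Return evaluations that suggest the job could not be placed.

    My understanding: an eval with Status "blocked" is queued waiting
    for capacity, and a completed eval with a non-empty FailedTGAllocs
    records which task groups failed placement and why (e.g. CPU or
    memory exhausted).
    """
    return [
        ev for ev in evaluations
        if ev.get("Status") == "blocked" or ev.get("FailedTGAllocs")
    ]


def job_looks_blocked(nomad_addr, job_id):
    """Fetch a job's evaluations and check for placement failures.

    nomad_addr is e.g. "http://127.0.0.1:4646" (placeholder address).
    """
    url = f"{nomad_addr}/v1/job/{job_id}/evaluations"
    with urllib.request.urlopen(url) as resp:
        evaluations = json.load(resp)
    return bool(find_placement_failures(evaluations))
```

If this works, the UI could call something like `job_looks_blocked` right after submission instead of waiting out the full timeout, and report the failure reasons from `FailedTGAllocs` directly.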