I’ve been taking the Nomad 1.0 beta for a spin, and I’m glad namespaces have made it into the open source version. All of my users run batch jobs, and jobs sometimes fail while a user is iterating quickly. The main issue is that resubmitting a batch job under the same name as a dead one results in the new job not running.
Previously, I told my users to use the curl command that invokes the garbage collector. With ACLs in place, this no longer seems to be an option, since they would need a management token. Here is what I’d like to know:
- Is there any way that I can allow users to run the garbage collector on a namespace they have write access to without handing them a management token?
- Is there a way to force a batch job to be run again after modifying a job file without having to invoke the garbage collector?
- Since a “service” job does run when a dead job with the same name exists, is there a way for a “service” job to exit gracefully like a batch job?
The only alternative I see is to have the garbage collector run very frequently, and I don’t like this option because users might not get to see the logs of a failing job before it is collected.
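For reference, the garbage collector call I mentioned is a `PUT` to the `/v1/system/gc` endpoint. Here is a rough Python equivalent of that curl command, as a sketch only: the helper name `build_gc_request`, the address, and the token value are placeholders I made up for illustration, and the function only builds the request rather than sending it.

```python
import urllib.request


def build_gc_request(nomad_addr, token=None):
    """Build (but do not send) a PUT request to Nomad's /v1/system/gc endpoint.

    build_gc_request is a hypothetical helper for illustration; nomad_addr and
    token are placeholders supplied by the caller.
    """
    url = nomad_addr.rstrip("/") + "/v1/system/gc"
    req = urllib.request.Request(url, method="PUT")
    if token:
        # With ACLs enabled, the token is passed in the X-Nomad-Token header.
        req.add_header("X-Nomad-Token", token)
    return req


if __name__ == "__main__":
    # Placeholder address and token; replace with real values before use.
    req = build_gc_request("http://127.0.0.1:4646", token="REPLACE_ME")
    print(req.get_method(), req.full_url)
    # To actually trigger a collection: urllib.request.urlopen(req)
```

With ACLs enabled, this is the call that fails for my users unless the token is a management token.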