ACLs and the Garbage Collector

I’ve been taking the Nomad 1.0 beta for a spin, and I’m glad namespaces have made it into the open source version. All of my users run batch jobs, and jobs sometimes fail while a user is iterating quickly. The main issue is that resubmitting a batch job under the same name as a dead one results in the new job not running.

Previously, I told my users about the curl command that invokes the garbage collector. With ACLs in place, this no longer seems to be an option, since they would need a management token. What I’d like to know is the following.

  • Is there any way that I can allow users to run the garbage collector on a namespace they have write access to without handing them a management token?
  • Is there a way to force a batch job to be run again after modifying a job file without having to invoke the garbage collector?
  • Since a “service” job does run when a dead job with the same name exists, is there a way for a “service” job to exit gracefully the way a batch job does?

The only alternative I see is to have the garbage collector run very frequently, and I don’t like this option because users might not get to see the logs of a failing job in time.

Also, I notice running

nomad system gc -namespace="bob"

deletes all of the dead jobs, instead of just those in the specified namespace. Is that going to get worked out in the full release of 1.0?
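
For reference, the curl call I had been handing out is roughly the sketch below. It wraps a PUT to /v1/system/gc in a small shell function. The function name force_gc and the fallback address are my own placeholders, and it assumes NOMAD_TOKEN holds a management token, which is exactly what I would rather not distribute.

```shell
# Ask the Nomad servers to start a garbage collection pass.
# Assumptions: NOMAD_TOKEN is a management token, and NOMAD_ADDR
# points at a reachable agent (falls back to the local default).
force_gc() {
  curl --fail --silent --show-error \
    --request PUT \
    --header "X-Nomad-Token: ${NOMAD_TOKEN}" \
    "${NOMAD_ADDR:-http://127.0.0.1:4646}/v1/system/gc"
}
```

As far as I can tell, the request carries no namespace at all, so calling force_gc looks equivalent to the CLI command above.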

Would your users be able to use nomad job stop -purge <jobname> instead of using the GC endpoint to accomplish a similar end?

I’m fairly certain that garbage collection is namespace agnostic, so I don’t believe the API honors the namespace that is submitted. I’ll take a look at that, but I wanted to see whether -purge would work for your needs.
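
As a rough sketch of the purge approach, assuming the user’s token (exported as NOMAD_TOKEN) carries a policy with write access to the namespace, something like the following could replace the GC call. The function name rerun_batch_job and its three arguments are illustrative only, not anything Nomad ships.

```shell
# Purge a dead job from one namespace, then resubmit its job file.
# Arguments (all hypothetical names): $1 = namespace, $2 = job name,
# $3 = path to the job file.
# Assumes NOMAD_ADDR and NOMAD_TOKEN are exported in the environment.
rerun_batch_job() {
  # Stop and purge only the named job in the given namespace.
  # This aborts if the purge fails, for example when no such job exists.
  nomad job stop -purge -namespace="$1" "$2" || return 1
  # Register the (possibly modified) job file again.
  nomad job run -namespace="$1" "$3"
}
```

A user would then call it as rerun_batch_job bob example example.nomad. Because the purge targets a single job, other users’ dead jobs and their logs are left alone.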

Hopefully this unblocks you.

Regards,
Charlie Voiselle
Product Education Engineer, Nomad
