How to stop a specific allocation without it being rescheduled

I run a few jobs with 3 (or more) allocations, each with its own CSI volume (per_alloc=true), for example Postgres, MongoDB, Elasticsearch, etc.

Sometimes I have to debug or fix something in a volume (e.g. run an fsck, or do some manual maintenance on the volume content). But I'd like to do this one allocation at a time, so the service stays up.

If I stop an allocation, it's immediately rescheduled.

For the last allocation it's easy: I can just scale the group down. But what about the others? Is there a way to stop the allocation with index 0 without it being rescheduled, so I can access its CSI volume without bringing the whole job down?
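For the last allocation, the scale-down route amounts to two `nomad job scale <job> <group> <count>` calls: one that lowers the count by one before the maintenance and one that restores it afterwards. Below is a minimal Python sketch that only builds those commands, assuming the group currently runs `current_count` allocations; the helper names and the job/group names `postgres` and `db` are hypothetical. It prints the commands by default and runs them through `subprocess` only when `dry_run=False`.

```python
from __future__ import annotations

import subprocess


def scale_cmd(job: str, group: str, count: int) -> list[str]:
    """Build the argv for `nomad job scale <job> <group> <count>`."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return ["nomad", "job", "scale", job, group, str(count)]


def last_alloc_maintenance(job: str, group: str, current_count: int) -> tuple[list[str], list[str]]:
    """Return (down, up): scale down by one before maintenance, restore after."""
    if current_count < 1:
        raise ValueError("the group must have at least one allocation")
    down = scale_cmd(job, group, current_count - 1)
    up = scale_cmd(job, group, current_count)
    return down, up


def run(argv: list[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical job "postgres" with a group "db" currently at count 3.
    down, up = last_alloc_maintenance("postgres", "db", 3)
    run(down)  # dry run: only prints the command
    # ... manual work on the freed CSI volume goes here ...
    run(up)
```

This only covers the highest-index allocation, which is why it does not answer the question for index 0.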

If the allocation is bound to a specific client, a dirty trick could be to make that client (temporarily) ineligible, preventing it from getting new allocations.
Once maintenance is done, make it eligible again and the allocation should be auto-magically scheduled.

The big downside, of course, is that the client won't be able to receive allocations from any other job either… :confused:

If only the current node is ineligible, the new allocation will simply be scheduled on another node. So this would require putting all nodes in the ineligible state.
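Put together, the workaround is: mark every client ineligible with `nomad node eligibility -disable <node>`, stop the target allocation with `nomad alloc stop <alloc>`, do the maintenance, then re-enable every client with `nomad node eligibility -enable <node>` so the pending replacement can be placed. Below is a minimal Python sketch that builds that command sequence in two phases, assuming you already know the allocation ID and the IDs of all client nodes (the helper names and the IDs in the demo are hypothetical). It prints by default and executes only with `dry_run=False`.

```python
from __future__ import annotations

import subprocess


def eligibility_cmd(node_id: str, enable: bool) -> list[str]:
    """Build the argv for `nomad node eligibility -enable|-disable <node_id>`."""
    flag = "-enable" if enable else "-disable"
    return ["nomad", "node", "eligibility", flag, node_id]


def maintenance_plan(alloc_id: str, node_ids: list[str]) -> tuple[list[list[str]], list[list[str]]]:
    """Split the workaround into (before, after) command lists.

    before: mark every node ineligible so the replacement allocation has
            nowhere to go, then stop the target allocation.
    after:  make every node eligible again so the replacement can be placed.
    """
    if not node_ids:
        raise ValueError("node_ids must list every client node")
    before = [eligibility_cmd(n, enable=False) for n in node_ids]
    before.append(["nomad", "alloc", "stop", alloc_id])
    after = [eligibility_cmd(n, enable=True) for n in node_ids]
    return before, after


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical allocation and node IDs.
    before, after = maintenance_plan("a1b2c3d4", ["node-1", "node-2", "node-3"])
    run(before)  # dry run: only prints the commands
    # ... fsck / manual work on the CSI volume goes here ...
    run(after)
```

Between the two phases no job can place new allocations anywhere in the cluster, so the window should be kept as short as possible.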

I've opened a feature request for this.
