We’ve been using Nomad on some Hetzner Cloud instances for a while now and recently started dabbling with CSI to get cloud volumes mounted. However, when restarting a job (due to changes), we often see messages like this:
CSI volume redacted-volume has exhausted its available writer claims and is claimed by a garbage collected allocation redacted-allocation-id; waiting for claim to be released
And then it just sits there spinning its little wheels. My question: if the Nomad server knows that the allocation that used to claim the volume has been garbage collected, why can’t it just release the claim and move on, instead of waiting for what currently seems like forever?
There also does not seem to be a way to force-release a volume: nomad volume detach doesn’t work and complains about “unknown alloc id” (which makes sense, because the alloc no longer shows in the list of allocations, having been GC’d already).
The workaround I use now is to force-deregister the volume and then re-register it, after which things are fine. But I can’t keep doing that, so does anyone have ideas on how to solve this issue?
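For reference, the workaround amounts to roughly the following. This is only a sketch: the function name reregister_volume, the NOMAD_BIN variable, and the file name volume.hcl are placeholders I made up for illustration, and I'm assuming the volume was originally registered from a specification file that is still around.

```shell
#!/bin/sh
# Sketch of the deregister/re-register workaround.
# NOMAD_BIN and reregister_volume are hypothetical names for this example.
NOMAD_BIN="${NOMAD_BIN:-nomad}"

reregister_volume() {
    volume_id="$1"   # ID of the stuck CSI volume
    spec_file="$2"   # specification file the volume was registered from (assumed to exist)

    # Force-deregister the volume. Depending on the Nomad version,
    # this step may ask for interactive confirmation.
    $NOMAD_BIN volume deregister -force "$volume_id" || return 1

    # Register the volume again from its specification file.
    $NOMAD_BIN volume register "$spec_file"
}
```

Called as `reregister_volume redacted-volume volume.hcl`, it runs the two commands in order and stops if the deregistration fails.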