So. I have a set of web frontends that run nginx, and we keep a bunch of static assets on an NFS server. Previously we just mounted the NFS share on the hosts in question and used host volumes. This works fine, but we wanted to switch to CSI because it makes devops life easier.
The CSI NFS driver works, and the volume is registered as multi-node-multi-writer. On the nginx task, we declare the volume read-only (which it should be). Yet whenever the job is started as a system job, no allocations are placed, even though the evaluation and deployment finish successfully.
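For context, the job spec looks roughly like this. This is a hedged sketch, not our exact config: the volume source `static-assets`, the image, and the mount path are placeholders, and the group-level `access_mode`/`attachment_mode` fields assume Nomad 1.1+.

```hcl
job "frontend" {
  type = "system" # with this type, no allocations get placed

  group "nginx" {
    volume "assets" {
      type            = "csi"
      source          = "static-assets" # placeholder volume ID
      read_only       = true
      access_mode     = "multi-node-multi-writer"
      attachment_mode = "file-system"
    }

    task "nginx" {
      driver = "docker"

      config {
        image = "nginx:alpine" # placeholder image
      }

      volume_mount {
        volume      = "assets"
        destination = "/usr/share/nginx/html" # placeholder path
        read_only   = true
      }
    }
  }
}
```

Changing `type = "system"` to `type = "service"` (with a `count`) is the only difference between the broken and working variants.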
As soon as you turn it into a service job, it works just fine. This is a bit problematic because we deploy these as system jobs constrained to a node class, so that any time a new node of that type appears, things Just Work™. Right now we've worked around it by running multiple groups with count=1, each constrained to a single node.
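The workaround is roughly the following, repeated once per node (group and node names here are placeholders). It obviously doesn't scale, since every new node means a job spec change:

```hcl
# One group per node in a service job, pinned with a constraint.
group "nginx-web-1" {
  count = 1

  constraint {
    attribute = "${node.unique.name}"
    value     = "web-1" # placeholder node name
  }

  # ... same volume / task stanzas as above ...
}
```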
I’m wondering if this is intentional behaviour. Given that the CSI volume in question can be mounted in multiple tasks, I don’t see why it wouldn’t work. Any ideas on this, folks? Is it intentional, or am I hitting a weird edge case?
(And if it is intentional, feature request: make it so that multi-node volumes work with system jobs :D)