Job with CSI volumes doesn't deploy

Hi folks, looking for some guidance here. I’ve been playing a bit with CSI in Nomad, and it’s been… interesting. I’m experimenting on Hetzner Cloud, they have a CSI driver available, so far, so good.

Created a controller job, and the system job for the nodes. So far, so good, nomad plugin status reports all is well. Created a volume on hcloud, used the numerical ID to register the volume with nomad. nomad volume status shows it, and schedulable = true.

The volume is registered with id “mytest-1” (yes, super original). I create a job with the following volume definition in the “group” block:

volume "mytest-1" {
  type      = "csi"
  source    = "mytest-1"
  read_only = false
}

In the task, I have the following volume_mount block:

volume_mount {
  volume      = "mytest-1"
  destination = "/my/test/volume"
}

When I nomad job plan the job file, all seems well. When I nomad job run the job file, the evaluation finishes immediately and gives no messages. No tasks are deployed.

Removing the volume {} and volume_mount {} blocks from the job lets it run, but I kinda sorta need those volumes :wink:

Any ideas why this is happening?

Hi @benvanstaveren! You’ll want to dig into the Nomad server and client logs to see if there are errors there. The plugins may have logs that you can look at with nomad alloc logs :alloc_id as well.

No need, turns out I’m an idiot. The job in question was a system job, constrained to a node class (a pattern we use a lot at work to get things spun up more easily when they need to be), and apparently this caused the evaluation to fail. As soon as I turned it into a service job, it all worked, but I also had to add userns_mode = "host" to the actual task (coincidentally, mysql); otherwise docker would bitch up a storm about permission denied for the mounts.

But that’s not a nomad issue :smiley:
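
For anyone landing here later, this is roughly the shape of the job that ended up working, as a minimal sketch rather than the exact file. The job, group, and task names, the datacenter, and the image tag are placeholders I made up for illustration; only the volume, volume_mount, job type, and userns_mode bits come from the thread above.

```hcl
# Hypothetical job name; "dc1" is an assumed datacenter.
job "mytest-mysql" {
  datacenters = ["dc1"]

  # The original was a system job constrained to a node class,
  # which did not place. Switching to a service job did.
  type = "service"

  group "db" {
    # Claims the CSI volume registered with Nomad under the id "mytest-1".
    volume "mytest-1" {
      type      = "csi"
      source    = "mytest-1"
      read_only = false

      # Assumption: newer Nomad releases also expect these two settings
      # in the job file; check the docs for your version before enabling.
      # access_mode     = "single-node-writer"
      # attachment_mode = "file-system"
    }

    task "mysql" {
      driver = "docker"

      config {
        image = "mysql:8" # assumed image tag

        # Needed here because the Docker daemon uses user namespace
        # remapping; without it the mount gave permission denied.
        userns_mode = "host"
      }

      # Mounts the group-level volume into the container.
      volume_mount {
        volume      = "mytest-1"
        destination = "/my/test/volume"
      }
    }
  }
}
```

The label on the volume block is what volume_mount refers to, while source is the id the volume was registered under. They happen to be the same string here, but they don't have to be.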

On a tangent, would it be possible some day (or is it already) to use variable interpolation in the volume {} definition? It would be most excellent to do something like…

volume "volume-for-${}" { .... }

And have that work with system jobs? Mostly because we tend to run things like database clusters as system jobs, constrained to hosts of class ‘db’ (for example) so that if one conks out, we just spin up a new node, and ideally it’d get the system job sorted, except currently there’s the whole ‘recover backup first’ bit that needs to happen. If we can get volumes to work like described above with system jobs, things would “Just Work™” - which in my dev-opsing mind is absolute nirvana :smiley:

> On a tangent, would it be possible some day (or is it already) to use variable interpolation in the volume {} definition?

There’s probably a few things that we can’t interpolate because of order-of-operations, but I don’t see why we couldn’t interpolate things like node metadata like your example. Would you be willing to open a GitHub issue for that? Having a user with a specific use-case helps me sell prioritizing the work :grinning:

Submitted as #7877 :slight_smile: