Nomad job run 403 when using CSI volumes (even with csi-mount-volume capability)

We recently made our first attempt at using a CSI volume in Nomad. We updated our simple CI/CD policy for GitLab deployments like so:

namespace "default" {
  policy = "read"
  capabilities = ["read-job","submit-job","csi-mount-volume"]
}

This caused the job to get a 403 when attempting a nomad job run. Changing the policy to “write” was the only way to get the job to deploy.

I’ve created a very simple demo job that highlights the issue:

job "csi-test" {
  datacenters = ["test"]

  group "main" {
    volume "csi-vol" {
      type            = "csi"
      source          = "test"
      read_only       = true
      attachment_mode = "file-system"
      access_mode     = "multi-node-reader-only"
    }

    task "csi-test" {
      driver = "docker"
      config { image = "hello-world" }

      volume_mount {
        volume = "csi-vol"
        destination = "/mnt/demo"
      }
    }
  }
}

I’ve tested with all combinations of the listed csi capabilities to no avail. Only ‘policy = “write”’ allowed the job to deploy.

I did further debugging. The problem is not policy = “write” (I was mistakenly using our management token to test). The problem appears to be that the token also needs read access to the plugin system to function properly. This policy set allowed the job to deploy successfully:

namespace "default" {
  capabilities = ["read-job","submit-job","csi-mount-volume"]
}
plugin {
  policy = "read"
}
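
For anyone landing here later, here is the same policy again with comments on what each rule seems to be doing. This is only a sketch based on what I observed above, not an authoritative statement of what Nomad checks internally. The file name in the first comment is hypothetical.

```hcl
# ci-deploy.policy.hcl (hypothetical file name)

namespace "default" {
  # read-job and submit-job let the CI token inspect and register jobs.
  # csi-mount-volume lets jobs submitted with this token mount CSI volumes.
  capabilities = ["read-job", "submit-job", "csi-mount-volume"]
}

# Plugin rules are a top-level block, separate from any namespace rule.
# In my testing, the job run returned 403 until this read access was added.
plugin {
  policy = "read"
}
```

Assuming the file were saved under that name, it could be loaded with something like `nomad acl policy apply ci-deploy ci-deploy.policy.hcl`, where `ci-deploy` is just an example policy name.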

Hi @jhitt25 :wave:

I’m glad you were able to find the answer, and thank you for sharing it with others :slightly_smiling_face:

I’m sorry you had to go over and debug this yourself though. It seems like we’re missing this key piece of information in our documentation.

Do you have any thoughts on where would be a good place to put this?

Maybe in the ACL paragraph of the nomad job run command, and in the ACL table of the POST /v1/jobs endpoint?

Were these places where you looked for this information?

I did indeed look in both of those places. A mention in both places would make sense, but the ACL table is definitely the most concise location and it’s my “go to” for this information. I would also recommend adding the optional CSI and host volume ACL information to the API table as well!

It may also be nice in the future if the CLI documentation mentioned the API calls it is leveraging. It would make the documentation redundancy less important if we could just trace down to the “real” workers.