GCE Persistent Disk CSI: "csi_hook" failed: claim volumes: rpc error: Permission denied

Hi,
I ran through the example config here and encountered this error when deploying the mysql job:

Time                       Type           Description
2020-05-05T17:02:48-04:00  Setup Failure  failed to setup alloc: pre-run hook "csi_hook" failed: claim volumes: rpc error: Permission denied
2020-05-05T17:02:48-04:00  Received       Task received by client

Can you help me identify where the missing permission is coming from?

client logs:

May  5 21:04:19 nmd-rpzq nomad[19194]:     2020-05-05T21:04:19.041Z [ERROR] client.rpc: error performing RPC to server: error="rpc error: Permission denied" rpc=CSIVolume.Claim server=<nomad_server_ip>:4647
May  5 21:04:19 nmd-rpzq nomad[19194]:     2020-05-05T21:04:19.041Z [ERROR] client.alloc_runner: prerun failed: alloc_id=d1ece247-eb45-1a71-f4b0-424db8701926 error="pre-run hook "csi_hook" failed: claim volumes: rpc error: Permission denied"

Service account role permissions:

title: "Google Compute Engine Persistent Disk CSI Driver Custom Roles"
description: Custom roles required for functions of the gcp-compute-persistent-disk-csi-driver
stage: ALPHA
includedPermissions:
- compute.instances.get
- compute.instances.attachDisk
- compute.instances.detachDisk
- compute.disks.get
- compute.disks.use
- iam.serviceAccounts.actAs

Hi @vincenthuynh!

It looks like this error is coming from Nomad itself and not GCP (although it’s possible the wording matches). Are you using ACLs in your Nomad cluster? If so, do you have the csi-mount-volume permission in your policy?

If you’re not using ACLs, I’d check the allocation logs for the CSI plugins to see if there’s more information available there.

Hey @tgross,
Thanks for the reply!

We do have ACLs enabled, but the anonymous policy is pretty wide open in the environment we're testing this in. We're using the write policy, which includes the csi-mount-volume capability.

namespace "*" {
  policy       = "write"
  capabilities = ["alloc-node-exec", "csi-register-plugin", "csi-list-volume", "csi-read-volume"]
}

agent {
  policy = "write"
}

operator {
  policy = "write"
}

quota {
  policy = "write"
}

node {
  policy = "write"
}

host_volume "*" {
  policy = "write"
}

Edit: I disabled ACLs and was able to get around this error. Please let me know if there’s something I’m missing in my policy or if I should log an issue.

Hm, that should be working for you. Yes, if you could open an issue that would be really helpful. Thanks!

Hey @tgross, I was able to get this working as well by adding csi-mount-volume to my Nomad anonymous policy. However, this doesn't seem like something we should have to allow anonymously! Is there a way to properly lock this permission down without granting it to the anonymous policy?

Hi @holtwilkins! You need to set the permissions on whichever policy should be allowed access. So you can set csi-mount-volume on the anonymous policy or on any more specific policy you'd like. The Learn guide on ACLs shows how you might do this: https://learn.hashicorp.com/nomad/acls/create_policy

(The original issue here was that the plugin read policy wasn’t set. See https://github.com/hashicorp/nomad/issues/7927)
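For reference, here is a minimal sketch of a dedicated policy that grants these rules explicitly instead of through the anonymous policy. It assumes jobs run in the "default" namespace; the submit-job capability is an illustrative assumption, not something from this thread. It only covers the policy rules and does not settle which token the policy has to be attached to.

```hcl
# Sketch of a dedicated (non-anonymous) policy for jobs that claim CSI volumes.
# Assumption: jobs run in the "default" namespace.
namespace "default" {
  policy = "read"

  # submit-job is an assumption (the same token also submits the job);
  # csi-mount-volume is the capability discussed in this thread.
  capabilities = ["submit-job", "csi-mount-volume"]
}

# Read access to CSI plugins: the missing plugin read rule was the cause
# of the original "Permission denied" (hashicorp/nomad#7927).
plugin {
  policy = "read"
}
```

Assuming the rules are saved as csi-volume-consumer.hcl (a hypothetical file and policy name), they can be registered with nomad acl policy apply csi-volume-consumer csi-volume-consumer.hcl.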

Thanks @tgross. We've been using ACLs for years with no issues. I'm now trying to roll out CSI support, but I couldn't find any example that shows how to configure the node and controller jobs when Nomad ACLs are enabled. I know how to create a custom Nomad ACL policy that grants this, but who do I grant that policy to so that my job doesn't get this RPC "Permission denied" error when it tries to run?

We could definitely use some better docs here, but in the meantime I can probably point you in the right direction with a bit more information. Where are you getting the permissions error? When you run the plugin job? When you register the volume? Or when you run a job that claims the volume?