Nomad drain_on_shutdown fails with "rpc error: Permission denied" rpc=Node.UpdateDrain

I am running Nomad 1.7.5 with ACLs enabled.

When I enable drain_on_shutdown for my Nomad clients, I get the following error logs during shutdown:

==> Gracefully shutting down agent...
    2024-08-06T19:38:41.204Z [ERROR] client.rpc: error performing RPC to server: error="rpc error: Permission denied" rpc=Node.UpdateDrain server=10.X.X.X:4647
    2024-08-06T19:38:41.204Z [ERROR] client.rpc: error performing RPC to server which is not safe to automatically retry: error="rpc error: Permission denied" rpc=Node.UpdateDrain server=10.x.x.x:4647
    2024-08-06T19:38:41.204Z [ERROR] agent: client leave failed: error="rpc error: Permission denied"
    2024-08-06T19:38:41.204Z [INFO]  agent: requesting shutdown
    2024-08-06T19:38:41.204Z [INFO]  client: shutting down
    2024-08-06T19:38:41.273Z [INFO]  client.plugin: shutting down plugin manager: plugin-type=device
    2024-08-06T19:38:41.273Z [INFO]  client.plugin: plugin manager finished: plugin-type=device
    2024-08-06T19:38:41.273Z [INFO]  client.plugin: shutting down plugin manager: plugin-type=driver
    2024-08-06T19:38:41.289Z [INFO]  client.plugin: plugin manager finished: plugin-type=driver
    2024-08-06T19:38:41.289Z [INFO]  client.plugin: shutting down plugin manager: plugin-type=csi
    2024-08-06T19:38:41.290Z [INFO]  client.plugin: plugin manager finished: plugin-type=csi
    2024-08-06T19:38:41.322Z [INFO]  agent: shutdown complete

As you can see, the clients fail to drain with a “Permission denied” error.

I have tried running this with an ACL token that has an “Administrator” policy attached:

// Administrator (global read+write) access
namespace "*" {
  policy       = "write"
  capabilities = ["alloc-node-exec", "read-logs"]
  variables {
    # this policy can read, write, list, and destroy any other variables in this namespace
    path "*" {
      capabilities = ["list", "read", "write", "destroy"]
    }
  }
}
agent {
  policy = "write"
}
operator {
  policy = "write"
}
quota {
  policy = "write"
}
node {
  policy = "write"
}
host_volume "*" {
  policy = "write"
}

As well as with a more targeted policy:

# Used to drain node
node {
  policy = "write"
}
# Used to read info about allocations
namespace "*" {
  policy = "read"
}

The Permission Denied error persists with either of these ACL tokens set in the following way:

NOMAD_TOKEN=the-token-text-here nomad agent -config /path/to/config -data /path/to/data

And these are the [relevant] parts of my Nomad Client config:

...
leave_on_interrupt = true
leave_on_terminate = true
client {
  enabled          = true
  ...
  drain_on_shutdown {
    deadline           = "15m"
    force              = false
    ignore_system_jobs = true
  }
}

What might be causing my clients to fail to drain upon receiving the SIGINT signal? Am I missing ACL permissions? I was not aware that a client self-drain required an ACL token in the first place.

The error message rpc error: Permission denied indicates that the Nomad client is attempting to perform an RPC (Remote Procedure Call) to the Nomad server but lacks the necessary permissions to complete the operation. Specifically, the Node.UpdateDrain RPC call is being denied.

Please take a look at the ACL policy documentation: Nomad ACL policy concepts | Nomad | HashiCorp Developer

Additionally, check that the token being used has not expired by running nomad acl token info <token-Accessor-ID>. Ref: Nomad ACL token fundamentals | Nomad | HashiCorp Developer
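One way to separate "the token is wrong" from "the agent is not using the token" is to request a drain by hand with the same token. The sketch below is only an illustration: check_drain_token is a hypothetical helper name, and it assumes NOMAD_ADDR points at your cluster and that you run it on a client you can afford to drain. As far as I know, a manual nomad node drain ends up in the same Node.UpdateDrain RPC on the servers and needs node write access.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nomad): checks a token from the CLI side.
# Assumes NOMAD_ADDR is set for your cluster and NOMAD_TOKEN holds the
# token under test. WARNING: the last step really drains the local node.

check_drain_token() {
  if [ -z "${NOMAD_TOKEN:-}" ]; then
    echo "NOMAD_TOKEN is not set" >&2
    return 2
  fi

  # Show the token as the servers see it (policies, expiry, type).
  nomad acl token self || return 1

  # Request a drain of the node this agent runs on, with the same
  # deadline and system-job handling as the drain_on_shutdown block.
  nomad node drain -enable -self -deadline 15m -ignore-system
}

# Usage (run on the client node):
#   NOMAD_TOKEN=the-token-text-here check_drain_token
# Undo the drain afterwards with:
#   nomad node drain -disable -self
```

If this succeeds while the shutdown drain still reports Permission denied, the policy itself is probably fine and the question becomes which credentials the agent uses for its own RPC during shutdown.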

@vijesh12 do you know what permissions are required to perform Node.UpdateDrain? I can’t find that operation in the docs.

Or does the use of RPC require additional permissions or configuration? For example, is RPC not enabled by default for Client ↔ Server communication?