I am running Nomad 1.7.5 with ACLs enabled.
When I enable drain_on_shutdown for my Nomad clients, I get the following error logs during shutdown:
==> Gracefully shutting down agent...
2024-08-06T19:38:41.204Z [ERROR] client.rpc: error performing RPC to server: error="rpc error: Permission denied" rpc=Node.UpdateDrain server=10.X.X.X:4647
2024-08-06T19:38:41.204Z [ERROR] client.rpc: error performing RPC to server which is not safe to automatically retry: error="rpc error: Permission denied" rpc=Node.UpdateDrain server=10.x.x.x:4647
2024-08-06T19:38:41.204Z [ERROR] agent: client leave failed: error="rpc error: Permission denied"
2024-08-06T19:38:41.204Z [INFO] agent: requesting shutdown
2024-08-06T19:38:41.204Z [INFO] client: shutting down
2024-08-06T19:38:41.273Z [INFO] client.plugin: shutting down plugin manager: plugin-type=device
2024-08-06T19:38:41.273Z [INFO] client.plugin: plugin manager finished: plugin-type=device
2024-08-06T19:38:41.273Z [INFO] client.plugin: shutting down plugin manager: plugin-type=driver
2024-08-06T19:38:41.289Z [INFO] client.plugin: plugin manager finished: plugin-type=driver
2024-08-06T19:38:41.289Z [INFO] client.plugin: shutting down plugin manager: plugin-type=csi
2024-08-06T19:38:41.290Z [INFO] client.plugin: plugin manager finished: plugin-type=csi
2024-08-06T19:38:41.322Z [INFO] agent: shutdown complete
As you can see, the clients fail to drain with a “Permission denied” error.
I have tried running the agent with an ACL token that has an “Administrator” policy attached:
// Administrator (global read+write) access
namespace "*" {
  policy       = "write"
  capabilities = ["alloc-node-exec", "read-logs"]

  variables {
    # this policy can read, write, list, and destroy any other variables in this namespace
    path "*" {
      capabilities = ["list", "read", "write", "destroy"]
    }
  }
}

agent {
  policy = "write"
}

operator {
  policy = "write"
}

quota {
  policy = "write"
}

node {
  policy = "write"
}

host_volume "*" {
  policy = "write"
}
As well as with a more targeted policy:
# Used to drain node
node {
  policy = "write"
}

# Used to read info about allocations
namespace "*" {
  policy = "read"
}
The “Permission denied” error persists with either of these ACL tokens, which I pass to the agent as follows:
NOMAD_TOKEN=the-token-text-here nomad agent -config /path/to/config -data-dir /path/to/data
These are the relevant parts of my Nomad client config:
...
leave_on_interrupt = true
leave_on_terminate = true

client {
  enabled = true
  ...
  drain_on_shutdown {
    deadline           = "15m"
    force              = false
    ignore_system_jobs = true
  }
}
What might be causing my clients to fail to drain upon receiving the SIGINT signal? Am I missing ACL permissions? I was not aware that a client self-drain required an ACL token in the first place.