Hi all,
I’m playing around with Nomad in a home environment and running into issues with CSI.
I have a Synology DS220+ NAS in my network and would like to run all essential services there. Since it’s a “natural” single point of failure, I’m fine with my cluster going down if the NAS is down. All data resides there anyway.
When I try to deploy a CSI controller job to the Nomad client/server running on the NAS, the job gets killed after about half a minute. I tried a few NFS and SMB CSI plugins, but they all show basically the same behavior.
I found the following messages in the log:
Mar 26 13:52:49 storage nomad[17771]: 2023-03-26T13:52:49.687+0200 [WARN] client.alloc_runner.task_runner.task_hook.api: error creating task api socket: alloc_id=4969158d-6045-297a-a770-89b47d94e21f task=synology-csi-plugin path=/volume1/homelab/nomad/var/lib/nomad/alloc/4969158d-6045-297a-a770-89b47d94e21f/synology-csi-plugin/secrets/api.sock error="listen unix /volume1/homelab/nomad/var/lib/nomad/alloc/4969158d-6045-297a-a770-89b47d94e21f/synology-csi-plugin/secrets/api.sock: bind: invalid argument"
Mar 26 13:53:41 storage nomad[17771]: 2023-03-26T13:53:41.634+0200 [ERROR] client.alloc_runner.task_runner.task_hook: killing task because plugin failed: alloc_id=4969158d-6045-297a-a770-89b47d94e21f task=synology-csi-plugin error="CSI plugin failed probe: timeout while connecting to gRPC socket: failed to stat socket: stat /volume1/homelab/nomad/var/lib/nomad/client/csi/plugins/4969158d-6045-297a-a770-89b47d94e21f/csi.sock: no such file or directory"
Mar 26 13:53:41 storage nomad[17771]: 2023-03-26T13:53:41.634+0200 [INFO] client.alloc_runner.task_runner: Task event: alloc_id=4969158d-6045-297a-a770-89b47d94e21f task=synology-csi-plugin type="Plugin became unhealthy" msg="Error: CSI plugin failed probe: timeout while connecting to gRPC socket: failed to stat socket: stat /volume1/homelab/nomad/var/lib/nomad/client/csi/plugins/4969158d-6045-297a-a770-89b47d94e21f/csi.sock: no such file or directory" failed=false
Mar 26 13:53:41 storage nomad[17771]: 2023-03-26T13:53:41.886+0200 [INFO] client.alloc_runner.task_runner: Task event: alloc_id=4969158d-6045-297a-a770-89b47d94e21f task=synology-csi-plugin type=Killing msg="CSI plugin did not become healthy before configured 30s health timeout" failed=true
Mar 26 13:53:47 storage nomad[17771]: 2023-03-26T13:53:47.890+0200 [ERROR] client.alloc_runner.task_runner.task_hook: failed to kill task: alloc_id=4969158d-6045-297a-a770-89b47d94e21f task=synology-csi-plugin kill_reason="CSI plugin failed probe: timeout while connecting to gRPC socket: failed to stat socket: stat /volume1/homelab/nomad/var/lib/nomad/client/csi/plugins/4969158d-6045-297a-a770-89b47d94e21f/csi.sock: no such file or directory" error="context canceled"
I think the important part is in the first message: “bind: invalid argument”. It looks to me like the CSI gRPC socket could not be created.
Any idea what might cause that error?
The Syno is running a rather old Linux kernel:
“Linux storage 4.4.180+ #42962 SMP Tue Jan 31 23:18:09 CST 2023 x86_64 GNU/Linux synology_geminilake_220+”
Nomad is already running as root, so it shouldn’t be a permission issue.
Any pointers greatly appreciated.