We are now using the distributed file system SeaweedFS through seaweedfs-csi-driver.
The CSI driver mounts the volume to a global mount point at the NodeStageVolume stage. Then, at the NodePublishVolume stage, it only creates a symbolic link to the global mount point under the container directory. After that, the container is ready to use the volume.
seaweedfs:master
← garenchan:ck-dev1
opened 03:14AM - 07 Jul 22 UTC
# What problem are we solving?
Even if multiple pods use the same volume, the driver forks a child process for each of them to mount the volume. This wastes resources and makes the mounts harder to manage.
# How are we solving the problem?
The idea is that pods using the same volume need to share a `weed mount` child process.
We currently solve this problem based on the [CSI specification](https://github.com/container-storage-interface/spec/blob/master/spec.md).
1. When a volume is first used on a node, `NodeStageVolume` will fork a child process to mount it to the host's `/var/lib/kubelet/plugins/kubernetes.io/csi/pv/<pvc>/globalmount` directory.
2. Later, when another pod uses the volume, `NodePublishVolume` just needs to create a symbolic link `/var/lib/kubelet/pods/<pod>/volumes/kubernetes.io~csi/<pvc>/mount` to the `/var/lib/kubelet/plugins/kubernetes.io/csi/pv/<pvc>/globalmount` directory.
3. If the pod using the volume on the node is deleted, then `NodeUnpublishVolume` only needs to delete the corresponding symbolic link `/var/lib/kubelet/pods/<pod>/volumes/kubernetes.io~csi/<pvc>/mount`.
4. Finally, when the volume is not used by any pods on the node, `NodeUnstageVolume` will unmount the volume.
Additionally, according to the specification, we also consider scenarios for handling concurrent calls.
https://github.com/container-storage-interface/spec/blob/master/spec.md#concurrency
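
One common way to handle concurrent calls for the same volume is a per-volume "operation in progress" set, so that a second call arriving while the first is still running can be rejected (the spec allows an `ABORTED` error in that case). The sketch below is an assumption about how this could look, not the PR's actual implementation; `volumeLocks`, `tryAcquire`, and `release` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// volumeLocks tracks which volume IDs currently have an operation in flight.
type volumeLocks struct {
	mu    sync.Mutex
	inUse map[string]struct{}
}

func newVolumeLocks() *volumeLocks {
	return &volumeLocks{inUse: make(map[string]struct{})}
}

// tryAcquire returns false if another operation already holds the volume,
// letting the caller fail fast instead of blocking.
func (l *volumeLocks) tryAcquire(volumeID string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if _, busy := l.inUse[volumeID]; busy {
		return false
	}
	l.inUse[volumeID] = struct{}{}
	return true
}

// release marks the volume as free again.
func (l *volumeLocks) release(volumeID string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.inUse, volumeID)
}

func main() {
	locks := newVolumeLocks()

	first := locks.tryAcquire("code_server")
	second := locks.tryAcquire("code_server") // arrives while the first is pending
	fmt.Println("first call proceeds:", first)
	fmt.Println("second call proceeds:", second)

	locks.release("code_server")
	fmt.Println("after release:", locks.tryAcquire("code_server"))
}
```

Failing fast rather than blocking keeps a slow mount from tying up every later request for that volume; operations on different volumes are not affected.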
# Checks
- [x] I have tested the above scenarios if possible.
All of this works fine on Kubernetes, but on Nomad, creating a container fails because the volume cannot be mounted.
opened 02:30AM - 14 Jul 22 UTC
The commit from 7 Jul 2022, "Pods using the same volume share mount", appears to have broken the CSI driver on Nomad. If I build a version prior to that commit, everything works as expected; from that commit onwards, the SeaweedFS mount always fails in the target container.
Error from the job mounting the volume:
```
Driver Failure | failed to create container: API error (400): invalid mount config for type "bind": bind source path does not exist: /opt/nomad/client/csi/monolith/seaweedfs/per-alloc/102f0f75-3dc2-7ed5-4ea3-0a2588fada96/code_server/rw-file-system-multi-node-multi-writer
```
From the CSI Job:
```
I0714 02:04:44 1 main.go:38] connect to filer 192.168.8.50:8888,192.168.8.51:8888,192.168.8.52:8888
I0714 02:04:44 1 driver.go:50] Driver: seaweedfs-csi-driver version: 1.0.0
I0714 02:04:44 1 driver.go:99] Enabling volume access mode: MULTI_NODE_MULTI_WRITER
I0714 02:04:44 1 driver.go:99] Enabling volume access mode: SINGLE_NODE_WRITER
I0714 02:04:44 1 driver.go:99] Enabling volume access mode: SINGLE_NODE_MULTI_WRITER
I0714 02:04:44 1 driver.go:99] Enabling volume access mode: SINGLE_NODE_SINGLE_WRITER
I0714 02:04:44 1 driver.go:110] Enabling controller service capability: CREATE_DELETE_VOLUME
I0714 02:04:44 1 driver.go:110] Enabling controller service capability: PUBLISH_UNPUBLISH_VOLUME
I0714 02:04:44 1 driver.go:110] Enabling controller service capability: SINGLE_NODE_MULTI_WRITER
I0714 02:04:44 1 server.go:92] Listening for connections on address: &net.UnixAddr{Name:"/csi/csi.sock", Net:"unix"}
I0714 02:04:53 1 nodeserver.go:32] node stage volume code_server to /local/csi/staging/code_server/rw-file-system-multi-node-multi-writer
I0714 02:04:53 1 mounter_seaweedfs.go:38] mounting [192.168.8.50:8888 192.168.8.51:8888 192.168.8.52:8888] /testing to /local/csi/staging/code_server/rw-file-system-multi-node-multi-writer
I0714 02:04:53 1 mounter.go:39] Mounting fuse with command: weed and args: [-logtostderr=true mount -dirAutoCreate=true -umask=000 -dir=/local/csi/staging/code_server/rw-file-system-multi-node-multi-writer -collection=testing -filer=192.168.8.50:8888,192.168.8.51:8888,192.168.8.52:8888 -filer.path=/testing -cacheCapacityMB=256 -localSocket=/tmp/seaweedfs-mount-1677588823.sock -collectionQuotaMB=953 -replication=001 -concurrentWriters=32 -cacheDir=/alloc/cache_dir]
I0714 02:04:53 1 nodeserver.go:78] volume code_server successfully staged to /local/csi/staging/code_server/rw-file-system-multi-node-multi-writer
I0714 02:04:53 1 nodeserver.go:87] node publish volume code_server to /local/csi/per-alloc/102f0f75-3dc2-7ed5-4ea3-0a2588fada96/code_server/rw-file-system-multi-node-multi-writer
I0714 02:04:53 1 nodeserver.go:118] volume code_server successfully published to /local/csi/per-alloc/102f0f75-3dc2-7ed5-4ea3-0a2588fada96/code_server/rw-file-system-multi-node-multi-writer
I0714 02:04:58 1 nodeserver.go:125] node unpublish volume code_server from /local/csi/per-alloc/102f0f75-3dc2-7ed5-4ea3-0a2588fada96/code_server/rw-file-system-multi-node-multi-writer
I0714 02:04:58 1 nodeserver.go:192] node unstage volume code_server from /local/csi/staging/code_server/rw-file-system-multi-node-multi-writer
I0714 02:04:58 1 volume.go:117] unmounting volume code_server from /local/csi/staging/code_server/rw-file-system-multi-node-multi-writer
W0714 02:04:58 1 mounter.go:66] Unable to find PID of fuse mount /local/csi/staging/code_server/rw-file-system-multi-node-multi-writer, it must have finished already
```
The CSI driver does mount the SeaweedFS volume to the staging folder, and accessing that folder on the host lets me view the files from the cluster.
In the old driver, the file system is mounted at:
per-alloc/e01bf906-f4e1-64e4-5360-d049dc05355c/code_server/rw-file-system-multi-node-multi-writer
However, in the new driver it is mounted at:
/local/csi/staging/code_server/rw-file-system-multi-node-multi-writer
and the alloc just has a symbolic link to the mount:
per-alloc/a464a996-bb12-c2c6-4dec-4993ce31651b/code_server/rw-file-system-multi-node-multi-writer -> /local/csi/staging/code_server/rw-file-system-multi-node-multi-writer
It appears that the target container either can't follow the symlink or can't reach /local/csi/staging/.
Maybe it should use a bind mount rather than a symlink?
Can anyone help? Looking forward to your reply.