I’ve been trying to run a Nomad cluster with a NAS as the storage backend (meaning no local volumes on any of the servers/nodes), and I’m currently using the NFS CSI driver, but I’ve been having issues. Jobs have been intermittently emitting I/O errors like IOError: SQLitePCL.pretty.SQLiteException: disk I/O error, and it seems to have something to do with the NFS CSI plugin. When I restart the allocation, the error always goes away, no problem.
Has anyone else using the NFS CSI plugin had issues like this? Is there another way to use NFS, or should I abandon it and try SMB (I’ve always had better luck with SMB in the past, honestly) or something else?
I’m seeing this error sometimes, which I’m assuming is related: stale NFS file handle
I would avoid using that particular driver. I would recommend the Democratic CSI driver.
Okay, thanks. Yeah it seems like the driver is faulty. I’ll switch over to Democratic CSI and see if that helps.
Does anyone use democratic-csi for NFS and have an example config to share? I can’t get this working and the documentation is pretty obtuse…
It’s not necessarily an issue with the CSI driver; after all, the underlying system is NFS, so you may have connectivity issues with your NAS.
I guess it’s possible, but I don’t know what those issues would be. I have the NAS connected to various other machines (via SMB) and haven’t noticed any connectivity issues, although none of those machines need the same sort of persistent connection that Nomad does.
None of the other machines use NFS, though, so as I mentioned earlier, maybe I can just try SMB. I was trying to write a volume template that would work for SMB, but I was having trouble getting it to work.
I’m using Nix to configure the machines in the cluster running Nomad, so another option I’ve considered is mounting the SMB/NFS shares on the host machines and then using Docker volume mounts to access them that way.
Is that advisable? Would I run into issues with all of the containers trying to access the same SMB/NFS filesystem at once?
The better way to do that is to mount them on the node(s) and set up host volumes - that way, if you ever want to switch back from host volumes to CSI it’s easy, and it works for non-Docker task drivers too.
And access won’t be a problem: one of our legacy bits at $work uses CSI NFS (with the occasional same issues, hence my comment), and we have about 60 apps across 9 nodes reading and writing. We’ve never had read or write contention, only outright “the NFS isn’t responding” type errors.
Interesting, by host volumes you mean this? That seems like it could work, but do you by chance know if there is a way to specify a subdirectory when using these in a job? I’d rather not have to make a separate host_volume for each job, because I have to reboot the node to apply changes to the client config each time.
Yep, those
As far as I know, you can reload Nomad and it’ll pick them up, so no full restart is required on the client node. You can’t directly mount a subdirectory, but you can do something like this (which we have done):
host_volume "job-storage" {
  path      = "/mounted/over/nfs"
  read_only = false
}
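(For reference, that host_volume block sits inside the client stanza of the Nomad agent configuration on the node - a minimal sketch, with the mount path as a placeholder:)

client {
  enabled = true

  # Path on the node where the NFS/SMB export is already mounted
  host_volume "job-storage" {
    path      = "/mounted/over/nfs"
    read_only = false
  }
}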
Then in a job you declare and mount:
volume "storage" {
type = "host"
read_only = false
source = "job-storage"
}
...
...
...
volume_mount {
volume = "storage"
destination = "/storage"
propagation_mode = "host-to-task"
}
And your jobs then need to create their own appropriate subdirectory under /storage - assuming, of course, there is some form of unique identifier for a single run of the job (an ID from a database, parameters, or something similar).
Since host volumes (and NFS/SMB) have absolutely no problem with multiple readers/writers, it should “just work™” - but YMMV on that
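For the subdirectory part, a minimal sketch of one way to do it, assuming the Docker driver: a prestart task that creates a per-job directory using Nomad’s NOMAD_JOB_NAME runtime environment variable (the busybox image and the "prepare-storage" task name are just placeholders):

task "prepare-storage" {
  driver = "docker"

  # Run once before the main task starts, then exit
  lifecycle {
    hook    = "prestart"
    sidecar = false
  }

  config {
    image   = "busybox:1.36"
    command = "sh"
    # The shell expands NOMAD_JOB_NAME from the task environment at runtime
    args    = ["-c", "mkdir -p /storage/$NOMAD_JOB_NAME"]
  }

  volume_mount {
    volume      = "storage"
    destination = "/storage"
  }
}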
Alternatively, go with the direct Docker volume mount and skip the host volumes. The reason I suggested host volumes is more to keep the underlying mechanism opaque to the job, so it becomes eas(y|ier) to change in the future if need be.
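If you do go that route, a minimal sketch of what it looks like with the Docker driver (the paths and image name are placeholders; note that the client needs the Docker plugin’s volumes option enabled, since host bind mounts are disabled by default):

# Client (agent) config - allow the Docker driver to bind-mount host paths:
plugin "docker" {
  config {
    volumes {
      enabled = true
    }
  }
}

# Job spec - bind the NFS/SMB mount point on the node straight into the container:
task "app" {
  driver = "docker"

  config {
    image = "your-image:latest"

    volumes = [
      "/mnt/nas/app-data:/data",
    ]
  }
}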