I am building a Nomad cloud across bare-metal servers and I am still not sure what kind of storage I should use. My favorite is Linstor, but I didn’t find any documentation on how to connect the two. What persistent volume solution do you prefer?
Thanks for your reply. I’ve read the list in the CSI documentation. I am asking here for recommendations and insights from production.
Linstor seems to be the fastest, but it is not well integrated with Nomad, so I am still not sure about it.
I can’t use AWS, Azure or GCP; our servers run outside of these cloud providers. We use many physical servers from multiple providers, so I need a provider-independent solution that is robust, cheap and easy (I know how funny that sounds).
We’ve spent a while with it. I attended the online Linstor conference and discussed integration and support there with one of the contributors and one moderator, but no one had heard of Nomad and they are focused on k8s. So we tried it on our own, and after a while we had a working prototype, but with poor performance, and it required many steps besides Nomad’s deployment via “nomad run template.hcl” (create volumes, add the volume to Nomad, etc.).
The performance was poor because the Linstor CSI and Nomad together are not able to place jobs and data on the same node. So we decided to stop wasting time on it for now, and we currently use Docker volumes and affinity to place containers on the same node, because it is as fast as a native hard drive and this is acceptable for our current use case. But in the near future we will have to find a solution with shared storage.
From my point of view, persistent storage is the most important challenge when you use bare metal and host hundreds of applications.
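For anyone wanting to try the Docker-volume-plus-affinity approach described above, here is a minimal sketch of what such a job file might look like. The job name, node name (`worker-1`), image, and volume name (`app-data`) are all made up for illustration, and it assumes the Docker task driver.

```hcl
# Sketch only: job, node, image, and volume names are hypothetical.
job "app" {
  datacenters = ["dc1"]

  group "app" {
    # Prefer the node that already holds the local Docker volume.
    # Affinity is a soft preference; a constraint block with the same
    # attribute and value would make the placement mandatory instead.
    affinity {
      attribute = "${node.unique.name}"
      value     = "worker-1"
      weight    = 100
    }

    task "app" {
      driver = "docker"

      config {
        image = "example/app:latest"

        # Named Docker volume, local to whichever node runs the container.
        mount {
          type   = "volume"
          source = "app-data"
          target = "/data"
        }
      }
    }
  }
}
```

Depending on the Nomad version and client setup, the Docker plugin's `volumes { enabled = true }` option may also need to be turned on. The data never leaves the node, which is why this is fast but gives no failover.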
From my experience (running Nomad on-prem on bare-metal hardware), persistent storage can be problematic.
Not all CSI drivers will work without Kubernetes (NetApp is the prime example here). It takes a considerable amount of effort to go from a driver to a production-ready CSI setup in Nomad, and sometimes storage vendors will not support your solution when you use CSI drivers without Kubernetes.
For now we use host volumes (using NetApp NFS namespaces) since this gives us the most options with the least amount of trouble.
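As a rough illustration of the host-volume approach, assuming the NFS export is already mounted at the same path on every client (for example via fstab), the client agent configuration might contain something like the following. The volume name `nfs-data` and the path are hypothetical.

```hcl
# Client agent configuration (sketch; name and path are hypothetical).
# The NFS export is assumed to be mounted at this path outside of Nomad.
client {
  host_volume "nfs-data" {
    path      = "/mnt/nfs/data"
    read_only = false
  }
}
```

A job can then claim that volume at the group level and mount it into a task:

```hcl
# Job file (sketch; job name and image are hypothetical).
job "app" {
  datacenters = ["dc1"]

  group "app" {
    # Claims the host volume declared in the client configuration.
    volume "data" {
      type      = "host"
      source    = "nfs-data"
      read_only = false
    }

    task "app" {
      driver = "docker"

      config {
        image = "example/app:latest"
      }

      # Makes the claimed volume visible inside the container.
      volume_mount {
        volume      = "data"
        destination = "/data"
      }
    }
  }
}
```

Nomad only schedules the group onto clients that advertise `nfs-data`, so the share has to be mounted on every node that should be eligible.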
Yes, NFS is not a POSIX-compliant FS, but it works for the stuff we run.
It’s my current issue in my lab… 4 nodes running the entire stack (Consul, Nomad, Vault) as server/client.
The Nomad job example from Portworx requires Docker, but I don’t use the Docker daemon in my lab, exclusively Podman in rootless mode.
I also commented on their forums without success, as if I were an alien for not using the Docker daemon…
I tried direct-csi from MinIO, but it requires a kubeconfig file… MinIO running directly in distributed mode on bare metal worked without issue and without Nomad (systemd was sufficient).
For on-prem solutions I installed systems that use a NetApp SAN with exported NFSv4 shares.
Currently in my lab, I’m using a TrueNAS instance to serve storage while utilizing CSI for Nomad. This way Nomad can manage the volumes and TrueNAS can manage the associated dataset and perform snapshots/backups/restores seamlessly (in my case to Backblaze B2).
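For reference, a CSI volume that Nomad manages is described in a small volume specification file. This is only a sketch: the volume ID and the plugin ID (`truenas-nfs`) are hypothetical, and it assumes the CSI plugin is already running in the cluster and that its controller supports volume creation.

```hcl
# volume.hcl (sketch): passed to `nomad volume create volume.hcl`.
# The IDs below are hypothetical; plugin_id must match the ID under which
# the running CSI plugin registered itself with Nomad.
id        = "app-data"
name      = "app-data"
type      = "csi"
plugin_id = "truenas-nfs"

# Requested size range for the new volume.
capacity_min = "1GiB"
capacity_max = "1GiB"

# How jobs are allowed to attach and use the volume.
capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}
```

If the volume already exists on the storage side, `nomad volume register` with a similar file (plus the `external_id` of the existing volume) is the alternative to `nomad volume create`.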
This week I tested Nomad v1.2.6 with NetApp SolidFire 12.3.
I got Host Volumes and the Trident Docker Volume Plugin to work. This wouldn’t work great for most Nomad-on-bare-metal environments because there’s no way to initiate volume failover from Nomad (i.e. with the help of CSI), so one would have to build HA clusters to fail over storage resources for Nomad (too much trouble, but on the other hand a mature HA failover solution might work better than CSI at this stage of Nomad CSI development)…
If you can use VMs and get VM-level HA from a hypervisor and consume storage that way, then that would work.
Without a CSI driver, static shares such as NFS are easier to use.
I have had semi-good luck using Ceph. The documentation for running it with Nomad is light, but it is doable. Performance looks great and the redundancy on the Ceph side is great; data has survived terrible things during testing. What has not worked as well, redundancy-wise, is job rescheduling after node failure. Volumes will get stuck randomly and need to be detached at best, or need a node cycle/rbd unmap when Nomad gets super confused. A lot of my issues seem to have disappeared on my third cluster iteration.
5 Nomad workers, with Ceph OSDs running on the workers.
I currently have three types of persistent storage on my stack.
The first is host volumes within a ZFS mirrored pool. It only provides redundancy if one of the disks fails, but for jobs that are tied to that machine, either through physical hardware or because I have not moved them to the second type, it works relatively well. Managing the host volumes can be a bit tedious, though, so using Ansible helps in that regard.
The second option, which is where I am moving toward, is CSI volumes using Democratic CSI to mount persisted folders from my NAS over SMB. Experience-wise it’s been a little rocky (bear in mind CSI is still beta in Nomad right now). I used to get plagued by Issue #10927; however, since the work the Nomad team has done on CSI recently, I’ve been having a significantly better experience with it. I’m still having issues getting SMB credentials passed into the volumes (I’m currently creating CSI volumes manually), but I do have other issues open in the Nomad Terraform provider to hopefully alleviate that at some point. Outside of these two main issues, it’s been really nice: jobs can move between nodes even if they are stateful and require persisted storage.
The third (a bit of a mix between the first two) is SMB shares that are mounted directly on the node, with a host volume created for the contents within. I’ve not had many issues with this one, bar when the NAS is offline. I mainly use this method if I have lots of jobs wanting to read the same data (I am, however, thinking of moving this to a CSI volume at some point).
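To make the CSI option from the list above a bit more concrete, a job that consumes an already registered CSI volume might look roughly like this. It is a sketch, not the poster's actual setup: the job name, image, and volume ID (`media`) are hypothetical, and the access mode has to be one the plugin and the registered volume actually support.

```hcl
# Sketch only: job name, image, and volume ID are hypothetical.
job "media" {
  datacenters = ["dc1"]

  group "media" {
    # Claims a CSI volume that was registered or created beforehand.
    volume "media" {
      type            = "csi"
      source          = "media"
      access_mode     = "multi-node-multi-writer"
      attachment_mode = "file-system"
    }

    task "server" {
      driver = "docker"

      config {
        image = "example/media-server:latest"
      }

      # Mounts the claimed volume inside the container.
      volume_mount {
        volume      = "media"
        destination = "/media"
      }
    }
  }
}
```

Because the claim goes through the CSI plugin rather than a node-local path, the allocation can be rescheduled onto another node and the volume follows it.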
I have a few Ceph CSI volumes like that; think ACME certs and media for the Plex stack.
I have gotten away with using block devices and multi-node-multi-writer, but it feels wrong, and I only really trust it because of the low volume of writes.