Nomad - persistent volumes on premise (what do you use?)

Hi,

I am building a Nomad cluster across bare-metal servers and I am still not sure what kind of storage I should use. My favourite is Linstor, but I didn’t find any documentation on how to connect it to Nomad. What persistent volume solution do you prefer?

Thanks for the recommendations…

1 Like

I don’t know if there is a plugin for LINBIT’s Linstor already, but

Jobs can claim storage volumes from AWS Elastic Block Storage (EBS) volumes, GCP persistent disks, Ceph, Portworx, vSphere, etc.

I think Portworx or Ceph should do it.

EDIT:

A list of available CSI plugins can be found in the Kubernetes CSI documentation. Any of these plugins should work with Nomad out of the box.

Linstor should work.
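For context, a CSI plugin runs in Nomad as an ordinary job with a csi_plugin stanza. A minimal sketch, assuming a generic driver image and placeholder names (not a specific Linstor deployment):

```hcl
job "csi-plugin" {
  datacenters = ["dc1"]
  type        = "system"   # run the node plugin on every client that should mount volumes

  group "plugin" {
    task "plugin" {
      driver = "docker"

      config {
        # Placeholder image/args; replace with the vendor's CSI driver.
        image      = "example/csi-driver:latest"
        args       = ["--csi-endpoint=unix://csi/csi.sock"]
        privileged = true   # node plugins usually need this to mount devices
      }

      csi_plugin {
        id        = "example-plugin"   # volumes later reference this as plugin_id
        type      = "monolith"         # or split into "controller" and "node" jobs
        mount_dir = "/csi"
      }
    }
  }
}
```

Once the plugin reports healthy in `nomad plugin status`, volumes can be created or registered against that plugin_id.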

Hi Wolfsrundel,

thanks for your reply. I’ve read the list in the CSI documentation. I’m asking here for recommendations and insights from production.

Linstor seems to be the fastest, but it is not well integrated with Nomad, so I am still not sure about it.
I can’t use AWS, Azure or GCP; our servers run outside these cloud providers. We use many physical servers from multiple providers, so I need a provider-independent solution that is robust, cheap and easy (I know how funny that sounds :smiley: )

Does no one use Nomad with persistent storage? Please share your experience with me.

Thanks in advance

I know it is a bit late; there is a post from Linstor for Nomad.

I would like to know if you successfully set up Linstor with Nomad? I would like to hear about your experience and how the performance is.

2 Likes

Hi,

we’ve spent a while with it. I attended the online Linstor conference and discussed integration and support there with one of the contributors and one of the moderators, but no one had ever heard of Nomad :smiley: and they are focused on k8s. So we tried it on our own, and after a while we had a working prototype, but with poor performance, and it required many manual steps alongside Nomad’s deployment via “nomad run template.hcl” (create the volumes, add the volumes to Nomad, etc.).
The performance was poor because the Linstor CSI plugin and Nomad are not able to place jobs and their data on the same node. So we’ve decided to stop wasting time with it for now; we currently use Docker volumes and affinity to place containers on the same node, because it is as fast as a native hard drive and that is acceptable in our current use case. But in the near future we will have to find a solution with shared storage.
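For anyone wanting to copy that workaround, a rough sketch of the job; the node name, volume name and image are made up for illustration, and named Docker volumes must be enabled in the client’s Docker plugin config:

```hcl
job "app" {
  datacenters = ["dc1"]

  group "app" {
    # Keep the allocation on the node that already holds the Docker volume.
    # (A hard constraint could be used instead of a soft affinity.)
    affinity {
      attribute = "${node.unique.name}"
      value     = "worker-01"   # hypothetical node name
      weight    = 100
    }

    task "app" {
      driver = "docker"

      config {
        image = "example/app:latest"   # placeholder image
        # Named Docker volume; requires volumes to be enabled in the
        # client's "docker" plugin configuration.
        volumes = ["app-data:/data"]
      }
    }
  }
}
```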

Persistent storage is, from my point of view, the most important challenge when you use bare metal and host hundreds of applications.

1 Like

From my experience (running Nomad on-prem using bare metal hardware) persistent storage can be problematic :slightly_frowning_face:

Not all CSI drivers will work without Kubernetes (NetApp is the prime example here), it takes a considerable amount of effort to go from a driver to a production-ready CSI setup in Nomad, and sometimes storage vendors will not support your solution when you use CSI drivers without Kubernetes.

For now we use host volumes (using NetApp NFS namespaces) since this gives us the most options with the least amount of trouble.

Yes, NFS is not a POSIX-compliant FS, but it works for the stuff we run.
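For anyone new to host volumes, the pattern looks roughly like this; the paths, names and image are placeholders rather than our actual setup:

```hcl
# Nomad client configuration: expose a directory (here, an NFS mount) as a host volume.
client {
  host_volume "app-data" {
    path      = "/mnt/netapp/app-data"   # example path where the NFS share is mounted
    read_only = false
  }
}
```

```hcl
# Job fragment claiming the host volume.
group "app" {
  volume "app-data" {
    type      = "host"
    source    = "app-data"
    read_only = false
  }

  task "app" {
    driver = "docker"

    config {
      image = "example/app:latest"   # placeholder image
    }

    volume_mount {
      volume      = "app-data"
      destination = "/data"
    }
  }
}
```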

Portworx? No takers for Portworx?

Portworx also seems to have a Nomad job example.

It’s my current issue in my lab… 4 nodes running the entire stack (Consul, Nomad, Vault) as server/client.

The Nomad job example from Portworx requires Docker, but I don’t use the Docker daemon in my lab, exclusively Podman in rootless mode.
I also commented on their forums without success, as if I’m an alien for not using the Docker daemon…

I tried direct-csi from MinIO but it requires a kubeconfig file :pouting_cat: … However, MinIO running directly in distributed mode on bare metal worked without issue and without Nomad (systemd was sufficient).

@msirovy I see GitHub - piraeusdatastore/linstor-csi: CSI plugin for LINSTOR
If you find out more about Linstor, please share it… it’s incredible that we’re stuck on storage issues due to the lack of compatibility with Nomad (and/or Podman).

The only solution that I see for my lab is crazy:

  • removing unused disks from nodes
  • finding unused NAS or buying a new one :crying_cat_face:
  • hoping that I can reuse the disks (SAS 2.5") from the nodes in the NAS
  • connecting all nodes to the NAS using NFS or better … iSCSI
1 Like

For on-prem solutions I installed systems that use a NetApp SAN with exported NFSv4 shares.

Currently in my lab, I’m using a TrueNAS instance to serve storage while utilizing CSI for Nomad. This way Nomad can manage the volumes and TrueNAS can manage the associated dataset and perform snapshots/backups/restores seamlessly (in my case to Backblaze B2).
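In case it helps anyone going the same route: once the CSI plugin job is healthy, a volume can be created from a spec file so that Nomad drives the provisioning. A rough sketch, with the plugin_id and sizes as assumptions for this kind of setup:

```hcl
# volume.hcl -- create with: nomad volume create volume.hcl
id        = "app-data"
name      = "app-data"
type      = "csi"
plugin_id = "truenas"        # whatever id the CSI plugin job registered

capacity_min = "5GiB"
capacity_max = "10GiB"

capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}
```

Jobs then claim it with a volume block of type "csi", just like a host volume.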

2 Likes

This week I tested Nomad v1.2.6 with NetApp SolidFire 12.3.
I got host volumes and the Trident Docker Volume Plugin to work. This wouldn’t work great for most Nomad-on-bare-metal environments because there’s no way to initiate volume failover from Nomad (i.e. with the help of CSI), so one would have to build HA clusters to fail over storage resources for Nomad (too much trouble, but on the other hand a mature HA failover solution might work better than CSI at this stage of Nomad CSI development)…
If you can use VMs and get VM-level HA from a hypervisor and consume storage that way, then that would work.
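For anyone curious what the Docker volume plugin route looks like in a job file, it is roughly the sketch below; the driver name "netapp" and the volume name are assumptions about how the Trident Docker plugin is installed on the host:

```hcl
# Job fragment using a Docker volume plugin (e.g. Trident) via the docker driver.
task "db" {
  driver = "docker"

  config {
    image = "postgres:14"   # placeholder workload

    # The driver name depends on how the volume plugin was installed.
    volume_driver = "netapp"
    volumes       = ["pgdata:/var/lib/postgresql/data"]
  }
}
```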

Without a CSI driver, static shares such as NFS are easier to use.

I have had semi-good luck with using Ceph. The documentation for making it work with Nomad is light, but it is doable. Performance looks great, and the redundancy on the Ceph side is great; the data has survived terrible things during testing. What has not worked as well, redundancy-wise, is job rescheduling after a node failure: volumes will get stuck randomly and need to be detached at best, or require a node cycle/rbd unmap when Nomad gets super confused. A lot of my issues seem to have disappeared in my third cluster iteration.
Setup: 5 Nomad workers, with the Ceph OSDs running on the workers.

I currently have three types of persistent storage on my stack.

The first is host volumes within a ZFS mirrored pool. It only provides redundancy if one of the disks fails, but for jobs that are tied to that machine, either through physical hardware or because I have not moved them to the second type, it works relatively well. Managing the host volumes can be a bit tedious, though, so using Ansible helps in that regard :smiley:

The second option, which is where I am moving toward, is CSI volumes using democratic-csi to mount persisted folders from my NAS over SMB. Experience-wise it’s been a little rocky (bear in mind CSI is still beta in Nomad right now); I used to get plagued by Issue #10927, but since the work the Nomad team has done on CSI recently I’ve been having a significantly better experience with it :smiley: I’m still having issues with getting SMB credentials passed into the volumes (I’m currently creating CSI volumes manually), but I do have other issues open in the Nomad Terraform provider to hopefully alleviate that at some point :slight_smile: Outside of these two main issues it’s been really nice; jobs can move between nodes even if they are stateful and require persisted storage.
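In case it’s useful to anyone hitting the same credentials problem: the Nomad CSI volume spec accepts a secrets block and mount options, which is how I’d expect SMB credentials to reach the plugin. This is an untested sketch; the plugin_id and the secret key names are guesses on the democratic-csi side:

```hcl
# volume.hcl -- created/registered manually with the nomad volume CLI
id        = "media"
name      = "media"
type      = "csi"
plugin_id = "org.democratic-csi.smb"   # assumption: whatever id the plugin job registered

capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}

# Passed through to the plugin's staging/publishing calls.
# The expected key names depend on the plugin; these are guesses.
secrets {
  username = "nomad"
  password = "example-password"
}

mount_options {
  fs_type     = "cifs"
  mount_flags = ["vers=3.0"]
}
```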

The third (a bit of a mix between the first two) is SMB shares mounted directly on the node, with a host volume created for the contents. I’ve not had many issues with this one, bar when the NAS is offline :slight_smile: I mainly use this method if I have lots of jobs wanting to read the same data (I am, however, thinking of moving this to a CSI volume at some point).


I have a few Ceph CSI volumes like that, think ACME certs and media for the Plex stack.

I have gotten away with using block devices and multi-node-multi-writer, but it feels wrong, and I only really trust it because of the low volume of writes.
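For context, that corresponds to a capability like the one below in the CSI volume spec; whether it is safe really depends on the workloads coordinating their own writes:

```hcl
# Fragment of a CSI volume spec allowing shared raw block access.
capability {
  access_mode     = "multi-node-multi-writer"
  attachment_mode = "block-device"
}
```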