Is there a low resource configuration or distribution of nomad (like k3s) or can it be run on a single weaker machine?

Hey. I'm a developer looking to add some simple scheduling, scaling potential, health monitoring / self-healing, and, perhaps most importantly, rolling updates that don't interrupt the live service to my simple home hobby server. My first attempt involved Kubernetes, but the learning curve and CPU usage made it a bad fit for my simple requirements. K3s seemed like the next logical step, but while researching alternatives, Nomad stuck out as an interesting option. I have a few projects that may never need scaling and most certainly don't need to be containerized, so the idea that I could manage something like that in Nomad was especially appealing.

So, my question is whether a smaller distribution or configuration exists that would give me the functionality I've described without also starving a humble host machine. The system requirements page lists some fairly high requirements, and I did come across this discussion where someone expressed a similar concern, but I was hoping to get some more specific information / elaboration on the answer. I'm new to the forums and didn't want to revive a post from a year ago (pardon me if that would have been the right thing to do), but are there any specific numbers we can discuss? For example…

A nodejs server that runs a few routes, a go binary that serves another few routes, and an instance of mongodb they both connect to. Single physical machine, single instance of each of those 3 applications, no present need to scale - how much of my system resources would be consumed by keeping those 3 alive and occasionally running an update on 1 of them?

If there is other software that can do this and I'm on entirely the wrong track looking at solutions like Kubernetes and Nomad, I would also be happy to accept that as an answer, provided you can recommend such a solution :slight_smile: So that we may potentially discuss specifics for configuring Nomad, I have these specs:

Intel(R) Xeon(R) CPU E3-1225 v5 @ 3.30GHz with 64GB memory

I think that’s plenty to enable the machine to listen to a port and say “Hello, World!” but it might do more in the future. Is it reasonable to manage the updates of these small scale projects with nomad?

Hi @student020341 :wave:

There isn’t anything like k3s for Nomad because we haven’t seen the need for it yet. Part of the simplicity of Nomad is that the same binary can be used to deploy a small single-node cluster as well as a large-scale, globally distributed infrastructure in much the same way.
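For illustration, a minimal single-node setup could look something like the sketch below: one agent acting as both server and client (the trade-offs of doing that are discussed further down). The file name and `data_dir` path are hypothetical.

```hcl
# single.hcl: one Nomad agent acting as both server and client.
# A homelab sketch, not a recommended production layout.
data_dir = "/opt/nomad/data" # hypothetical path

server {
  enabled          = true
  bootstrap_expect = 1 # elect a leader with just this one server
}

client {
  enabled = true
}
```

You’d start it with `nomad agent -config=single.hcl`. For pure experimentation there’s also `nomad agent -dev`, which runs an in-memory single-node agent that loses all state when it stops.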

Single-node deployments are not recommended for production environments for a few reasons:

  • Having multiple servers allows some of them to go down without impacting the cluster as a whole. This matters in production, where you want to avoid downtime as much as possible. In a homelab setting, some downtime is probably OK.

  • Nomad servers store the state of your cluster. The leader replicates this state to the followers so that, if it goes down, one of the other servers can pick up the work right away. If you only have one server and, for some reason, it gets destroyed, you lose this state.

    Most of the state is about registered jobs, allocation statuses, etc., which are fairly easy to recover in a small homelab setting: you would just need to run the jobs again. But losing some things, like ACL tokens that would have to be re-created, could be quite disruptive in a production environment.

  • Nomad can distribute some of the scheduling computation among all the servers. For large clusters, this can have a positive performance impact, since there is more CPU power available for the work. For a small-scale homelab, not so much.

These 3 reasons are fairly generic and would apply to any other type of distributed system (Kubernetes, a distributed database, etc.).

One thing that is Nomad-specific is that we don’t recommend running Nomad clients and servers on the same machine. Some of the reasons for this are:

Resource sharing

Nomad servers can use a lot of memory, since they keep state in RAM for faster operation. Having Nomad clients plus your own jobs and workloads running on the same machine means they all compete for those resources.

In large production deployments, this state can amount to several GBs, so every spare byte helps. For a small homelab with just a few jobs, the server will probably not need that much.


Security

Nomad clients run as root and execute arbitrary code via the jobs that are registered. This means a malicious actor could register a job that accesses and affects the Nomad servers if they are running on the same machine.

For a production environment, this is bad news. For a homelab that only lives on a LAN, maybe not so much.

As you can see, it’s not that it’s impossible or even hard to do; it’s more about what kinds of risks and trade-offs you’re OK with accepting :slightly_smiling_face:
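As an aside, since you mentioned some of your projects don’t need to be containerized: a job for, say, your Go binary could be sketched roughly like this with the `exec` driver, which runs a plain binary straight on the host. All names, paths, and resource numbers here are hypothetical.

```hcl
# Sketch of a non-containerized job using the exec driver.
# Job name, binary path, and resource figures are hypothetical.
job "go-api" {
  datacenters = ["dc1"]

  group "api" {
    count = 1

    # Rolling/canary update: bring up the new version alongside
    # the old one, and only stop the old one once the new one
    # is considered healthy.
    update {
      max_parallel = 1
      canary       = 1
      auto_promote = true
    }

    task "server" {
      driver = "exec"

      config {
        command = "/usr/local/bin/go-api" # hypothetical path
      }

      resources {
        cpu    = 100 # MHz
        memory = 64  # MB
      }
    }
  }
}
```

Registering it with `nomad job run go-api.nomad` and later re-running it with a new binary is what gives you the zero-downtime rolling updates you asked about.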


Thanks. To some degree (as someone new to DevOps) I can understand those risks and trade-offs as they relate to a production product.

My apologies for the rambling that may have obscured my concerns. Is there some minimum system requirement or configuration that Nomad can operate within? I am sold on the merits of the solution and accept the risk of adding it to my current hobby projects; I’m just wary of what would be left of my system based on this system requirements page. If there are any concrete numbers you know, for example that 10 GB of memory is enough for Nomad to manage the 3-part application stack I mentioned above, I think that would be useful to anyone who is brought to these forums in the future by Google, just like I was recently.

I’m sorry if my question doesn’t make sense, this domain is new to me and I’m not sure if these larger orchestration tools are even what I think I need.

In my personal experience, Nomad is quite frugal when it comes to resource requirements for small home clusters. For example, I have a virtualized lab environment that has 3 servers with 512 MB and 3 clients with 1 GB of RAM. For small clusters, your required resources are almost entirely workload specific. I’ve run small clusters on Raspberry Pi 3s, AWS t2.micros, and little VMs with no issues other than self-inflicted ones around workload size.

The primary issue with single-node clusters is around failure cases. Using multiple nodes, even small ones (like RPis), enables you to handle things like rolling upgrades, single-node failures, and workload migration that just aren’t possible in a single-node configuration.

As your workload in your small cluster increases, you might need to replace individual nodes with beefier ones. From my experience, this is almost always client nodes. I haven’t hit a place yet where I have exhausted my small server nodes.

For production workloads, you should always consult the Reference Architecture; however, for small home and toy clusters you can get away with a quite small setup.

But to your question, you can do as small as a single node cluster on a local machine. Your workload or a desire for high-availability will be the primary push to scale at the size you’re talking about.


Hmm, I suspected something like that might be the case based on how that discussion from 1 year ago was answered, but having it stated explicitly is great, thanks!

I would guess that for my use case, 1 node could handle the workload, and I have the resources to spin up a 2nd that I would probably only use when I’m doing an update, to prevent downtime. If I’m misunderstanding something there, please do correct my understanding :smiley:

If I ever do any serious work, I would most definitely move it into a cloud environment like AWS. Until then, I think this answers my question. Maybe I should repeat my understanding to make sure: Nomad’s CPU/memory usage will scale up or down with the workload (?). Is there some minimum? I suppose I’ll discover that when I try to run it, but maybe it would be good to mention that on the system requirements page! K3s is happy to brag that it will only take up 512 MB of your system memory and can run on IoT & edge devices.

Saying you’ve run Nomad on a Pi gives me a totally different view of it as a newcomer, compared to the system requirements page :slight_smile:

Thanks @angrycub, that’s a great data point.

In general, yes. Deployments with hundreds or thousands of clients will probably need extra CPU to handle client heartbeats and updates.

Not…really? Usually Nomad just runs :sweat_smile:

Personally, I’ve been able to run Nomad in:

  • Raspberry Pi 3
  • Raspberry Pi 4
  • VMs with random specs (I think the smallest has 1GB of RAM)
  • A Kobol Helios64 NAS

The only device I wasn’t able to run Nomad on was a Synology DS214play NAS, but that was because it uses a heavily modified Linux OS, so it was a software issue rather than a hardware one.

From the community, we’ve seen several Raspberry Pi clusters as well.

And even inside lab equipment: Applying Workload Orchestration to Experimental Biology – The New Stack

That’s a good point. But the section you linked is from our production deployment requirements. I guess the problem for us is that we don’t formally test at small scale, so it’s hard to make any recommendations.


Awesome! I think I’ll stick with Nomad, then.

Probably just another misunderstanding I had of Kubernetes. When I ran my simple server project inside it, I checked my server stats and saw my CPU sitting at 25%! I wondered if orchestration systems just take up an arbitrary % of available resources in case they need to do something at a moment’s notice. Even if that ends up being the case for Nomad, I think it will be the superior choice for my use cases: simplicity, and some applications not being containerized.

Oops, I didn’t notice that. I think that list you posted should cover it well enough for any googlers that arrive at this post :smiley:


I don’t know enough about Kubernetes internals, but this does seem a bit odd. Nomad doesn’t reserve any resources beforehand, so it shouldn’t use much if there’s nothing running.

For example, this is from my homelab cluster leader:

nomad@nomad-server-01:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:          7.8Gi       284Mi       3.3Gi       1.0Mi       4.2Gi       7.2Gi
Swap:         4.0Gi       0.0Ki       4.0Gi
nomad@nomad-server-01:~$ uptime
 13:54:17 up 53 days, 13:58,  1 user,  load average: 0.02, 0.03, 0.00

But it’s important to note that I only have 1 job, and it’s stopped :sweat_smile:


One important note here. Nomad is a quorum-based system, so for high availability you will need at least 3 nodes. From the documentation about the consensus protocol:

  • Quorum - A quorum is a majority of members from a peer set: for a set of size n, quorum requires at least ⌊(n/2)+1⌋ members. For example, if there are 5 members in the peer set, we would need 3 nodes to form a quorum. If a quorum of nodes is unavailable for any reason, the cluster becomes unavailable and no new logs can be committed.

So in order for your cluster to tolerate a single node failure, you will need a peer set (of servers) of at least three members. Again, for toy clusters, these can be quite modest.
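The quorum arithmetic quoted above is easy to sketch as a quick calculation (Python here purely for illustration):

```python
# Quorum math for a Raft-style peer set, per the definition quoted above.
def quorum(n: int) -> int:
    """Majority of a peer set of size n: floor(n/2) + 1."""
    return n // 2 + 1

def fault_tolerance(n: int) -> int:
    """How many servers can fail while a quorum still survives."""
    return n - quorum(n)

for n in (1, 3, 5):
    print(f"{n} servers: quorum={quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
# 1 servers: quorum=1, tolerates 0 failure(s)
# 3 servers: quorum=2, tolerates 1 failure(s)
# 5 servers: quorum=3, tolerates 2 failure(s)
```

That’s why 3 is the smallest useful highly available server count: it’s the first size that survives a failure, while 2 servers tolerate none.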

If you are willing to accept the risk of a physical node failure creating an outage, but otherwise want to be able to have some of the high-availability functions, you could run 3 small VM instances for your Nomad servers on a single host. This would let you experiment with high availability with the (enormous) caveat that the node running all of your servers becomes a single point of failure if something happens to it.
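As a sketch, each of those three server VMs could use a config along these lines; the `data_dir` path and IP addresses are placeholders.

```hcl
# One of three server-only Nomad VMs on the same physical host.
# Path and IP addresses below are placeholders.
data_dir = "/opt/nomad/data"

server {
  enabled          = true
  bootstrap_expect = 3 # wait for all three servers before electing a leader

  server_join {
    retry_join = ["192.168.1.11", "192.168.1.12", "192.168.1.13"]
  }
}
```

With `bootstrap_expect = 3`, the cluster only elects a leader once all three servers have joined, which avoids accidentally bootstrapping three separate one-node clusters.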


I still appreciate getting to see some real system stats! Seeing that, I think something in Kubernetes might just be different; perhaps it reserves resources beforehand, as you suggested. Even if k3s brings its operation closer to Nomad’s, I’m still certain a simpler tool will be better for my exploration of this wonderful and terrifying new domain.

I will keep that in mind, thanks. I am pretty sure I have the resources to run several versions of my stuff on the host machine. I have a tangential and potentially obvious question, if you don’t mind: for the sake of high availability, would it be better (or would you recommend) to split a game server into 3 nodes that can handle 50 players each instead of having a single node with the resources to handle 150 players? Assuming there’s no impact on the game from splitting players across any number of servers and groups.

And that would be as you mentioned, several VMs running on 1 host machine.

Hmm… that’s a good question, but also hard to answer. I think it’s all about trade-offs. Having a single big server is easier to manage and absorbs growth more smoothly, since it takes longer to saturate. On the other hand, it would be wasteful to keep a large server running when there’s no load.

Smaller servers require more prep work so that you can provision them quickly, and it takes some time to start one on demand. But they provide finer-grained control over cost, since you only run as many as you need.

Another option would be a hybrid approach: a small pool of large servers to handle your average load, plus a pool of small servers that are started on demand. This is a bit more complex to set up, but could give you the best of both worlds.

My personal opinion would be to start small and grow as needed. Cloud gives you this flexibility to easily try and adjust as you learn more.
