The Nomad team is currently exploring how we can better support single-node Nomad deployments - that is, deployments that run the Nomad agent as both a server and a client simultaneously.
We want to support this use-case for Nomad (with some clearly communicated caveats around HA and noisy neighbors), but we want to make sure to put the proper technical guard-rails in place when we do. This would allow for simpler and less expensive Nomad deployments for test/dev “clusters” or for very cost-conscious users in prod.
Does anybody have feedback on this sort of use? Has anybody run into issues specific to mixed-use Nomad agents? Are there any technical guardrails you would like to see to enable this?
Similarly, if anybody has run a small cluster of mixed agents (i.e. 3 nodes running both servers and clients) and has thoughts, we would be interested to hear feedback.
I have been running something like that (3 to 4 nodes, all in server & client mode) for around 6 months. For personal use only, but I think I might go with Nomad as the next infrastructure solution for my cost-aware client (which wouldn’t benefit from Kubernetes but would still like to benefit from the modern architecture approach).
Technically, I’m not a devops/sysops person, but the whole experience has been pretty smooth so far. I ran into various problems related to CSI volumes, but with regard to running nodes in mixed mode I can’t say I’ve hit any issues yet. I have reserved some resources on each node just for Nomad to avoid weird problems (like processes suddenly OOMing), and I have some systemd services in place for draining the node on shutdown and ensuring Nomad restarts after a crash.
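(For anyone curious, the drain-on-shutdown part can be sketched as a systemd unit excerpt like the one below - paths, the binary location, and the deadline are placeholders, not my exact setup:)

```ini
# /etc/systemd/system/nomad.service (excerpt, sketch only)
[Service]
ExecStart=/usr/local/bin/nomad agent -config /etc/nomad.d
# Drain this node before the agent stops, so allocations migrate away:
ExecStop=/usr/local/bin/nomad node drain -self -enable -deadline 2m
# Bring Nomad back automatically if it crashes:
Restart=on-failure
RestartSec=5
```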
Aside from that, I keep everything really simple and rely on the internal networking provided by the hosting provider (so no proper isolation between services is happening, but I chose to accept that bargain). Unfortunately I don’t have suggestions for improving the general experience, as it has been pretty great overall so far :).
What worries me a little, though, is fragmentation in Nomad’s ecosystem (Levant vs. nomad-pack - both feel not quite production-grade) - but I mentioned that in another topic.
Hey @rwojsznis appreciate the feedback and glad that mixed clients are going well for you. The processes you put in place seem like the right ones. If you haven’t already, I’d also set some reserved memory and cpu in the client blocks in the agent config - client Stanza - Agent Configuration | Nomad by HashiCorp (this might be what you were referring to though!)
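For reference, a minimal sketch of what that looks like in the agent config - the numbers here are placeholders you’d tune to how much your OS and co-located agents actually need:

```hcl
# Agent configuration: hold back capacity so the scheduler never
# promises it to jobs. Reserved CPU is in MHz, memory in MB.
client {
  enabled = true

  reserved {
    cpu    = 500 # MHz kept free for the OS, Nomad server, Consul, etc.
    memory = 512 # MB kept free for the same
  }
}
```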
Something small to note: If your combined agents are clustering together, I would avoid having 4 servers in a normal state. An even number of server nodes (outside of the case of a temporary failure) can make the raft cluster unhappy. If each client is isolated from the others though, then any number is fine :).
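Concretely, for a three-node combined cluster you’d pin the expected server count to an odd number in each agent’s config (sketch, assuming three nodes):

```hcl
# Same server block on each of the three combined agents.
server {
  enabled          = true
  bootstrap_expect = 3 # odd count keeps raft quorum math happy
}
```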
(Regarding Pack v Levant, you aren’t alone in feeling this way. I think I am partially to blame for this! We slow-rolled Pack out a bit too much just to make sure the interest was there before going all in on it. The interest from the community definitely is there, but we’ve got to close out 1.4 (which should be a great release!) on our end before really circling back to make it production-grade. So, acknowledged that we’re in a bit of a weird spot right now. It’s something we’re aware of and will fix, but it’ll take a little bit of time.)
Cloud native, but not K8s
In my case it’s part hobby project, part design study for edge deployment. In terms of cost, this is both dollar cost and resource consumption cost, my environment is:
2 Raspberry Pi 4
2 Raspberry Pi 3b
4 Raspberry Pi Zeros
7 Compute modules
wired network to all except pi-zeros
Dedicating Nomad and Consul to the 3+1 Pi3/4 wasn’t an option, so they have run as both server and client for over a year. I have had some strange behaviour (agents losing their jobs, servers losing quorum, etc.), but I think this is attributable to the test environment, which has high temperature variation… so I’m pinning it on hardware. Nomad itself handled failure very well in most cases – servers could die and come back, and agents on that node would recover their jobs and carry on happily.
The hard part, at least for me, was understanding from the documentation which config goes where. Not that it is badly written, but I would have benefited from a clearer distinction between client and server configuration.
My environment is Vault + Consul + Nomad, in that order, all on the hardware above (the Pi Zeros really come in handy as members of a Vault cluster). Since Nomad now supports native service discovery, I’m thinking about experimenting with dropping Consul to recover some resources, but in my experience it has worked really well.
In terms of guardrails, it would be nice for Nomad to be aware of what’s running underneath it. I often had jobs fail due to conflicting resource allocation requests which Nomad thought it could fulfill, but which were OOM-killed when they came into conflict with e.g. Consul running on the same node.
sorry for resurrecting an older thread, but I would also be very interested in clients and servers running on the same machine being supported. My reasons are similar to @brucellino1’s: Having HA without having to dedicate three hosts solely to that.
I’m currently running a single server cluster with the Nomad/Consul/Vault servers co-located on a single host. The big downside I’m seeing here: Whenever I want to update that host, I have to first take down the entire cluster.
At the same time, I’m planning to move away from a single physical server to multiple small machines so that I can do individual updates and reboots.
Needing three physical servers which only host the Nomad/Consul servers seems like a waste, especially considering that the current server host sits at 98% idle most of the time.
In short: Yes, I believe officially supporting running client and server on the same node is a great idea, especially for smaller setups.
Also a question to @brucellino1 and @rwojsznis if I may: How are you running the server/client on the same node? With the -dev flag and a single agent? With a single agent and both the client and server configs in the same config file? Or with two agents using different ports?
Having HA without having to dedicate three hosts solely to that.
Just want to note that this would only be “HA” at the application level. For instance, if you have two allocations/instances of an app running and one dies due to code failure, then single-server Nomad would keep it up and healthy. But of course you aren’t HA in the case of VM failure. Probably obvious, but just wanted to clarify in case!
How are you running the server/client on the same node? With the -dev flag and a single agent? With a single agent and both the client and server configs in the same config file?
If you want to run real workloads on the same node, don’t use -dev, as it won’t persist your data and it turns off ACLs. I would use a single agent with both client & server configs in the same file, but I don’t think there’s a reason you couldn’t use two agents with different ports.
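As a sketch, a combined single-file agent config could look something like this (paths and the single-server count are placeholders for a one-node setup):

```hcl
# One agent acting as both server and client on the same node.
data_dir = "/opt/nomad/data" # persistent, unlike -dev mode

server {
  enabled          = true
  bootstrap_expect = 1 # single-server cluster; use 3 for HA
}

client {
  enabled = true
  servers = ["127.0.0.1"] # join the server running in this same agent
}
```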
Yes, that was clear. My main motivation for multiple servers is to make the Nomad cluster itself HA. What I want to get rid of is the need to take down all of my jobs when I do maintenance on the single Nomad server host. Once I’ve got my Pi cluster set up, I can do a “node drain” dance with the Nomad server nodes and restart them one after the other while my cluster and all my jobs stay up, save for the short interruption when nodes get drained.
Overkill for a Homelab? Absolutely. But taking down everything (in the right order) when doing OS updates was getting really annoying.
@mnomitch Having a single node operation mode officially supported would be a huge benefit for me. For those of us that like to run our own hosted VMs, but just need the basics to deploy personal projects, Nomad is the best option out there.
I have no need for running a quorum of servers, as I only run a server on one VM, and a single client on another VM.
The 1.4 release with variables, and the direct integration with Traefik for service discovery, has made Nomad incredibly useful for small deployments. I’m really enjoying the focus on meeting the needs of small deployments that don’t want to step into the world of Kubernetes or some “serverless” hosted solution.
A single node option would be amazing for slowly adopting nomad. I’m even exploring nomad after coming from a k3s setup because of the sheer weight of even lightweight k3s.
It opens up a new kind of feature: solution deployment “interfaces”. We have this for individual applications in containers, but you need to glue those together to make a solution. That glue can be k8s YAML, Nomad jobs, or even docker compose…
None of these are compatible though, so you end up having to switch. Same sort of solution but you don’t need HA or 5 nodes? Better write some systemd unit files or docker compose. Now have a business need to scale? Ah well, you’ve now got to go rewrite it as k8s or Nomad jobs.
It would be amazing to start with single nodes, build up larger installations, and just ‘shift’ the jobs to the newer instance. This would be great for the service mesh too. If there is a remote workload, it’s still all in the mesh, even if it’s some tiny IoT device, just modelled as a separate datacenter.
FWIW I’ve had an excellent experience so far with running single-node Nomad using the “unofficial” setup of an agent as both a server and client simultaneously. The use-case is many on-prem/edge metal hosts for SDN/CNF and IoT related workloads. Ideally they can eventually scale horizontally with clusters as needed, and being able to use similar infrastructure in the cloud is a big win.
I too found k3s and other lightweight k8s too burdensome for the edge.
I guess it would be nice to have some “official” recommendations on combined server/client configurations.
Looking for a cost-effective, modern solution for deploying/maintaining single-node setups for small things, ideally with the ability to easily scale the setup when the need arises. Examples: MVPs, POCs, even just simple landing pages.
Not happy with k3s at all, as it eats around 700 MB while doing nothing, and in general Kubernetes looks like too much over-engineering here.
I have 3 nomad servers that are also running as clients. I run only fabio proxy and promtail on them, to have log collection and nice URLs. They are registered in nomad in a separate datacenter to make sure nothing else will get scheduled on them.
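As a sketch of that setup (names here are placeholders, not my actual config): the server nodes get their own datacenter in the agent config, and only the jobs meant for them target it.

```hcl
# Agent config on each of the three server+client nodes:
datacenter = "infra" # regular workers use a different datacenter name
```

```hcl
# Jobs allowed on the server nodes explicitly target that datacenter:
job "promtail" {
  datacenters = ["infra"]
  # ... task groups ...
}
```

Everything else targets the workers’ datacenter, so nothing stray lands on the servers.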
I have an overall good experience with that. However, restarting a server that is also a client can be unpleasant, especially in the case of configuration errors: with longer downtime the allocations can become stuck or something similar. What I recommend is, before restarting, first drain the node and make sure nothing is running on it.