I’m just getting started with Nomad but so far I’m very excited.
Software usually starts out on a manually provisioned VM, then scales up to a few machines, then moves to Kubernetes (or, now that I know about it, Nomad).
Currently we’re at the 2nd stage and we are running fully containerized using Rancher 1 as the orchestrator.
The simplicity of Nomad and the recent focus on removing the need for Consul would allow it to fill this medium-size gap; however, as mentioned in other issues, there’s no testing done at small scale.
Unfortunately we do not have the expertise to do actual testing ourselves, and since the Nomad team is (I think) focusing on simpler deployments with fewer moving parts, I’d argue that it makes sense to test a few smaller-scale scenarios as well.
Note that all of this is about production environments at small scale (in other posts the implicit assumption of people responding always seems to be that small scale implies non-prod).
Running a Nomad node in client & server mode.
This seems to be the best way to get started quickly; the argument against it seems to be that there might be resource contention, but I feel this can be managed by proper configuration if properly documented.
Latency between Nomad server nodes
The allowed latency for the Raft protocol is 10 ms, but what happens if I run a cluster with 3 nodes in different datacenters, with a latency under 30 ms?
It’d be great to know that this will still be fine for cases where there are, say, fewer than 200 containers and, probably more importantly, fewer than 5 new job submissions per hour.
I’m very excited to move over our production system, but even after our own testing I cannot be confident that the system will hold up if deployed like I mentioned.
Deploying on a full scale Nomad + Consul cluster is not realistic for our size.
How can I help in making Nomad more accessible for smaller workloads?
I think Nomad does sound like a broadly good fit given what you’ve described.
Re question 1 - Running in mixed client/server mode seems like a fine option to me. I would just reserve some CPU and memory for the Nomad “server” portion of the agent in the client’s config. https://developer.hashicorp.com/nomad/docs/configuration/client#reserved
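As a rough sketch of what that could look like, here is an agent config that runs both roles and holds back some resources via the client's reserved block. The data_dir path, bootstrap_expect value, and the CPU/memory numbers are placeholders I picked for illustration, not recommendations; size them against what the server process actually uses on your nodes.

```hcl
# Hypothetical agent config for a node running both server and client.
data_dir = "/opt/nomad/data" # assumed path

server {
  enabled          = true
  bootstrap_expect = 3 # assumes a 3-node cluster, as in the question
}

client {
  enabled = true

  # Resources withheld from the scheduler so jobs cannot starve the
  # server portion of the agent. Values are illustrative only.
  reserved {
    cpu    = 500 # MHz
    memory = 512 # MB
  }
}
```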
Re question 2 - Latency between clients and servers is no big deal. There is some heartbeat config on the clients and a max_client_disconnect value you can use to make sure Nomad is fine with this. But… latency between servers is a big deal. You could probably initially deploy servers in different datacenters and get away with it, but it would be a brittle system. We suggest putting servers in different availability zones within the same region. That will allow latency to be low while still having a fair bit of resilience.
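For the client side, a minimal job sketch using max_client_disconnect might look like the following. Note that this setting lives on the job's group rather than in the agent config; the job name, image, and the "10m" window are made-up values for illustration, and newer Nomad releases may prefer a different way of expressing this, so check the docs for your version.

```hcl
# Hypothetical job: allocations on a client that loses contact with the
# servers are not treated as lost until the window below has passed.
job "example" {
  datacenters = ["dc1"]

  group "web" {
    max_client_disconnect = "10m" # illustrative value

    task "app" {
      driver = "docker"

      config {
        image = "nginx:1.25" # placeholder image
      }
    }
  }
}
```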
Good luck with everything!
Thanks for the quick response!
I’m in the process of setting up the cluster with 1 server “far” away at 25 ms. I will then migrate it to a more local DC, which will also let me practice this kind of operation.
While reading docs I found this setting: https://developer.hashicorp.com/nomad/docs/configuration/server#raft_multiplier
To me this seems like the exact thing I’d need though. Of course it has drawbacks, as clearly stated, but to me those seem like situations where one could make an analysis of the pros and cons.
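For reference, this is roughly how I understand the setting would be applied in the server block. The value 5 is only an example I chose; the trade-off described in the docs is that a higher multiplier makes spurious leader elections less likely on a slow link, but also means a lost leader takes longer to be replaced.

```hcl
# Sketch of a server block with relaxed Raft timing for higher-latency links.
server {
  enabled          = true
  bootstrap_expect = 3

  # Default is 1 (fastest timing). Larger values scale the Raft timeouts
  # up; 5 here is an illustrative choice, not a tested recommendation.
  raft_multiplier = 5
}
```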
Got it all working now; server nodes binding to a private network address, clients having a “public” network defined which can be used by a system LB job etc.
Edit: this was partially implemented in 1.2.4, the http server now supports multiple address binding. This solves my use case.
I’m wondering if there’s a reason not to allow binding the agent to multiple addresses. Say my server has the following IPs:
- private ip 192.168.18.3
- public ip x.y.z.z
- local ip 127.0.0.1
- docker ip 172.17.0.1
It’s not unreasonable to want to bind the server to all but the public interface.
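With the multiple-address support for the HTTP server mentioned in the edit above, a config along these lines is what I would expect to work for the IPs listed. I am assuming the addresses go in a single space-separated string, and as far as I can tell this only applies to http, not to rpc or serf, so please verify against the docs for your version.

```hcl
# Sketch: bind RPC and Serf to the private address, and expose the HTTP
# API on the private, loopback, and docker addresses but not the public one.
bind_addr = "192.168.18.3"

addresses {
  # Assumed syntax: space-separated list of addresses (Nomad 1.2.4+).
  http = "192.168.18.3 127.0.0.1 172.17.0.1"
}
```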