What cluster size justifies Nomad + Consul?


I am trying to understand how all the pieces fit together. I am somewhat confused by the example given.

Nomad’s example shows 3 servers and 3 clients. Just looking at this, it doesn’t seem effective. If I have 6 servers, I don’t want to use half of them for control purposes. I feel like this only starts to make sense with a larger number of clients, given that you need a minimum of 3 servers for proper leader election and HA.

Then, if I wanted to use Consul alongside it, I would probably need 2-3 more servers. Meaning now I’d have 5-6 servers with no production code running and 3 servers with production code running. Again, this doesn’t seem efficient.

So I wonder what number of clients would justify running Nomad and Consul per the recommendation?

Hi @bluebrown, the diagrams on HashiCorp’s website are suitable for PROD environments.

For experimenting with (and getting familiar with) Nomad + Consul clusters, in my opinion, you can start with a single server node. This single node can run both the Consul server and the Nomad server.

Again, what I am saying is for non-PROD. :slight_smile:
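As a sketch of what that single-node, non-PROD setup could look like: `bootstrap_expect = 1` lets each server elect itself as leader without waiting for peers. The file names and `data_dir` path below are just illustrative, not official defaults.

```hcl
# consul.hcl -- single Consul server (non-PROD only!)
server           = true
bootstrap_expect = 1
data_dir         = "/opt/consul"

# nomad.hcl -- Nomad server on the same node (non-PROD only!)
server {
  enabled          = true
  bootstrap_expect = 1
}
```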

A decently sized server running both the Consul server and the Nomad server (2 CPU / 4 GB or 4 CPU / 8 GB) with adequately fast storage should be able to cater to 50-100 client (compute) nodes, depending on the number of jobs you plan to run.

Once you are done experimenting, you can expand to 3 server nodes (still running the Nomad server and Consul server together on each).
If you then find that the servers cannot handle the load, you can separate the Nomad servers from the Consul servers.

*** All of the above is strictly my opinion :grinning:


Hi Shanti,

Thanks for your reply. However, I feel like you did not get to the point. I know I can run Nomad on a single server; that is not the question. The question is about a production cluster and what size would justify sacrificing a minimum of 3 (bigger) servers. A total of 6 servers is not enough, in my opinion, but who am I to make such a statement.

If I tell my boss I need a server, he will ask why and what it costs. I don’t think I can explain to him that I need 6 to 9 servers in order to run prod code on only 3 of them.

So I think it should be at least something like 10-20 clients for 3 Nomad servers and 2-3 Consul servers.

Disclaimer: I do not work for HashiCorp, so I was clarifying that everything is only my opinion and not the recommended practice as per docs.

You should choose what you are comfortable with, for your needs, technical and monetary.

From the original post it wasn’t clear where the confusion was, hence I was trying to show a step-by-step approach.

For pricing, etc., it would be better to follow up with the HashiCorp team.

Shantanu Gadgil

@bluebrown Run Nomad clients on your Nomad servers. :slight_smile:

This is not a good setup, but it’s a great one to get buy-in and get everything started and running! Just manually set your CPU/memory limits lower on the combined server+client nodes so the server daemons can run without issues.
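If you go this route, those limits can be expressed with the `reserved` block inside Nomad’s `client` stanza, which carves out resources the scheduler will never hand to workloads. A sketch for a combined server+client node; the actual numbers are only illustrative and should match your hardware:

```hcl
# nomad.hcl -- combined server + client node (PoC only)
server {
  enabled          = true
  bootstrap_expect = 3
}

client {
  enabled = true

  # Keep headroom for the Nomad/Consul server daemons;
  # the scheduler treats these resources as unavailable.
  reserved {
    cpu    = 1000  # MHz
    memory = 2048  # MB
  }
}
```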

After everyone loves Nomad/Consul and you can add Vault into the mix you’ll be able to justify more hardware no sweat!

I would tend to discourage colocating Nomad servers and clients except in the most resource-constrained proof-of-concept setups. Running your application workload on the same nodes as your Nomad servers could unexpectedly deprive them of necessary resources.

If I were looking at colocation of functions for a non-production cluster, I would consider colocating Vault, Nomad servers, and Consul servers, but keeping Nomad client nodes separate. While you will get some disk competition between Consul, Vault (if using integrated storage or more Consul activity if using Consul for storage), and Nomad, an errant workload will not have the possibility to create issues for the servers themselves.

Going below the recommended layout introduces more risk in failure cases, and can introduce cases that are harder to debug. For example, using a Consul server as the Consul agent that Nomad talks to prevents that Nomad node from switching to a healthy instance if Consul itself becomes unwell. When you are using dedicated Consul agents (not ones running as servers), the client can potentially switch to a healthy instance in these cases.
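In other words, the recommended shape is a local Consul *client* agent on every Nomad node, with Nomad pointed at it. A sketch of what that looks like, where the `retry_join` addresses are hypothetical placeholders for your Consul servers:

```hcl
# consul.hcl on each Nomad node -- a client agent, not a server
server     = false
retry_join = ["10.0.0.10", "10.0.0.11", "10.0.0.12"]  # placeholder Consul server IPs

# nomad.hcl -- Nomad talks to the local Consul agent
# (127.0.0.1:8500 is also Nomad's default)
consul {
  address = "127.0.0.1:8500"
}
```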

In my opinion, for small deployments it is better to sacrifice power on your server nodes than their count. For example, I have a lab of virtual machines that I run Nomad on for testing and development. It’s 12 instances; however, all of my Vault and Nomad servers run on 512 MB of RAM and 1 vCPU. I give Consul more RAM because I use it to back Vault, but I still only provide it with 1 GB. Finally, my clients are scaled for some larger workloads, so they have more vCPUs and RAM.

There are many opinions out there about how you can run smaller clusters; however, the way we describe fault tolerance and high availability around node failures is predicated on reducing the blast radius by minimizing the number of colocated services.