My question is similar to the following, but I’d like to get into the details of Nomad’s failure modes, how they would impact a running cluster, and what recovery looks like afterwards:
- Nomad Minimal Hardware Requirements
- Is there a low resource configuration or distribution of nomad (like k3s) or can it be run on a single weaker machine?
Context: I’m trying to set up Nomad in a budget-constrained production environment. The primary reason for using Nomad is to have an auto-scaling cloud setup, so that we pay only for the compute capacity we actually need. This also means that we don’t have the budget for 3x 8GB cloud instances just for the Nomad servers. Consider this “cloud-bill golfing” at its finest (just like “code-golfing”).
Questions:
- Can Nomad work on a single 2vCPU/2GB/20GB cloud server for extended periods of time? (I have already set this up and the basic tests are working, but I’m not sure of Nomad’s resource requirements over longer periods of time.)
- For a setup that is expected to auto-scale between 5 and 10 cloud servers, how much RAM and disk would the single Nomad server consume?
- What are the typical failure reasons for a Nomad server?
- When the single Nomad server fails, do existing jobs/services continue to run?
- Is there any way to restart the single Nomad server such that it rebuilds the current state of the cluster by querying all the clients?
- Can a Nomad client work on a 2vCPU/2GB/20GB machine alongside a Docker daemon and the actual jobs/services? How much memory, CPU, and disk does the Nomad client consume?
- What are the typical failure reasons for a Nomad client?
- When a Nomad client fails, do the jobs/services running on that machine continue to run?