Nomad cluster for smart home

mmeier86 · January 7, 2022, 5:27pm

With at least a few of these I might be able to help. A bit of quick context: I’m also a Nomad newcomer, and I’m also currently migrating a docker-compose based setup to Nomad.

It really doesn’t.
One point to look out for when coming from docker-compose: At least as far as I’m aware, Nomad doesn’t have any equivalents to docker-compose’s up and down commands. While Nomad can of course stop and start entire jobs, there is no concept of bringing down all the jobs you’ve currently got running, or of bringing up all the jobs in e.g. a certain directory. So you may need a bit of scripting if you need that.
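
A minimal sketch of what such scripting could look like. It assumes the `nomad` CLI is on the PATH and that job files end in `.nomad` or `.nomad.hcl`; the function names are made up for this example, and it defaults to a dry run that only prints the commands.

```python
import subprocess
from pathlib import Path


def find_job_files(directory):
    """Return all Nomad job files in a directory, sorted by name."""
    root = Path(directory)
    files = list(root.glob("*.nomad")) + list(root.glob("*.nomad.hcl"))
    return sorted(files)


def up_commands(directory):
    """Build one `nomad job run` command per job file."""
    return [["nomad", "job", "run", str(path)] for path in find_job_files(directory)]


def down_commands(job_names):
    """Build one `nomad job stop` command per job name."""
    return [["nomad", "job", "stop", name] for name in job_names]


def run_all(commands, dry_run=True):
    """Print each command, and execute it unless dry_run is set."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
```

Calling `run_all(up_commands("jobs"))` would print one `nomad job run` line per file in a (hypothetical) `jobs` directory; pass `dry_run=False` to actually submit them. Note that stopping works on job names, not on file names, so the "down" side needs a list of names.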

One other thing missing from Nomad compared to docker-compose is service dependencies, in particular startup/shutdown dependencies similar to docker-compose’s depends_on config option. Instead, the approach is to simply let a container be restarted until it stays up.
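
The restart loop works without any extra effort. If the crash-looping is too noisy, one possible workaround (an assumption on my side, not something Nomad provides) is to let the container’s entrypoint wait for its dependency before starting the actual app. A rough sketch, with a hypothetical helper name:

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

This only checks that the port accepts connections, not that the service behind it is actually ready, so treat it as a complement to the restart behaviour rather than a replacement.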

I think this may be problematic, from a “getting the device showing up in your docker container” point of view. But you said that you’ve already got it all running with docker-compose? If so, this might not actually be a problem?
The second thing which may seem problematic is how to tell jobs which need the Zigbee stick on which nodes they are supposed to run. That’s not a problem: Nomad supports setting arbitrary metadata in the Nomad client config, which can then be used as a constraint on the job to decide where it’s allowed to run. See the client config.
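
To make the metadata idea a bit more concrete: in the client config this would be an entry in the `meta` block, and in an HCL job file a `constraint` block with `attribute = "${meta.<key>}"`. The sketch below builds the same constraint in the shape used by Nomad’s JSON job format. The key `zigbee_stick` is a hypothetical name picked for this example.

```python
import json


def meta_constraint(key, value):
    """Build a job constraint matching a client `meta` entry (JSON job format)."""
    return {
        "LTarget": "${meta." + key + "}",
        "RTarget": value,
        "Operand": "=",
    }


# "zigbee_stick" is a hypothetical key; it has to match whatever is set
# in the `meta` block of the client config on the node with the stick.
constraint = meta_constraint("zigbee_stick", "true")
print(json.dumps(constraint, indent=2))
```

Jobs carrying this constraint are only placed on clients whose metadata has that key set to the given value.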

This is doable in a number of ways. The simplest way is host volumes. These are just directories on the host which are configured as potential mounts in the Nomad client config and can then be used by any Nomad job. These can for example be NFS shares mounted on your hosts.
You can also go a lot more fancy, as Nomad also supports the Container Storage Interface (CSI) specification, allowing use of Ceph and the like. At least for Ceph, I can confirm that the official Ceph CSI plugin works with Nomad, and it’s stable. The configuration and handling of volumes is not at all what I expected, though that might be on the Ceph CSI plugin, not on Nomad.

GitOps driven should be possible. One of the idiomatic ways to get configuration files into Nomad jobs is via the artifact stanza, which also supports checking out a git repository and making its contents available to the job’s tasks.
Very important point: You cannot simply access random paths on the hosts running your Nomad clients. By default, you only have access to the allocation directory and what you can download via the aforementioned artifact stanza. This is currently my biggest headache. For more details, have a look at my post on the matter here.

Short note: You only need one server of each, not three, unless you really want HA mode.
For a setup like this, I also wouldn’t “sacrifice” three dedicated machines just for the servers. I don’t think they produce that much load. While running Nomad Servers and Clients on the same machine is advised against and probably needs some cautious configs concerning port usage, why not run VMs on the machines and then co-locate Server and Client VMs? Dedicating three physical machines to the servers seems like a waste.

I’ve got Grafana, Loki, Fluentd and Prometheus running on my old docker-compose stack and plan to migrate them to Nomad. I don’t see a problem with that.
On the logging, I can already comment as I just finished my setup for that.

It looks as follows:

1. On each Nomad client, configure the Docker task driver in the client config with the Fluentd logging driver. See the config options here. This configures all Docker containers started by Nomad to use the Fluentd driver.
2. Set up a fluentbit job as a system job running on each node and listening on the port configured for the logging driver in the previous point.
3. Set up a local host_volume on each Nomad node, which gets mounted into the fluentbit job and all other jobs which need to write into log files instead of writing to stdout.
4. In my setup, fluentbit doesn’t do much more than collecting the logs and sending them to a fluentd instance, which does the heavy lifting of parsing the log lines.
5. Depending on taste, you can now either forward the logs to Loki directly from the fluentbit jobs, or have a central fluentd job as an aggregator and log sorter.

Pro Tip: Manually set the local docker driver for the fluentbit job - otherwise, you may end up in a reinforcing loop of recurring log lines which will blow up your Docker daemon’s memory. Ask me how I know.

I’ve personally always felt uncomfortable with committing secrets into git - even encrypted. And you wrote previously that you already plan to set up Vault, so why not use that? I’ve got Vault running, and while it takes some setting up with tokens, auth methods and so forth, it does really pay off. The integration between Nomad and Vault is really good.

One point on storing secrets in your repo encrypted: You will have to think about decrypting them at some point. You’ve got several ways to solve that, depending on how you plan to handle your config files.
One potential way, if you want to do the decrypting with/inside Nomad jobs: The lifecycle stanza allows you to run tasks (containers) as “init tasks”. Those can mount a shared directory, download your config repo, decrypt the secrets on the shared mount, and then exit. Afterwards, the actual application can access the decrypted secrets.

As far as I understand (but perhaps something the Nomad team is better suited to comment on?) Nomad Packs is currently still considered a tech demo.

I’m currently in the process of setting up Traefik, where in my current production setup I’m running Nginx. Right now, it works well. I’ve got it running with a Let’s Encrypt cert and using the Consul catalog provider. This means I don’t have static configs for sites anymore. Instead, my Nomad jobs register themselves with Consul, and Traefik in turn periodically contacts Consul for a list of services. And because Consul only lists healthy services, this ought to also cover automatic fallbacks, but I haven’t tried that yet.
Caution: Setting up Traefik is my current project - so I might simply not have hit the real roadblocks yet
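
To see roughly what a consumer like Traefik gets back from Consul, you can query Consul’s health endpoint yourself. The sketch below is not how Traefik does it internally; it just asks Consul’s HTTP API for the passing instances of a service. It assumes a Consul agent on the default `http://127.0.0.1:8500`, and the function names are made up for this example.

```python
import json
import urllib.request


def parse_healthy_instances(entries):
    """Turn a Consul health response into (address, port) pairs."""
    instances = []
    for entry in entries:
        service = entry["Service"]
        # Service.Address may be empty; fall back to the node address then.
        address = service.get("Address") or entry["Node"]["Address"]
        instances.append((address, service["Port"]))
    return instances


def healthy_instances(service_name, consul_url="http://127.0.0.1:8500"):
    """Ask Consul for the instances of a service whose checks are passing."""
    url = f"{consul_url}/v1/health/service/{service_name}?passing=true"
    with urllib.request.urlopen(url, timeout=5) as response:
        return parse_healthy_instances(json.load(response))
```

An instance whose health check fails drops out of this list, which is the behaviour the automatic fallback would rely on.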

I’d use Grafana as a Prometheus frontend for visualization. The Prometheus+Grafana setup has served me well for gathering both host metrics and service metrics. One nice point about using Prometheus: Like Traefik, it supports service discovery via Consul. So it might be able to autodetect your running apps if they all register themselves as services in Consul.
I can’t say how well this works, as I’ve not yet migrated my monitoring stack to Nomad, but at least for the base Prometheus+Grafana setup I don’t see any reason why it shouldn’t work.

And I think that completes my novel on Nomad
