I've run out of viable methods to handle service configuration files in Nomad

Hello all,

let me start with a bit of context: I’m currently running my homelab with docker-compose and a ball of ugly bash scripting.
I’ve decided to switch to Nomad, to bring a bit more sanity to the handling of services.

After three days of tinkering, I’m now at my wit’s end - I’ve run fresh out of ideas on how to manage the config files for my services.
Right now, they all reside on a CephFS share that is mounted on all of my machines, and they are mounted directly into the containers. Sometimes as single config files (e.g. Grafana) and sometimes as entire directories (e.g. Nginx). The entire config share is also a Git repository for proper versioning of the configs.

One problem with this setup: config files with secrets in them. This is what originally drew me to Nomad: the ability to template config files and inject secrets from a secure place (Vault, in my case), so that I could safely store the templates in the configs Git repo.

But I can’t figure out how to get config files and templates from my config repo into the Nomad jobs.

I’d like the following abilities:

  1. Be able to store nomad job files and config files and config file templates in the same repo, without any secrets in them
  2. Be able to iterate on config files without having to commit them to the repo, e.g. while working on configs for new services

I have already considered the following ways, all inadequate:

Using HCL2 file() function
My first thought was to make use of HCL2’s file function to initialize the “data” property of my job’s template stanzas. This would also have the added advantage that I wouldn’t need to maintain a share for my config files to share them between my dev machine and my servers.

This has two problems. The first, more serious one: this approach doesn’t support entire directories. Any service with split config files (e.g. Nginx) would require a large number of “template” stanzas.
The second problem: I wrote a bit of tooling around the Nomad API to introduce the ability to bring down/up multiple services at once and introduce dependencies (I just don’t like the “let the containers fail to start until their deps are up” approach very much when I know exactly what those dependencies are). The problem: I had been using the Nomad API to parse the job from HCL to JSON, which isn’t possible with HCL2.
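For illustration, here is a rough sketch of what the file() approach looks like, and of how the directory case might be handled with an HCL2 dynamic block. All paths, the job name and the image are placeholders. The dynamic block with fileset() is an untested assumption on my part, and how relative paths are resolved is worth checking against the docs. Either way it needs HCL2, so it does not help with the second problem.

```hcl
job "nginx" {
  datacenters = ["dc1"]

  group "web" {
    task "nginx" {
      driver = "docker"

      config {
        image = "nginx:stable"
      }

      # Single file: file() reads the contents on the machine that submits
      # the job, so the servers need no access to the config repo.
      template {
        data        = file("configs/nginx/nginx.conf.tpl")
        destination = "local/nginx.conf"
      }

      # Directory (assumption): generate one template block per file that
      # fileset() finds, instead of writing each block by hand.
      dynamic "template" {
        for_each = fileset("configs/nginx/conf.d", "*.conf")

        content {
          data        = file("configs/nginx/conf.d/${template.value}")
          destination = "local/conf.d/${template.value}"
        }
      }
    }
  }
}
```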

Using the artifact stanza to load files from configs share
My next thought was to use the artifact stanza. As I said above, one of the things I would like to do is to easily iterate on config files on my dev machine, without having to make a commit/push or upload to some S3 bucket.
So my thought was: Keep the current CephFS mount with the configs and load what I need from there via the artifact stanza.
I found very quickly that for security reasons, go-getter’s “file://” mode is not available.
See e.g. this issue: Artifacts don't support "file" scheme · Issue #1897 · hashicorp/nomad · GitHub

Using Volume mounts
The next approach I tried was volume mounts. As written above, I’ve already got my configs on a CephFS share and after some work, I got Ceph(FS) CSI going with Nomad and happily mounted a share with my configs.

The problem with this approach: Volumes never appear in the allocation during job startup. They can’t be used as the source of “template” stanza templates.
I have also tried the same with Nomad’s host_volumes, but they have the same behavior. They mount to the started container just fine, but aren’t available when the template stanza is run.
See e.g. this issue: Host volumes are mouted on top of templates, artifacts and dispatch payloads · Issue #7796 · hashicorp/nomad · GitHub

Setting up a local HTTP server and using the artifact stanza
The next solution, proposed in some of the issues I read, was to set up a local HTTP server on each of my Nomad nodes that serves my CephFS config share, and then to fetch from it with the artifact stanza.

This also failed. While single file downloads worked, directory downloads did not. But that might actually have been misconfiguration of the webserver on my side, not a problem with Nomad.
I might try to work on this again later.

Using an S3 bucket with the artifacts stanza
Spurred on by my near success with the previous HTTP solution, I checked whether there might be some way to mount my Ceph cluster’s RadosGW S3 onto my dev machine to sync the config files while I work on them. Then I would have the ability to try config file changes immediately, without having to do a manual S3 push or a git commit.
I discovered s3fs-fuse, which allows me to mount an S3 bucket onto my dev machine and to put the config file repo in there.
And this works! Having everything mounted locally, changes in config files (and entire directories!) are immediately visible when also using the mounted S3 bucket via the artifact stanza.

But there is a problem. There is literally no way right now to not have the S3 credentials in the Nomad job file. :frowning:
See this ticket: [Feature] vault secrets in artifact stanza · Issue #3854 · hashicorp/nomad · GitHub

But this setup seemed almost workable. I thought I could solve the problem by putting the S3 credentials into the standard AWS env variables of my Nomad client processes. This also did not work, because once I remove the credentials from the artifact stanza, go-getter seems to force the use of the Amazon URL. And I’m hosting my own S3.
See this ticket: S3 endpoint ignored when not using query parameters · Issue #329 · hashicorp/go-getter · GitHub

The actual question
How does everyone handle their service config files for Nomad jobs?

Are the requirements I have mentioned above attainable with Nomad?
Please note: A simple no is a perfectly nice answer. I understand that a guy with his simple hobby cluster isn’t exactly among Nomad’s use cases. :slight_smile: But at least knowing that it’s not possible will stop me from doing any more bashing my head against this particular wall. :sweat_smile:

I have my job files and config files all in one repo and currently load secrets from Consul K/V (Vault would be the better solution, but I haven’t implemented it yet). Deployment happens via Levant.

My structure looks like this for each service:
(screenshot of the per-service directory layout; image not preserved)

In my jobfiles I download the config-folder from gitlab via artifact-stanza like this:

  artifact {
    source = "git::ssh://git@xy.test.foo/group/project//[[.service.name]]//config"
    destination = "local"
    options {
      sshkey = "%key%"
    }
  }

For downloading single files I set mode = "file" in the artifact stanza, but I’m only using it for downloading files from a registry via HTTPS so far, so I’m not sure if it works with git as well.

I also use the template-stanza to fill some non-secret stuff from consul-k/v as well:

  template {
    source = "local/service-example-config.tpl"
    destination = "local/config/service-example-config.yml"
    # Specifies the behavior Nomad should take if the rendered template changes. Nomad will always write the new contents of the template to the specified destination.
    change_mode = "noop"
  }

Here you can also set the change_mode to restart, so if the template gets re-rendered e.g. when a consul-k/v-key changes, the allocations get restarted automatically.

Hope this helps! Feel free to ask any questions.

Hi @Uweee,

thanks a lot for your comment. Sadly, it’s not (exactly) what I was looking for, because it requires a git commit for each config change - which was one of the things I wanted to avoid. :slight_smile:

I’ve now come up with the following solution:

  • Store my config files (both the Nomad job files and the application configs) in an S3 bucket on my local Ceph cluster
  • Mount that S3 bucket with S3FS on my developer machine
  • Use the artifact stanza’s S3 functionality to download the right config subdirectories from that same S3 bucket in the job files
  • I’m getting around defining the S3 credentials in the job files with the help of the tip in this go-getter issue. In short, I’m setting the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables in the Nomad client’s environment (using a systemd environment file). In addition, I’m setting AWS_METADATA_URL=dummy. Without that last variable, non-AWS S3 URLs are always replaced with AWS URLs

This approach fulfills all my requirements for now.
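A rough sketch of what the relevant part of a job file looks like with this setup. The endpoint s3.example.lan, the bucket name configs, the grafana/ prefix and the file names are placeholders, not my real ones. No credentials appear because they come from the Nomad client's environment, as described in the last bullet above.

```hcl
task "grafana" {
  driver = "docker"

  config {
    image = "grafana/grafana"
  }

  # Download the service's config subdirectory from the self-hosted bucket.
  # The "s3::" prefix forces go-getter's S3 mode for a non-AWS hostname.
  artifact {
    source      = "s3::https://s3.example.lan/configs/grafana/"
    destination = "local/config"
  }

  # Render a template that was fetched by the artifact block above
  # (placeholder file name).
  template {
    source      = "local/config/grafana.ini.tpl"
    destination = "secrets/grafana.ini"
  }
}
```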

But there is one big remaining problem: Config file updates. If the only thing which changes is one of the application config files, there is no proper way to restart a job so the new configuration is taken into account.
Because the files are downloaded during allocation creation, and then never checked again, neither changes in template files nor changes in static config files mean anything at all to Nomad.

If I change a config file or config file template and then issue nomad run /path/to/jobfile.hcl, nothing whatsoever happens - as the job file itself hasn’t changed, Nomad doesn’t see any reason to do anything. And as far as I could see, there isn’t any option anywhere that simply says: I don’t care what you think, Nomad. Completely restart all allocations of this job, including all of the prep work. Right now, please.

Initially, my solution was to always change something small in the job file alongside config file changes. My preferred approach was to just increase the memory limit in the resources stanza of one of my tasks by one megabyte, which would induce Nomad to replace the allocation and download the newer config file.
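A slightly less arbitrary variant of the same trick might be a dedicated counter in the task's meta stanza. This is only a sketch: config_rev is a made-up key, the task and image are placeholders, and I'm assuming Nomad treats a changed meta value like any other change to the task and replaces the allocation.

```hcl
task "grafana" {
  driver = "docker"

  config {
    image = "grafana/grafana"
  }

  # Hypothetical revision marker: bump it by hand whenever a config file or
  # template in the bucket changes, so the job spec differs on the next run.
  meta {
    config_rev = "2"
  }

  resources {
    memory = 256 # no longer needs to be nudged up by one megabyte
  }
}
```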

Quick Edit: Forgot to mention: I discovered the nomad alloc stop command yesterday. This command actually does (part of!) what I need: It restarts the given allocation, including artifact downloads and templating. But of course, it only works for a single allocation, so it’s a lot more work if you change e.g. the configs of a system job. You would first have to find all allocations of the job, and then “stop” them one by one.

I’m getting stronger and stronger vibes that I’m holding this wrong.