Updating a Nomad job template for running tasks without restarting the containers


We are currently evaluating Nomad for a consolidation project and have run into an issue we can’t find a clean solution for. For reference, we are using standalone Nomad and are not considering Consul at this time, as it would make the project more complicated. We are using the Nomad service registry for discovery and dynamic configuration via config templates in the job files.

We are going to use HAProxy to ingress traffic into the cluster. This is a hard requirement: it is what we currently use and are familiar with, and we need its performance compared to something like Traefik. It is set up as a system job using Docker host networking with static ports, and we use a template to configure it. Here is a cut-down job file showing how we run it:

job "haproxy" {
  region      = "global"
  datacenters = ["dc1"]
  type        = "system"

  group "haproxy" {
    network {
      port "http" { static = 80 }
      port "https" { static = 443 }
    }

    task "haproxy" {
      driver = "docker"
      config {
        image        = "haproxytech/haproxy-alpine:2.7"
        network_mode = "host"
        volumes      = ["local/haproxy:/usr/local/etc/haproxy"]
      }

      template {
        change_mode   = "signal"
        change_signal = "SIGHUP"
        destination   = "local/haproxy/haproxy.cfg"
        data          = <<EOH
defaults
  mode http
  timeout client  10s
  timeout connect 5s
  timeout server  10s
  timeout http-request 10s

frontend http
  bind :80
  use_backend api if { hdr(host) -i api.example.com }

backend api
{{- range nomadService "api" }}
  server {{ .Address }}:{{ .Port }} {{ .Address }}:{{ .Port }} check
{{- end }}
EOH
      }
    }
  }
}
This works really well: when the api service adds or removes tasks, the template updates and HAProxy gets a SIGHUP telling it to hot-reload, and no client connections are lost (we have shutdown_delay set in our other jobs to give HAProxy time to update).
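For reference, the shutdown_delay mentioned above is a group-level (or task-level) parameter in the backend jobs; a minimal sketch for a hypothetical api job (the value is illustrative):

```hcl
  group "api" {
    # Delay between the allocation being deregistered from service
    # discovery and the task receiving its kill signal, giving
    # HAProxy's template time to re-render and drop the server first.
    shutdown_delay = "10s"
    # ...
  }
```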

This is great until we want to make a change to the config such as add another frontend+backend or change some config. For example in the above file I may want to handle another host header so I add the following to the existing frontend config:

  use_backend api if { hdr(host) -i api.example.com }
  use_backend api if { hdr(host) -i api.somewhereelse.com }

When I submit the job file with the change, Nomad sees it and restarts all my ingress LBs at the same time, causing a short outage. On our existing LB servers we can manually reload HAProxy so no connections are lost, but in Nomad we don’t seem to have that choice: it always restarts the tasks whenever the job spec changes, and I can’t find any guides on how to resolve this.

Is there a way to prevent or work around this? I have looked into writing the perfect template that would never require restarting the HAProxy allocations, using the services to build the frontends and backends and keeping config in tags and variables, but it gets complicated, and I expect that as soon as it goes into production we will find issues that need correcting or config that needs adding. We can’t afford to lose ingress even for the short time it takes to restart the containers.

Hi @glynnbailey,

Unfortunately, changing the template block within the job specification is classed as a destructive update and therefore results in the replacement of that job’s running allocations, as you’ve seen.

When I submit the job file with the change Nomad sees it and restarts all my ingress LBs at the same time, causing a short outage.

There are some job spec parameters which can prevent this and make the rollout of new versions easier, without impacting availability. The update block in particular controls how an update is rolled out across the running allocations. kill_timeout and kill_signal can be used to better control how Nomad stops allocations for replacement; the timeout in particular ensures HAProxy has time to gracefully close connections and shut down rather than being killed prematurely.
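As a sketch, those parameters might look like the following in the job file (the values are placeholders, not recommendations, and support for the update block on system jobs depends on your Nomad version):

```hcl
  group "haproxy" {
    update {
      max_parallel     = 1      # replace one allocation at a time
      min_healthy_time = "30s"  # how long an allocation must be healthy
      auto_revert      = true   # roll back if the new version fails
    }

    task "haproxy" {
      kill_signal  = "SIGUSR1"  # HAProxy's graceful-stop signal
      kill_timeout = "30s"      # time to drain connections before SIGKILL
      # ...
    }
  }
```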

I hope this helps. Please let me know if you have any followup questions.

jrasell and the Nomad team

Thanks for the response. I have found a solution and thought I would put it here in case anyone has the same issue. I did not realise that consul-template also supports Nomad service discovery.

We can set up HAProxy on a server outside the Nomad cluster and use consul-template to build the config and SIGHUP HAProxy. If we need to change the template, we can do so without submitting a Nomad job, so the problem of the HAProxy containers restarting goes away. In my testing it works really well; I’m really impressed with it.

Thanks again.

@glynnbailey Thank you so much for your post! This is exactly what I needed, and it never occurred to me that consul-template could be used directly with Nomad!

Hi, so the solution described above runs nginx outside of Nomad. This is not ideal. Instead, consider the following.

The solution we use is quite similar. You run nginx in Nomad and generate its configuration from a Nomad variable:

 data = "{{with nomadVar \"nomad/jobs/haproxy\"}}{{.config}}{{end}}"
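For context, that data line would sit inside an ordinary template block; a sketch, assuming the job is named haproxy so the nomad/jobs/haproxy variable path is accessible to it by default:

```hcl
      template {
        change_mode   = "signal"
        change_signal = "SIGHUP"
        destination   = "local/haproxy/haproxy.cfg"
        # Render the entire config from a single field of a Nomad variable.
        data = "{{ with nomadVar \"nomad/jobs/haproxy\" }}{{ .config }}{{ end }}"
      }
```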

Then to refresh the configuration you use an external process: just upload the new configuration to the Nomad variable:

consul-template ....the.template... |
      jq -Rs '{config: ., docs: "This was generated today by something - additional documentation"}' |
      nomad var put -in=json nomad/jobs/haproxy

That way:

  • nginx is running in Nomad
  • Nomad detects that the Nomad variable has changed, regenerates the template, and sends SIGHUP
  • you have up-to-date configuration in Nomad

It’s been a while since I tested it (and I was using HAProxy rather than nginx), but that was not the experience I had. Whenever a change in a Nomad variable caused a template inside a job to be regenerated, Nomad would update the file and then completely restart the containers to apply it. I couldn’t find a way to make it just HUP the process inside the container so it reloaded the new config, so each change caused a small outage while the containers restarted, which wasn’t acceptable. Doing it with consul-template let me trigger a HUP, since you can specify what command to run on a change. I did it like this:

nomad {
    address   = "https://nomad.example.com:4646/"
    namespace = "default"
}

vault {
    renew_token = false
}

template {
    source      = "./haproxy.tmpl"
    destination = "/etc/haproxy/haproxy.cfg"

    exec {
        command = ["systemctl", "reload", "haproxy.service"]
        timeout = "30s"
    }
}
With a template of something like:

backend api
{{- range nomadService "api" }}
    server {{ .Address }}:{{ .Port }} {{ .Address }}:{{ .Port }} check
{{- end }}

Has there been a change to the behaviour? I was using data from the service registry rather than a Nomad variable, but I think it would behave the same way, since it’s a similar sort of template reload. It’s pretty much the same template that I used in consul-template; you can see it in my original post.

If you use Consul alongside Nomad with HAProxy then there is a solution: HAProxy supports discovering backend servers via DNS, which Consul can provide, and the docs recommend this approach. But it doesn’t work with standalone Nomad, as Nomad doesn’t provide DNS. At the time we didn’t particularly want to maintain a Consul cluster on top of Nomad when we could just use consul-template on our existing HAProxy LBs.
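For readers who do run Consul, that DNS approach uses HAProxy’s resolvers and server-template directives rather than a re-rendered config file; a minimal sketch, assuming Consul’s DNS interface listens on 127.0.0.1:8600 and the service is registered as "api":

```haproxy
resolvers consul
  nameserver consul 127.0.0.1:8600
  accepted_payload_size 8192
  hold valid 5s

backend api
  # Create up to 10 server slots, filled in at runtime from the
  # SRV records that Consul serves for the "api" service.
  server-template api 10 _api._tcp.service.consul resolvers consul resolve-opts allow-dup-ip resolve-prefer ipv4 check
```

With this, backends come and go without any config reload at all; HAProxy tracks the DNS answers on its own.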

I am happy to see this discussed some more, as I never found a satisfying solution that let us keep everything together inside Nomad; it was the only real issue I came across with Nomad.

Perhaps you could add some “placeholder” variables in your HAProxy template and store their values in a Nomad variable, so that to inject your updates you would update the Nomad variable rather than the job spec.

I would need to do some more testing to confirm, but from what I recall, any time the HAProxy config changed due to any of the templating, Nomad would restart the job to apply the change, and there was no way to prevent this. So I don’t think there would be any difference between using a variable and using details from the service registry.

consul-template has a feature to specify a command to run on change, which is what allows the workaround. I could try putting consul-template inside my job and have it update the HAProxy config rather than Nomad doing it, but it seems a bit hacky.

Although a system job is the right way to do it, could a service job for HAProxy suffice, with a fixed-port network section?

This will introduce noise when updating the service, but it allows rolling-upgrade semantics for a service job (compared to a system job).

@jrasell thoughts?