Hello,
We are currently evaluating Nomad for a consolidation project and have run into an issue we can't find a clean solution for. For reference, we are running standalone Nomad and are not considering Consul at this time, as it would add complexity to the project. We are using the native Nomad service registry for discovery, and dynamic configuration via config templates in the job files.
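For context, our application jobs register with the built-in registry along these lines (the "api" name and "http" port label are just placeholders for this post):

service {
  name     = "api"
  port     = "http"
  provider = "nomad"  # use Nomad's native service registry rather than Consul
}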
We are going to use HAProxy to ingress traffic into the cluster. This is a hard requirement: it is what we already run and are familiar with, and we need its performance compared to something like Traefik. It is set up as a system job using Docker host networking with static ports, and we use a template to configure it. Here is a cut-down job file to show how we are running it:
job "haproxy" {
region = "global"
datacenters = ["dc1"]
type = "system"
group "haproxy" {
network {
port "http" { static = 80 }
port "https" { static = 443 }
}
task "haproxy" {
driver = "docker"
config {
image = "haproxytech/haproxy-alpine:2.7"
network_mode = "host"
volumes = [ "local/haproxy:/usr/local/etc/haproxy" ]
}
template {
change_mode = "signal"
change_signal = "SIGHUP"
destination = "local/haproxy/haproxy.cfg"
data = <<EOH
defaults
mode http
timeout client 10s
timeout connect 5s
timeout server 10s
timeout http-request 10s
frontend http
bind :80
use_backend api if { hdr(host) -i api.example.com }
backend api
{{- range nomadService "api" }}
server {{ .Address }}:{{ .Port }} {{ .Address }}:{{ .Port }} check
{{- end }}
EOH
}
}
}
}
This works really well: when the api service adds or removes tasks, the template updates and HAProxy gets a SIGHUP telling it to hot-reload, and no client connections are lost (we set shutdown_delay in our other jobs to give HAProxy time to update).
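For reference, the relevant part of those other jobs looks roughly like this (the 10s value is just an example, not something we have tuned carefully):

group "api" {
  # Keep the old allocation running for a while after it has been removed
  # from the service registry, so HAProxy can render the new config and
  # reload before the task stops accepting connections.
  shutdown_delay = "10s"
}

shutdown_delay can also be set per task; group level is just what is shown here.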
This is great until we want to change the config itself, such as adding another frontend and backend or adjusting an existing setting. For example, in the file above I may want to handle another host header, so I add the following to the existing frontend config:
  use_backend api if { hdr(host) -i api.example.com }
  use_backend api if { hdr(host) -i api.somewhereelse.com }
When I submit the job file with this change, Nomad restarts all of my ingress LBs at the same time, causing a short outage. On our existing LB servers we can reload HAProxy manually so no connections are lost, but in Nomad we don't seem to have that choice: it appears to always restart the tasks whenever the job spec changes, and I can't find any guides on how to resolve this.
Is there a way to prevent or work around this? I have looked into writing the perfect template that would never require restarting the HAProxy allocations, using the registered services to build the frontends and backends and keeping config in tags and variables, but it gets complicated quickly (a rough sketch of that direction is below). I also expect that as soon as it goes into production we will find issues that need correcting or config that needs adding, and we can't afford to lose ingress even for the short time it takes to restart the containers.
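For completeness, this is roughly the direction that experiment took. The host= tag convention is something we invented for the sketch, not anything Nomad provides, and this is untested rather than config we actually run:

frontend http
  bind :80
{{- range nomadServices }}
{{- $svc := .Name }}
{{- range .Tags }}
{{- if . | regexMatch "^host=" }}
  use_backend {{ $svc }} if { hdr(host) -i {{ . | replaceAll "host=" "" }} }
{{- end }}
{{- end }}
{{- end }}
{{/* one backend per registered service; in practice we would also need to
     filter out HAProxy's own registration and any non-HTTP services */}}
{{ range nomadServices }}
backend {{ .Name }}
{{- range nomadService .Name }}
  server {{ .Address }}:{{ .Port }} {{ .Address }}:{{ .Port }} check
{{- end }}
{{- end }}

Even that sketch doesn't cover per-backend options, health-check tuning, filtering and so on, which is why we would much rather find a way to change the job spec without Nomad restarting every HAProxy allocation at once.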