Using load balancing with Traefik

Hey!
I am running a Nomad cluster with 3 servers and 5 worker nodes. Traefik Proxy is installed on each server, with Nomad auto discovery enabled. Everything works except for one problem. I have set refreshInterval to 1 second, which means Traefik fetches the state from Nomad every second (that is the smallest possible time unit). However, when a job is updated there can be some downtime: Traefik still holds the IP and port of the old instance, while Nomad has deployed the updated job on a different port and/or IP.

A short example:

  • Service A is currently running on 1.2.3.4:6666. Traefik has discovered this service and everything works fine
  • Service A is updated. Nomad shuts it down and deploys the updated version of Service A to 1.2.3.5:7777. But Traefik still holds the old service info and proxies traffic to 1.2.3.4:6666, hence the downtime
  • After some period (in my case, a few seconds), Traefik rediscovers the updated Service A and updates the address to 1.2.3.5:7777. Now it works again

Does anybody else use the Nomad/Traefik combination, and if so, how did you overcome this problem?

Hi,

I’m not a Nomad or Traefik user, but I appreciate this is a common rolling update issue.

While looking through the docs, I found Rolling Updates | Nomad | HashiCorp Developer, and more specifically the min_healthy_time attribute. I believe it should be set long enough that Traefik has time to discover the new allocations while the old ones are kept running.
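
As a rough sketch of what I mean (untested on my side, and everything here is an assumption rather than your actual setup: the job name, image, port, health check path and all durations are placeholders), an update block in the job spec could look something like this. It also assumes more than one instance of the service, so that there is still an old allocation serving traffic while a new one is being brought up:

```hcl
# Hypothetical job spec fragment; names, image, ports and durations are placeholders.
job "service-a" {
  group "app" {
    # Assumption: at least two instances, so one keeps serving during the update.
    count = 2

    update {
      max_parallel     = 1        # replace one allocation at a time
      health_check     = "checks" # health is decided by the service check below
      min_healthy_time = "10s"    # keep well above Traefik's refreshInterval
      healthy_deadline = "2m"     # give up if the new allocation is not healthy by then
    }

    network {
      port "http" {
        to = 8080
      }
    }

    service {
      name     = "service-a"
      port     = "http"
      provider = "nomad"
      tags     = ["traefik.enable=true"]

      check {
        type     = "http"
        path     = "/health" # assumed endpoint
        interval = "5s"
        timeout  = "2s"
      }
    }

    task "app" {
      driver = "docker"

      config {
        image = "example/service-a:2.0" # placeholder image
        ports = ["http"]
      }
    }
  }
}
```

The idea is that Nomad only moves on to the next allocation once the new one has been healthy for min_healthy_time, which should give Traefik several refresh cycles to pick up the new address before the remaining old instance is replaced.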