I am running a Nomad cluster with 3 servers and 5 worker nodes. Each server has Traefik Proxy installed with Nomad auto-discovery enabled. Everything works, except for one problem. I have set refreshInterval to 1 second, which means that Traefik fetches the service state from Nomad every second (that is the smallest possible time unit). The problem occurs when a job gets updated: there can be some downtime, because Traefik still has the IP and port of the old instance of the job, while Nomad has deployed the updated job on a different port and/or IP.
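For context, the relevant part of the Traefik static configuration looks roughly like the sketch below. The YAML file format and the endpoint address (Nomad's default local HTTP address) are assumptions for illustration; only the 1-second refreshInterval is the setting described above.

```yaml
# Sketch of the Traefik static configuration for the Nomad provider.
providers:
  nomad:
    # Poll Nomad for service changes every second (the setting described above).
    refreshInterval: 1s
    endpoint:
      # Assumption: Nomad agent reachable on its default local HTTP address.
      address: http://127.0.0.1:4646
```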
A short example:
- Service A is currently running on 188.8.131.52:6666. Traefik has discovered this service and everything works fine.
- Service A is updated. Nomad shuts it down and deploys the updated version of Service A to 184.108.40.206:7777. But Traefik still holds the old service info and proxies traffic to 188.8.131.52:6666, hence the downtime.
- After some period (in my case, a few seconds), Traefik rediscovers the updated Service A and updates the address to 184.108.40.206:7777. Now it works again.
Does anybody else use the Nomad/Traefik combination? If so, how did you overcome this problem?