Hello,
I have a job similar to the following:
job "test {
group "main" {
count = 2
update {
max_parallel = 1
}
service {
name = "srv"
check { ... }
}
task "a" {
driver = "docker"
...
template {
env = true
data = "X={{ key "test/x" }}"
}
}
}
}
Task "a" takes a significant time to start up (for example, 10 minutes). That isn't a problem during a deployment, because instance "a1" keeps handling requests until "a2" has started and passed its checks, and vice versa. But if the value of the Consul key "test/x" changes, or the "srv" check becomes unhealthy for both allocations at once (a soft failure where a restart is recommended but not strictly required), I get 10 minutes of downtime until both tasks have restarted.
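For the template case, the closest built-in knob I have found is the template stanza's splay parameter, which only adds a random delay before the change action, so it staggers the two allocations but does not guarantee they won't restart at the same time. Roughly:

```hcl
template {
  env  = true
  data = "X={{ key \"test/x\" }}"

  # Wait a random 0..10m before restarting after a change; this staggers
  # the two allocations but does not strictly serialize them.
  splay = "10m"

  # change_mode = "signal" would skip the restart, but since the value is
  # injected as an environment variable the task still needs a restart to
  # pick it up.
  change_mode = "restart"
}
```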
I'm looking for a way to avoid parallel restarts, similar to what happens during a deployment. It is easy to do without Nomad (for example, wrapping the restart in a lock: consul lock restart-lock restart-service-and-wait-healthy), but I am unable to find a simple way to do it with Nomad.
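The closest Nomad-native equivalent of that lock wrapper I could come up with is the template stanza's change_mode = "script" (available in newer Nomad versions, if I read the docs right), where the script itself would acquire the Consul lock and then restart or reload the process. The script path below is hypothetical, and I'm not sure this really ends up simpler:

```hcl
template {
  env  = true
  data = "X={{ key \"test/x\" }}"

  # Instead of restarting, run a script inside the task environment; the
  # script (hypothetical, shipped with the task) would take the Consul
  # lock and then restart/reload the application itself.
  change_mode = "script"
  change_script {
    command = "/local/restart-under-lock.sh"
    timeout = "15m"
  }
}
```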
Another possible solution I can imagine is to change the application's stop process: handle SIGTERM, check whether instance "a2" is restarting, and if it is, block the stop for up to 10 minutes (and configure kill_timeout to 10 minutes accordingly; the Nomad side of that is sketched below). But this solution looks complex and error-prone. Are there easier ways to achieve the same result?
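For reference, the Nomad side of that workaround would be little more than raising kill_timeout so the SIGTERM handler has room to wait; all the coordination logic would live in the application:

```hcl
task "a" {
  driver = "docker"

  # Give the SIGTERM handler up to 10 minutes before Nomad escalates to
  # SIGKILL (the client's max_kill_timeout may need to be raised to allow
  # this). The application would use the window to wait until the sibling
  # allocation is healthy again.
  kill_timeout = "10m"

  # Deregister the service and wait before sending SIGTERM, so in-flight
  # traffic can drain; optional and separate from the coordination itself.
  shutdown_delay = "30s"
}
```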