Hi, I have a job stuck in limbo:
ID = sidekiq
Name = sidekiq
Submit Date = 2022-06-17T21:32:21-07:00
Type = system
Priority = 50
Datacenters = main1
Namespace = default
Status = running
Periodic = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost  Unknown
sidekiq     0       0         0        3       0         0     0

Allocations
ID        Node ID   Task Group  Version  Desired  Status  Created     Modified
c9df38b5  c0314718  sidekiq     8        run      failed  30m48s ago  26m47s ago
2646c3e2  c0314718  sidekiq     8        run      failed  43m41s ago  37m18s ago
fded9750  c0314718  sidekiq     8        run      failed  1d2h ago    43m42s ago

Unfortunately, the status output doesn't reveal much about why it has stopped retrying. The job file looks something like this:
job "sidekiq" {
  datacenters = ["main1"]

  # One on each node that meets constraints
  type = "system"

  update {
    max_parallel     = 1
    min_healthy_time = "1m"
    stagger          = "1m"
    auto_promote     = true
    auto_revert      = true
    canary           = 1
  }

  group "sidekiq" {
    restart {
      attempts = 1
      delay    = "1s"
      mode     = "delay"
      interval = "5s"
    }

    ...
  }
}
From the config, I had assumed it would keep restarting with a delay. Can anyone explain why it stopped, and how I can make it keep retrying indefinitely whenever there is a failure?
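
To spell out that assumption, this is how I read the restart block from the job file. The comments are my own understanding of each parameter, not something I have confirmed, so please correct me if any of it is wrong:

```hcl
restart {
  attempts = 1       # restarts allowed within one interval
  interval = "5s"    # window over which attempts are counted
  delay    = "1s"    # wait this long before each restart
  mode     = "delay" # my assumption: once attempts are used up, wait for
                     # the next interval and try again, rather than
                     # marking the allocation as failed (mode = "fail")
}
```

If that reading is right, I would expect the task to be restarted over and over, not to end up with three failed allocations and nothing running.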
Thanks!