Hello.
One task in my job failed, and after a few attempts Nomad just ignores it:
$ nomad job status portal
ID            = portal
Name          = portal
Submit Date   = 2024-07-17T17:03:32-04:00
Type          = service
Priority      = 50
Datacenters   = XX
Namespace     = default
Node Pool     = default
Status        = dead (stopped)
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost  Unknown
wi-api      0       0         0        0       5         0     0
wi-knecht   0       0         0        0       5         0     0
wi-nginx    0       0         0        6       3         0     0
wi-webshot  0       0         0        0       5         0     0

Latest Deployment
ID          = 868c0ba4
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
wi-api      1        1       1        0          2024-07-17T21:01:46Z
wi-knecht   1        1       1        0          2024-07-17T21:01:46Z
wi-webshot  1        1       1        0          2024-07-17T21:01:46Z

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created     Modified
3996beee  6706c131  wi-webshot  7        stop     complete  13m24s ago  1m27s ago
4a0c281d  6706c131  wi-knecht   7        stop     complete  13m24s ago  1m27s ago
72b936db  6706c131  wi-api      7        stop     complete  13m24s ago  1m27s ago
f1b26c50  6706c131  wi-nginx    5        run      failed    57m39s ago  16m34s ago
37180829  6706c131  wi-nginx    5        stop     failed    1h22m ago   19m18s ago
564f7092  6706c131  wi-nginx    5        stop     failed    1h31m ago   20m26s ago
541cadde  6706c131  wi-nginx    5        stop     failed    1h36m ago   21m28s ago
69564cdb  6706c131  wi-knecht   5        stop     complete  5h20m ago   13m39s ago
7d84118c  6706c131  wi-api      5        stop     complete  5h20m ago   13m39s ago
a3fba513  6706c131  wi-webshot  5        stop     complete  19h28m ago  13m39s ago
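To see why the wi-nginx allocations keep failing, a minimal sketch using the standard `nomad alloc status` and `nomad alloc logs -stderr` subcommands might look like this. The helper name `inspect_alloc` is hypothetical; the allocation ID `f1b26c50` and the task name `wi-nginx-task` are taken from the output above.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: dump task events and stderr for one allocation/task.
inspect_alloc() {
  local alloc_id="$1" task="$2"
  # Task events: restarts, exit codes, and any message about giving up
  nomad alloc status "$alloc_id"
  # Last stderr output written by the failed task
  nomad alloc logs -stderr "$alloc_id" "$task"
}

# Usage:
# inspect_alloc f1b26c50 wi-nginx-task
```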
kes@Eugens-MacBook-Pro nomad $ nomad deployment status 868c0ba4
ID          = 868c0ba4
Job ID      = portal
Job Version = 7
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
wi-api      1        1       1        0          2024-07-17T21:01:46Z
wi-knecht   1        1       1        0          2024-07-17T21:01:46Z
wi-webshot  1        1       1        0          2024-07-17T21:01:46Z
It is not clear to me how to start the nginx task again. It is still ignored, even after stopping the job.
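One thing that may help here, assuming the failed wi-nginx allocation (`f1b26c50`, desired `run`, status `failed`) has simply used up its reschedule attempts, which the output above does not confirm, is forcing a new evaluation with `nomad job eval -force-reschedule`. That flag asks the scheduler to place failed allocations immediately, even past their reschedule limit. The wrapper `force_reschedule` is a hypothetical name used only for illustration.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: re-evaluate a job and force placement of failed
# allocations, including ones that exhausted their reschedule attempts.
force_reschedule() {
  local job="$1"
  nomad job eval -force-reschedule "$job"
}

# Usage:
# force_reschedule portal
```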
After a few tries to stop/plan/run:
kes@Eugens-MacBook-Pro nomad $ nomad job plan -var="deploy_version=$PRTL_PROJECT_NAME" -var="base_domain=$PRTL_BASE_DOMAIN" derived-src/services/portal.hcl
Job: "portal"
Task Group: "wi-api" (1 in-place update)
Task: "wi-api-task"
Task Group: "wi-knecht" (1 in-place update)
Task: "wi-knecht-task"
Task Group: "wi-nginx" (1 create, 1 destroy)
Task: "wi-nginx-task"
Task Group: "wi-webshot" (1 in-place update)
Task: "wi-webshot-task"
Scheduler dry-run:
- All tasks successfully allocated.
Job Modify Index: 13728
To submit the job with version verification run:
nomad job run -check-index 13728 -var="deploy_version=nomad" -var="base_domain=XXX" derived-src/services/portal.hcl
When running the job with the check-index flag, the job will only be run if the
job modify index given matches the server-side version. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
kes@Eugens-MacBook-Pro nomad $ nomad job run -check-index 13728 -var="deploy_version=nomad" -var="base_domain=XXX" derived-src/services/portal.hcl
==> 2024-07-18T10:31:04-04:00: Monitoring evaluation "88806293"
2024-07-18T10:31:04-04:00: Evaluation triggered by job "portal"
2024-07-18T10:31:04-04:00: Evaluation within deployment: "e4d8385d"
2024-07-18T10:31:04-04:00: Evaluation status changed: "pending" -> "complete"
==> 2024-07-18T10:31:04-04:00: Evaluation "88806293" finished with status "complete"
==> 2024-07-18T10:31:04-04:00: Monitoring deployment "e4d8385d"
! Deployment "e4d8385d" failed
2024-07-18T10:31:04-04:00
ID          = e4d8385d
Job ID      = portal
Job Version = 10
Status      = failed
Description = Failed due to progress deadline

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
wi-api      1        1       1        0          2024-07-17T21:16:25Z
wi-knecht   1        1       1        0          2024-07-17T21:16:25Z
wi-nginx    1        1       0        1          2024-07-17T21:16:14Z
wi-webshot  1        1       1        0          2024-07-17T21:16:25Z
In the web UI I can see that nginx has not even tried to restart, even though the plan already marks it as destroy/create.
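To check whether the scheduler attempted a placement for wi-nginx at all after a `nomad job run`, a sketch like the following lists the job's evaluations (`nomad job status -evals`) and then shows one evaluation in detail (`nomad eval status`), including placement failures if there are any. The helper `show_evals` is hypothetical; the evaluation ID `88806293` is the one printed by the run above.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: show what the scheduler decided for a job.
show_evals() {
  local job="$1" eval_id="$2"
  # All evaluations for the job, with their status
  nomad job status -evals "$job"
  # Details of a single evaluation, e.g. one printed by `nomad job run`
  nomad eval status "$eval_id"
}

# Usage:
# show_evals portal 88806293
```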
A couple more stops seem to have resolved the problem, but how can I restart a specific task without these adventures and without downtime?
kes@Eugens-MacBook-Pro nomad $ nomad job run -check-index 13728 -var="deploy_version=nomad" -var="base_domain=LL" derived-src/services/portal.hcl
==> 2024-07-18T10:34:19-04:00: Monitoring evaluation "5433140e"
2024-07-18T10:34:19-04:00: Evaluation triggered by job "portal"
2024-07-18T10:34:19-04:00: Evaluation within deployment: "e4d8385d"
2024-07-18T10:34:19-04:00: Evaluation status changed: "pending" -> "complete"
==> 2024-07-18T10:34:19-04:00: Evaluation "5433140e" finished with status "complete"
==> 2024-07-18T10:34:19-04:00: Monitoring deployment "e4d8385d"
! Deployment "e4d8385d" failed
2024-07-18T10:34:19-04:00
ID          = e4d8385d
Job ID      = portal
Job Version = 10
Status      = failed
Description = Failed due to progress deadline

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
wi-api      1        1       1        0          2024-07-17T21:16:25Z
wi-knecht   1        1       1        0          2024-07-17T21:16:25Z
wi-nginx    1        1       0        1          2024-07-17T21:16:14Z
wi-webshot  1        1       1        0          2024-07-17T21:16:25Z
kes@Eugens-MacBook-Pro nomad $ nomad stop portal
==> 2024-07-18T10:34:36-04:00: Monitoring evaluation "542fb04e"
2024-07-18T10:34:37-04:00: Evaluation triggered by job "portal"
2024-07-18T10:34:37-04:00: Evaluation within deployment: "e4d8385d"
2024-07-18T10:34:37-04:00: Evaluation status changed: "pending" -> "complete"
==> 2024-07-18T10:34:37-04:00: Evaluation "542fb04e" finished with status "complete"
==> 2024-07-18T10:34:37-04:00: Monitoring deployment "e4d8385d"
! Deployment "e4d8385d" failed
2024-07-18T10:34:37-04:00
ID          = e4d8385d
Job ID      = portal
Job Version = 10
Status      = failed
Description = Failed due to progress deadline

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
wi-api      1        1       1        0          2024-07-17T21:16:25Z
wi-knecht   1        1       1        0          2024-07-17T21:16:25Z
wi-nginx    1        1       0        1          2024-07-17T21:16:14Z
wi-webshot  1        1       1        0          2024-07-17T21:16:25Z
kes@Eugens-MacBook-Pro nomad $ nomad stop portal
==> 2024-07-18T10:34:41-04:00: Monitoring evaluation "de5233dc"
2024-07-18T10:34:42-04:00: Evaluation triggered by job "portal"
2024-07-18T10:34:42-04:00: Evaluation within deployment: "e4d8385d"
2024-07-18T10:34:42-04:00: Evaluation status changed: "pending" -> "complete"
==> 2024-07-18T10:34:42-04:00: Evaluation "de5233dc" finished with status "complete"
==> 2024-07-18T10:34:42-04:00: Monitoring deployment "e4d8385d"
! Deployment "e4d8385d" failed
2024-07-18T10:34:42-04:00
ID          = e4d8385d
Job ID      = portal
Job Version = 10
Status      = failed
Description = Failed due to progress deadline

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
wi-api      1        1       1        0          2024-07-17T21:16:25Z
wi-knecht   1        1       1        0          2024-07-17T21:16:25Z
wi-nginx    1        1       0        1          2024-07-17T21:16:14Z
wi-webshot  1        1       1        0          2024-07-17T21:16:25Z
kes@Eugens-MacBook-Pro nomad $ nomad job plan -var="deploy_version=$PRTL_PROJECT_NAME" -var="base_domain=$PRTL_BASE_DOMAIN" derived-src/services/portal.hcl
+/- Job: "portal"
+/- Stop: "true" => "false"
Task Group: "wi-api" (1 create)
Task: "wi-api-task"
Task Group: "wi-knecht" (1 create)
Task: "wi-knecht-task"
Task Group: "wi-nginx" (1 create)
Task: "wi-nginx-task"
Task Group: "wi-webshot" (1 create)
Task: "wi-webshot-task"
Scheduler dry-run:
- All tasks successfully allocated.
Job Modify Index: 18648
To submit the job with version verification run:
nomad job run -check-index 18648 -var="deploy_version=nomad" -var="base_domain=LLL" derived-src/services/portal.hcl
When running the job with the check-index flag, the job will only be run if the
job modify index given matches the server-side version. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
kes@Eugens-MacBook-Pro nomad $ nomad job run -check-index 18648 -var="deploy_version=nomad" -var="base_domain=LLL" derived-src/services/portal.hcl
==> 2024-07-18T10:37:33-04:00: Monitoring evaluation "250cae9e"
2024-07-18T10:37:34-04:00: Evaluation triggered by job "portal"
2024-07-18T10:37:34-04:00: Evaluation within deployment: "8b7f3a55"
2024-07-18T10:37:34-04:00: Allocation "318bb349" created: node "6706c131", group "wi-webshot"
2024-07-18T10:37:34-04:00: Allocation "5076f44c" created: node "6706c131", group "wi-knecht"
2024-07-18T10:37:34-04:00: Allocation "b54e1af6" created: node "6706c131", group "wi-nginx"
2024-07-18T10:37:34-04:00: Allocation "c5cd5053" created: node "6706c131", group "wi-api"
2024-07-18T10:37:34-04:00: Evaluation status changed: "pending" -> "complete"
==> 2024-07-18T10:37:34-04:00: Evaluation "250cae9e" finished with status "complete"
==> 2024-07-18T10:37:34-04:00: Monitoring deployment "8b7f3a55"
⠋ Deployment "8b7f3a55" in progress...
2024-07-18T10:37:41-04:00
ID          = 8b7f3a55
Job ID      = portal
Job Version = 13
Status      = running
Description = Deployment is running

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
wi-api      1        1       0        0          2024-07-18T14:47:34Z
wi-knecht   1        1       0        0          2024-07-18T14:47:34Z
wi-nginx    1        1       0        1          2024-07-18T14:47:34Z
wi-webshot  1        1       0        0          2024-07-18T14:47:34Z
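For restarting a single task without stopping and re-running the whole job, a minimal sketch could use `nomad job restart -group` to restart just one task group, or `nomad alloc restart` to restart one task inside one running allocation. This assumes a Nomad version that has `nomad job restart`, which as far as I know means 1.6 or later. The helper names are hypothetical; `b54e1af6` is the new wi-nginx allocation from the last run.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: restart only one task group's allocations,
# leaving the other groups of the job untouched.
restart_group() {
  local job="$1" group="$2"
  nomad job restart -group "$group" "$job"
}

# Hypothetical helper: restart a single task in one running allocation.
restart_task() {
  local alloc_id="$1" task="$2"
  nomad alloc restart "$alloc_id" "$task"
}

# Usage:
# restart_group portal wi-nginx
# restart_task b54e1af6 wi-nginx-task
```

Both commands restart tasks in place, so they only make sense for allocations that are still running. According to the CLI docs, `nomad job restart` also accepts `-reschedule`, which stops and replaces allocations instead of restarting them in place.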