Hi there. I have a `nomad job run` that submits 1 job with 5 tasks in it to a `nomad agent -dev`.
Sometimes I see logs like the following:
==> 2021-10-08T20:06:04Z: Monitoring evaluation "0e69db14-1c06-6e6b-cdd2-4553e9709dab"
2021-10-08T20:06:04Z: Evaluation triggered by job "grapl-local-infra"
==> 2021-10-08T20:06:05Z: Monitoring evaluation "0e69db14-1c06-6e6b-cdd2-4553e9709dab"
2021-10-08T20:06:05Z: Evaluation within deployment: "4e8448b1-a9c6-bb85-0624-aeb80ed3b2da"
2021-10-08T20:06:05Z: Evaluation status changed: "pending" -> "complete"
==> 2021-10-08T20:06:05Z: Evaluation "0e69db14-1c06-6e6b-cdd2-4553e9709dab" finished with status "complete" but failed to place all allocations:
2021-10-08T20:06:05Z: Task Group "localstack" (failed to place 1 allocation):
* No nodes were eligible for evaluation
* No nodes are available in datacenter "dc1"
2021-10-08T20:06:05Z: Task Group "ratel" (failed to place 1 allocation):
* No nodes were eligible for evaluation
* No nodes are available in datacenter "dc1"
2021-10-08T20:06:05Z: Task Group "kafka" (failed to place 1 allocation):
* No nodes were eligible for evaluation
* No nodes are available in datacenter "dc1"
2021-10-08T20:06:05Z: Task Group "zookeeper" (failed to place 1 allocation):
* No nodes were eligible for evaluation
* No nodes are available in datacenter "dc1"
2021-10-08T20:06:05Z: Task Group "redis" (failed to place 1 allocation):
* No nodes were eligible for evaluation
* No nodes are available in datacenter "dc1"
2021-10-08T20:06:05Z: Evaluation "5b157950-dd6a-c9e4-44af-57f25edaa4a2" waiting for additional capacity to place remainder
==> 2021-10-08T20:06:05Z: Monitoring deployment "4e8448b1-a9c6-bb85-0624-aeb80ed3b2da"
2021-10-08T20:06:40Z
ID = 4e8448b1-a9c6-bb85-0624-aeb80ed3b2da
Job ID = grapl-local-infra
Job Version = 0
Status = successful
Description = Deployment completed successfully
Deployed
Task Group Desired Placed Healthy Unhealthy Progress Deadline
kafka 1 1 1 0 2021-10-08T20:16:38Z
localstack 1 1 1 0 2021-10-08T20:16:39Z
ratel 1 1 1 0 2021-10-08T20:16:18Z
redis 1 1 1 0 2021-10-08T20:16:18Z
zookeeper 1 1 1 0 2021-10-08T20:16:38Z
Allocations
ID Eval ID Node ID Node Name Task Group Version Desired Status Created Modified
32ee55c1-80cd-b299-6757-e78432ca94e1 5b157950-dd6a-c9e4-44af-57f25edaa4a2 75168830-24d1-6da6-7652-bb6606aea354 ip-10-0-5-206.ec2.internal redis 0 run running 2021-10-08T20:06:04Z 2021-10-08T20:06:18Z
7b674228-17be-932f-f999-afb881b8ccb7 5b157950-dd6a-c9e4-44af-57f25edaa4a2 75168830-24d1-6da6-7652-bb6606aea354 ip-10-0-5-206.ec2.internal zookeeper 0 run running 2021-10-08T20:06:04Z 2021-10-08T20:06:38Z
88077d7b-2dd1-d17c-f795-ac093893c381 5b157950-dd6a-c9e4-44af-57f25edaa4a2 75168830-24d1-6da6-7652-bb6606aea354 ip-10-0-5-206.ec2.internal ratel 0 run running 2021-10-08T20:06:04Z 2021-10-08T20:06:18Z
b0c9cfad-c2aa-673a-41fe-019f24ae329b 5b157950-dd6a-c9e4-44af-57f25edaa4a2 75168830-24d1-6da6-7652-bb6606aea354 ip-10-0-5-206.ec2.internal kafka 0 run running 2021-10-08T20:06:04Z 2021-10-08T20:06:38Z
e5654b8c-1fcc-8842-1598-e3c8a19645c0 5b157950-dd6a-c9e4-44af-57f25edaa4a2 75168830-24d1-6da6-7652-bb6606aea354 ip-10-0-5-206.ec2.internal localstack 0 run running 2021-10-08T20:06:04Z 2021-10-08T20:06:39Z
<and then an exit code 2>
So, my takeaways from these logs are:
- The first evaluation 0e69db14 had problems placing allocations
- A guess: Nomad automatically rescheduled another evaluation, 5b157950, which succeeded
And then, after this successful retry, it still gives me an exit code 2 because, per the documentation:
On successful job submission and scheduling, exit code 0 will be returned. If there are job placement issues encountered (unsatisfiable constraints, resource exhaustion, etc), then the exit code will be 2. Any other errors, including client connection issues or internal errors, are indicated by exit code 1.
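Given that documented mapping, my calling script could at least distinguish the three outcomes instead of treating any non-zero code as failure. A minimal sketch (the `classify_nomad_exit` helper name is mine, not part of the Nomad CLI):

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a `nomad job run` exit code to the three
# documented outcomes (0 = scheduled, 2 = placement issues, other = error).
classify_nomad_exit() {
  case "$1" in
    0) echo "scheduled" ;;
    2) echo "placement-issues" ;;
    *) echo "error" ;;
  esac
}

# Example wiring (hypothetical):
#   nomad job run grapl-local-infra.nomad
#   outcome=$(classify_nomad_exit $?)
```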
So, 2 questions here:
- Is my read of the situation correct? I’m getting an exit code 2 despite having seemingly-successfully-running tasks?
- I suspect my logic that waits for my `nomad agent -dev &` to be ready may be broken. It is currently the following:

timeout 120 bash -c -- 'while [[ -z $(nomad status 2>&1 | grep running) ]]; do printf "Waiting for nomad-agent\n";sleep 1;done'

(One possible issue: if `nomad status` answers with something like "No running jobs" as soon as the server is up, the grep for "running" would match before any client node is actually ready.) Perhaps a check for "ready" in `nomad node status` might be preferable?
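If node readiness is the right signal, the loop could parse `nomad node status` instead. A sketch, under the assumption that the Status column is the last field of each output row (`any_node_ready` is a hypothetical helper, not a Nomad command):

```shell
#!/usr/bin/env bash
# Hypothetical readiness check: reads `nomad node status` output on stdin
# and succeeds only if some row's last column (Status) is "ready".
any_node_ready() {
  awk '$NF == "ready" { found = 1 } END { exit !found }'
}

# Possible replacement for the wait loop (same 120s timeout as before):
#   timeout 120 bash -c -- 'until nomad node status 2>/dev/null |
#       awk "\$NF == \"ready\" { found = 1 } END { exit !found }"; do
#     printf "Waiting for nomad-agent\n"; sleep 1
#   done'
```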