Placement failures - how do I debug them?

We have a Nomad job that is supposed to run 30 tasks on 30 different nodes. We use Terraform to provision 30 EC2 instances specifically for this job. Here are the constraints:
// top level
"Constraints": [
  {
    "LTarget": "",
    "Operand": "distinct_hosts",
    "RTarget": "true"
  }
],

"TaskGroups": [
  {
    "Constraints": [
      {
        "LTarget": "${meta.ResourceId}",
        "Operand": "==",
        "RTarget": "our-cluster-tag"
      }
    ],
    "Count": 30,

  }
]
When we run the job, it shows the following message:
main 2 unplaced
Constraint distinct_hosts filtered 28 nodes
Constraint ${meta.ResourceId} == our-cluster-tag filtered nodes

I checked the AWS console, and we do have 30 EC2 instances running. In the Nomad documentation, I can only find how to check logs for each allocation (in this case I have only 28 allocations, and they are running fine).

So the question is: how do I debug placement failures in Nomad?

Having constraints at both the job level and the group level is probably not what you want here. From the docs:

Placing constraints at both the job level and at the group level is redundant since constraints are applied hierarchically. The job constraints will affect all groups (and tasks) in the job.

The output of nomad eval status (ref https://nomadproject.io/docs/commands/eval-status/) should provide you with more detail as to what happened with that particular eval. It probably couldn’t hurt to double-check that all the tags you expected were actually applied to the EC2 instances and that Nomad can read the tags with your IAM roles.
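To make the "double-check the nodes" step concrete, here is a minimal Python sketch that flags nodes the scheduler would skip for this job. It assumes you have already collected the full node objects (for example from the Nomad HTTP API's node detail endpoint) into a list of dicts with the fields Name, Status, SchedulingEligibility and Meta. The function name and the sample data are hypothetical, and this only approximates the checks relevant to this thread, not Nomad's actual scheduler logic.

```python
def find_ineligible_nodes(nodes, meta_key, expected):
    """Return (node name, reason) pairs for nodes that cannot take a placement.

    `nodes` is assumed to be a list of node detail dicts fetched beforehand.
    """
    problems = []
    for node in nodes:
        name = node.get("Name") or node.get("ID", "<unknown>")
        meta = node.get("Meta") or {}  # Meta may be missing or null
        if node.get("Status") != "ready":
            problems.append((name, f"status is {node.get('Status')!r}"))
        elif node.get("SchedulingEligibility") != "eligible":
            problems.append((name, "not eligible for scheduling"))
        elif meta.get(meta_key) != expected:
            problems.append((name, f"meta.{meta_key} is {meta.get(meta_key)!r}"))
    return problems


if __name__ == "__main__":
    # Hypothetical sample data standing in for real node objects.
    sample = [
        {"Name": "node-1", "Status": "ready",
         "SchedulingEligibility": "eligible",
         "Meta": {"ResourceId": "our-cluster-tag"}},
        {"Name": "node-2", "Status": "ready",
         "SchedulingEligibility": "eligible",
         "Meta": {}},
    ]
    for name, reason in find_ineligible_nodes(sample, "ResourceId", "our-cluster-tag"):
        print(f"{name}: {reason}")
```

If the list you collect has fewer than 30 entries to begin with, the missing hosts never registered with Nomad at all, which points at provisioning rather than at the constraints.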

Thanks for the reply. It turned out that 2 of our hosts failed the init script because of a race condition edge case. We fixed that, and all 30 allocations are now placed.