Placement failures - how do I debug it?

wfeng-fsde · February 19, 2020, 11:29pm

We have a Nomad job that is supposed to run 30 tasks on 30 different nodes. We use Terraform to provision 30 EC2 instances specifically for this job. Here are the constraints:
// top level
"Constraints": [
  {
    "LTarget": "",
    "Operand": "distinct_hosts",
    "RTarget": "true"
  }
],
…
"TaskGroups": [
  {
    "Constraints": [
      {
        "LTarget": "${meta.ResourceId}",
        "Operand": "==",
        "RTarget": "our-cluster-tag"
      }
    ],
    "Count": 30,
    …
  }
]
When we run the job, it shows the following message:
main 2 unplaced
Constraint distinct_hosts filtered 28 nodes
Constraint {meta.ResourceId} == our-cluster-tag filtered nodes

I checked the AWS console and we definitely have 30 EC2 instances running. In the Nomad documentation, I could only find how to check logs for each allocation (in this case I have only 28 allocations, and they are running fine).

So the question is: how do I debug a placement failure in Nomad?

tgross · February 21, 2020, 8:32pm

Having constraints at both the job level and the group level is probably not what you want here. From the docs:

Placing constraints at both the job level and at the group level is redundant since constraints are applied hierarchically. The job constraints will affect all groups (and tasks) in the job.

The output of nomad eval status (see the "Commands: eval status" page in the Nomad documentation) should give you more detail about what happened with that particular eval. It also couldn't hurt to double-check that all the tags you expected were actually applied to the EC2 instances and that Nomad can read the tags with your IAM roles.
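
If the plain-text output is hard to scan, the same information is available as JSON (nomad eval status -json, or GET /v1/evaluation/<eval_id> on the HTTP API). Here is a rough Python sketch that summarizes the placement failures from such a document. The field names (FailedTGAllocs, NodesEvaluated, ConstraintFiltered, DimensionExhausted, CoalescedFailures) are what I'd expect in the evaluation JSON, so check them against your own output. SAMPLE_EVAL is made up to mirror the numbers in the question, and summarize_failures is just a helper name I picked.

```python
def summarize_failures(evaluation):
    """Return one readable line per placement-failure reason in an evaluation."""
    lines = []
    # FailedTGAllocs maps task group name -> allocation metrics; it may be null.
    failed = evaluation.get("FailedTGAllocs") or {}
    for group, metric in sorted(failed.items()):
        # One failure is reported directly; identical ones are coalesced.
        unplaced = 1 + metric.get("CoalescedFailures", 0)
        evaluated = metric.get("NodesEvaluated", 0)
        lines.append(f"{group}: {unplaced} unplaced, {evaluated} nodes evaluated")
        constraints = metric.get("ConstraintFiltered") or {}
        for constraint, count in sorted(constraints.items()):
            lines.append(f"  constraint {constraint!r} filtered {count} nodes")
        exhausted = metric.get("DimensionExhausted") or {}
        for dimension, count in sorted(exhausted.items()):
            lines.append(f"  resource {dimension!r} exhausted on {count} nodes")
    return lines


# Hypothetical evaluation document, shaped like the output in the question.
SAMPLE_EVAL = {
    "ID": "hypothetical-eval-id",
    "FailedTGAllocs": {
        "main": {
            "NodesEvaluated": 28,
            "CoalescedFailures": 1,
            "ConstraintFiltered": {"distinct_hosts": 28},
            "DimensionExhausted": None,
        }
    },
}

for line in summarize_failures(SAMPLE_EVAL):
    print(line)
```

If NodesEvaluated is lower than the number of instances you provisioned, as in this sample, the missing nodes were never candidates at all, which points at node registration or node metadata rather than at the job spec.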

wfeng-fsde · February 22, 2020, 3:10am

Thanks for the reply. It turned out that 2 of our hosts failed the init script because of a race condition edge case. We fixed that, and all 30 allocations were placed.
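
For anyone who hits the same thing: a quick way to catch hosts like these is to compare the number of nodes that can actually satisfy the constraint with the number you provisioned. Below is a rough Python sketch, assuming you have already fetched node objects from the Nomad HTTP API (GET /v1/node/<node_id>) and that they carry Name, Status, SchedulingEligibility, and Meta fields. The sample nodes and the expected count are made up; only the meta key and value come from the job above. A host whose init script failed before the Nomad client started will not show up in the list at all, which is why the final count comparison matters.

```python
def placement_candidates(nodes, meta_key, meta_value):
    """Split nodes into those that can satisfy the meta constraint and those that cannot."""
    usable, rejected = [], []
    for node in nodes:
        meta = node.get("Meta") or {}
        if node.get("Status") != "ready":
            rejected.append((node["Name"], f"status is {node.get('Status')}"))
        elif node.get("SchedulingEligibility") != "eligible":
            rejected.append((node["Name"], "not eligible for scheduling"))
        elif meta.get(meta_key) != meta_value:
            rejected.append((node["Name"], f"meta.{meta_key} is {meta.get(meta_key)!r}"))
        else:
            usable.append(node["Name"])
    return usable, rejected


# Hypothetical node objects; a real cluster would return many more fields.
SAMPLE_NODES = [
    {"Name": "node-1", "Status": "ready", "SchedulingEligibility": "eligible",
     "Meta": {"ResourceId": "our-cluster-tag"}},
    {"Name": "node-2", "Status": "ready", "SchedulingEligibility": "eligible",
     "Meta": {"ResourceId": "our-cluster-tag"}},
    {"Name": "node-3", "Status": "ready", "SchedulingEligibility": "eligible",
     "Meta": {}},
    {"Name": "node-4", "Status": "down", "SchedulingEligibility": "eligible",
     "Meta": {"ResourceId": "our-cluster-tag"}},
]
EXPECTED_NODES = 4  # hypothetical; the job in this thread would use 30

usable, rejected = placement_candidates(SAMPLE_NODES, "ResourceId", "our-cluster-tag")
print(f"{len(usable)} of {EXPECTED_NODES} expected nodes can take an allocation")
for name, reason in rejected:
    print(f"  {name}: {reason}")
```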
