How can I ask "please nomad, kindly place <job> onto this node"

I noticed that a node which I expected to be running an allocated job isn’t. Is there a way to recover from that? I want to say, “please nomad, kindly place <job> onto this node”

The job definition definitely matches: there are 149 OTHER nodes, named similarly, which do run this job, and the job has constraints which match this node.

To be clear, there are other jobs allocated on this node that are running perfectly. I’d rather not run nomad node drain -enable -self followed by nomad node drain -disable -self, as those running jobs have established TCP connections I’d not want to reset.

Hi @jrwren. I would recommend looking at job constraints which could be used to ensure the job is placed on a specific node.

The constraint may look something like the following:

constraint {
    attribute = "${node.unique.id}"
    value     = "9afa5da1-8f39-25a2-48dc-ba31fd7c0023"
}
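
If you are unsure of a node’s unique ID, it can be looked up with the Nomad CLI, for example (the node ID below is a placeholder):

nomad node status                     # lists nodes with their short IDs and names
nomad node status -verbose <node-id>  # shows the full unique ID and node attributes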

Thanks,
jrasell and the Nomad team

As I tried to say, this is already done. The job constraint is in place, and there are 2 other jobs with the exact same constraint allocated on this node. The job which isn’t allocated on this node once was allocated there, but something happened to make it disappear, and nomad either isn’t aware or didn’t reallocate it.

Hi @jrwren. Do you have any logs or error messages to help understand what happened? Are you able to try stopping the job and resubmitting it?

Thanks,
jrasell and the Nomad team
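
For reference, and assuming a standard setup, stopping and resubmitting a job would look something like the sketch below (the job name and file are placeholders); note that this stops every allocation of the job, which is exactly the interruption described in the next reply:

nomad job stop <job>        # stops all allocations of the job
nomad job run <job>.nomad   # resubmits the job from its job file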

I don’t know how to look for logs specific to that node and job.

I do not know how to stop and resubmit the job without interrupting the other 149 allocations; doing so would interrupt service. We are planning on doing that in a few days or so for other reasons. I’m looking forward to seeing how the allocations change at that point, but it certainly would be nice to recover this one missing allocation on this one node.
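
For anyone in a similar spot, logs for a specific node and job can usually be narrowed down with the Nomad CLI; the job name and allocation ID here are placeholders:

nomad job status <job>         # lists the job's allocations along with their node IDs
nomad alloc status <alloc-id>  # shows recent task events for one allocation
nomad alloc logs <alloc-id>    # prints the task's stdout; add -stderr for stderr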

It turns out the underlying docker version installed on these systems suffers from a bug in which a dying container is sometimes not entirely cleaned up. The container shows in docker ps output as running, but there are no processes behind it. nomad sees it as running even though the health checks are unable to run, and this seems to prevent nomad from starting a new container since it thinks one is already running.
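
A rough way to confirm such a half-dead container (the container name is a placeholder) is to compare what docker ps claims against the actual process table:

docker ps                                      # container still listed as "Up"
docker inspect -f '{{.State.Pid}}' <container> # the PID docker believes is running
ps -p <pid>                                    # empty output means the process is gone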

Best fix: upgrade the docker package.
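
Until the upgrade is rolled out, one possible stopgap (untested here) is to force-remove the dead container so nomad notices the task is gone and reschedules it:

docker rm -f <container-id>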