Ensuring two jobs cannot run on the same node

Hi all!

This is a bit of a strange one, but we have two jobs that don’t play nicely if they run on the same node. Is there a way to ensure that they are always assigned to different nodes? Currently we’re having to do it manually (after the jobs are started, check, and if they’re on the same node, restart them until they’re re-assigned nicely).

If there’s no built-in way of doing it, does anyone have suggestions on how it could be done?

Thank you for your thoughts!

Edit:

Just to confirm, constraints are something we’re looking into, but at the moment these two things are entirely separate jobs - from what I’m reading, it seems we would need to group them under the same job to ensure they don’t colocate. Please correct me if I’m wrong! :slight_smile:


Just a long shot, but would job_anti_affinity do the trick?

Hi brucellino1, thanks for your response and suggestion!

Correct me if I’m wrong, but my reading of job_anti_affinity is that it’s specific to a single job. Something like that is exactly what I’m after, but with the two jobs sharing that affinity/value - if that makes sense?

IE: we have Job A, and Job B.

If Job A is on this node, Job B will skip it, and vice versa.

I’m currently thinking some form of custom attribute coupled with a constraint might do it, but I’m not 100% sure how to do it right now.
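For what it’s worth, the closest thing I can sketch so far is tagging nodes with a custom meta value and constraining on it - though that statically partitions the nodes rather than letting either job take any free node, and all the names below are made up:

# In each client's agent config, tag the node with a made-up meta value
# (e.g. "job_a" on some nodes, "job_b" on the others):
client {
  meta {
    exclusive_role = "job_a"
  }
}

# Then constrain each job to the nodes carrying its value:
job "job_a" {
  group "app" {
    constraint {
      attribute = "${meta.exclusive_role}"
      value     = "job_a"
    }
  }
}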

If there were a way to check, for example “If JobA count >0, JobB won’t allocate” it would be perfect!


Hi @Eambo !

The original question was interesting to me, because I needed something similar. Framed a different way, you need to know facts about other jobs.

I needed to know facts about other nodes in a job:

… to me these are similar kinds of questions, meaning that they would consume similar types of data.

However,

I interpret this to mean that Job A’s state blocks Job B’s state? That sounds like you want inter-task dependencies…

If on the other hand this is the central problem:

then it seems like you need to look up information about a different job before submitting one. This may be a smell that you’re “doing it wrong” - perhaps not abstracting the application in the job properly, colliding ports, etc. But if these are the same job, just different allocations, then perhaps the distinct_hosts operator is what you’re looking for? This might work well for parallel executions of a task in a job.
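For reference, distinct_hosts is just a constraint operator, so if the two workloads could be expressed as allocations of one job it would look roughly like this (job and group names are placeholders):

job "parallel_thing" {
  # Every allocation of this job must be placed on a different client node.
  constraint {
    operator = "distinct_hosts"
    value    = "true"
  }

  group "workers" {
    count = 2
    ...
  }
}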

Does any of this hit home?

There is an easier hack to prevent two tasks (it might help your use case of jobs too) from landing on the same node: use a static network port. The second task (job) will eliminate the first node altogether due to the port collision. Be sure to use some obscure, out-of-normal-range port, so as to avoid causing problems for other jobs/tasks.

HTH. :slight_smile:


Job A and Job B are entirely separate in functionality and how they work - but cannot run together on the same node. I think your first thoughts were correct based on how I’m reading it :slight_smile:

Essentially I want Job A and Job B to be aware of one-another, and never run on the same node - but I want them to have full availability of every node out there (IE: If there are 5 nodes, they should be able to allocate to all these nodes UNLESS the other job is already there).

Hopefully this makes sense? Thank you again for your response and thoughts!

Thank you, shantanugadgil! I had actually looked into this possibility, but our jobs have ports that need to be dedicated to them individually - we can’t change those ports to force this sort of conflict.

If there’s a way to set a secondary port and cause a conflict that won’t cause problems with the primary port, or something similar, that’s an idea I can work with! :smiley:

I think this is a good approach too. You can set any number of ports for the job - if one is static, the scheduler can only place one allocation using that port on a given node, otherwise there would be a collision.

According to the docs you can set something like:

job "plays_rough" {  
  group "cannot_share" {
    network {
      port "existing_necessary" {
      ... 
      }
      port "play_nice" {
        static = 42805
      }
    }
  }
}

That should exclude the node where a job has been allocated for subsequent evaluations.
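One thing worth spelling out: Job B would presumably need to reserve the same static port for the collision to kick in, along these lines:

job "other_job" {
  group "cannot_share_either" {
    network {
      # Same obscure static port as "plays_rough" above; reserved purely so
      # the scheduler refuses to put both jobs on one node.
      port "play_nice" {
        static = 42805
      }
    }
  }
}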

Curious to know if this works!