Ensuring two jobs cannot run on the same node

Hi all!

This is a bit of a strange one, but we have two jobs that don’t play nicely if they run on the same node. Is there a way to ensure that they are always assigned to different nodes? Currently we’re having to do it manually (after the jobs are started, check, and if they’re on the same node, restart them until they’re re-assigned nicely).

If there’s no built-in way of doing it, does anyone have suggestions on how it could be done?

Thank you for your thoughts!

Edit:

Just to confirm, constraints are something we’re looking into, but at the moment these two things are entirely separate jobs - from what I’m reading, it seems we would need to group them under the same job to ensure they don’t colocate. Please correct me if I’m wrong! :slight_smile:


Just a long shot, but would job_anti_affinity do the trick?

Hi brucellino1, thanks for your response and suggestion!

Correct me if I’m wrong, but my reading of job_anti_affinity is that it’s specific to a single job. Something like that is exactly what I’m after, but with the two jobs sharing that affinity/value - if that makes sense?

IE: we have Job A, and Job B.

If Job A is on this node, Job B will skip it, and vice versa.

I’m currently thinking some form of custom attribute coupled with a constraint might do it, but I’m not 100% sure how to do it right now.
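For what it’s worth, the closest thing I can sketch so far is tagging nodes with a custom meta value and constraining on it - though that statically partitions the nodes rather than letting either job take any free node, and all the names below are made up:

# In each client's agent config, tag the node with a made-up meta value
# (e.g. "job_a" on some nodes, "job_b" on the others):
client {
  meta {
    exclusive_role = "job_a"
  }
}

# Then constrain each job to the nodes carrying its value:
job "job_a" {
  group "app" {
    constraint {
      attribute = "${meta.exclusive_role}"
      value     = "job_a"
    }
  }
}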

If there were a way to check, for example “If JobA count >0, JobB won’t allocate” it would be perfect!


Hi @Eambo !

The original question was interesting to me, because I needed something similar. Framed a different way, you need to know facts about other jobs.

I needed to know facts about other nodes in a job:

… to me these are similar kinds of questions, meaning that they would consume similar types of data.

However,

I interpret this to mean that Job A’s state blocks Job B’s state? That sounds like you want inter-task dependencies…

If on the other hand this is the central problem:

then it seems like you need to look up information about a different job before submitting one. This may be a smell that you’re “doing it wrong” - perhaps not abstracting the application in the job properly, colliding ports, etc. But if these are the same job, just different allocations, then perhaps the distinct_hosts operator is what you’re looking for? This might work well for parallel executions of a task in a job.
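For reference, distinct_hosts is just a constraint operator, so if the two workloads could be expressed as allocations of one job it would look roughly like this (job and group names are placeholders):

job "parallel_thing" {
  # Every allocation of this job must be placed on a different client node.
  constraint {
    operator = "distinct_hosts"
    value    = "true"
  }

  group "workers" {
    count = 2
    ...
  }
}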

Does any of this hit home?

There is an easier hack to prevent two tasks (it might help your use case of jobs too) from landing on the same node: use a static network port. The second task (job) will eliminate the first node altogether due to the port collision. Be sure to use some obscure, out-of-normal-range port, so as to avoid causing problems for other jobs/tasks.

HTH. :slight_smile:


Job A and Job B are entirely separate in functionality and how they work - but cannot run together on the same node. I think your first thoughts were correct based on how I’m reading it :slight_smile:

Essentially I want Job A and Job B to be aware of one-another, and never run on the same node - but I want them to have full availability of every node out there (IE: If there are 5 nodes, they should be able to allocate to all these nodes UNLESS the other job is already there).

Hopefully this makes sense? Thank you again for your response and thoughts!

Thank you, shantanugadgil! I had actually looked into this possibility, but our jobs have ports that need to be dedicated to them individually - we can’t change those ports to force this sort of conflict.

If there’s a way to set a secondary port and cause a conflict that won’t cause problems with the primary port, or something similar, that’s an idea I can work with! :smiley:

I think this is a good approach too. You can set any number of ports for the job - if one is static, the scheduler can only place one allocation using that port on a given node, otherwise there would be a collision.

According to the docs you can set something like:

job "plays_rough" {  
  group "cannot_share" {
    network {
      port "existing_necessary" {
      ... 
      }
      port "play_nice" {
        static = 42805
      }
    }
  }
}

That should exclude the node where a job has been allocated for subsequent evaluations.
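One thing worth spelling out: Job B would presumably need to reserve the same static port for the collision to kick in, along these lines:

job "other_job" {
  group "cannot_share_either" {
    network {
      # Same obscure static port as "plays_rough" above; reserved purely so
      # the scheduler refuses to put both jobs on one node.
      port "play_nice" {
        static = 42805
      }
    }
  }
}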

Curious to know if this works!