Schedule tasks on the same node, but configure them independently

I have two tasks I want to run, let’s call them A and B.
A is a realtime application that should only restart when absolutely necessary. The goal is to (in theory) never touch A at all.
B serves an API for A. I am making sure that A can handle restarts of B.

The setup I am looking for:

  1. A and B run on the same host to make it possible to use a UNIX domain socket for communication.
  2. When B gets upgraded, only B restarts (or ideally a canary upgrade is performed).
  3. Both tasks can be rescheduled on another node if the current node breaks or gets drained.
  4. Optional: A has the same upgrade characteristics as B.

Right now I have a setup that fulfills everything except 1. The obvious solution would be to put both tasks in one group, but since tasks in a group share a single allocation and are updated as a unit, that would break 2 and 4.
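
For illustration, the single-group layout would look roughly like this (names and images are placeholders). Because both tasks live in one allocation, updating either task replaces the allocation and restarts both:

```hcl
# Sketch of the single-group layout (placeholders throughout).
# Updating B's image replaces the whole allocation, restarting A too.
job "a-and-b" {
  group "pair" {
    task "a" {
      driver = "docker"
      config {
        image = "example/a:1.0"
      }
    }

    task "b" {
      driver = "docker"
      config {
        image = "example/b:1.0"
      }
    }
  }
}
```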

I am thankful for any help to make this happen. If that involves a slightly different application structure I am open to that as well.

I think the constraint stanza should work.

ref: constraint Block - Job Specification | Nomad | HashiCorp Developer
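
For example, repeating the same block in both job files would pin them to one node (the hostname is a placeholder). The catch is that it hard-codes the node, which works against your point 3:

```hcl
# Pin a job to one known node; put the same block in both jobs.
# Hard-codes the node, so rescheduling on node failure is lost.
constraint {
  attribute = "${attr.unique.hostname}"
  value     = "node1" # placeholder
}
```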

HTH

Sorry I added an answer instead of replying.

I am looking at the docs right now. What I need would be the inverse of “distinct_hosts”, but that does not seem to exist. Or am I missing something here?

I imagine that if you preseed a machine with some metadata, you could write two separate job files A and B which target that machine.
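
For example, both job files could carry a constraint like this (the `pair_host` key is made up):

```hcl
# Both job A and job B constrain on a preseeded static meta key.
constraint {
  attribute = "${meta.pair_host}"
  value     = "true"
}
```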

But, thinking more about this: if the machine dies, you would need to relaunch it on another node.

I don’t have a definite answer (yet) on how to dynamically co-locate two jobs, but if you are using a cloud provider (like AWS), you could build your setup on an Auto Scaling group with a fixed size of 1.

When the machine dies, it would relaunch with the same meta.
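
A rough Terraform sketch of that idea (every name and value here is a placeholder, and the user data that installs the Nomad client with its preseeded meta is omitted):

```hcl
# Hypothetical: an Auto Scaling group pinned to exactly one instance.
# If the instance dies, AWS replaces it, and the replacement boots
# with the same Nomad client config (and therefore the same meta).

variable "subnet_ids" {
  type = list(string)
}

variable "ami_id" {
  type = string
}

resource "aws_launch_template" "pair_host" {
  image_id      = var.ami_id
  instance_type = "t3.small"

  # user_data would install and configure the Nomad client,
  # including the preseeded meta (omitted here).
}

resource "aws_autoscaling_group" "pair_host" {
  min_size            = 1
  max_size            = 1
  desired_capacity    = 1
  vpc_zone_identifier = var.subnet_ids

  launch_template {
    id      = aws_launch_template.pair_host.id
    version = "$Latest"
  }
}
```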

Still, I hope there is some other method to co-locate jobs.

I think the following could work:

ref: Commands: node meta apply | Nomad | HashiCorp Developer
ref: lifecycle Block - Job Specification | Nomad | HashiCorp Developer

The idea is (I haven’t tested this myself):

  1. In your job A, add a poststart task that applies some dynamic metadata to the node it runs on. Example: nomad node meta apply ready=1
  2. Give job B a constraint so that it only launches on a node whose meta has ready == 1 (see the sketch below).
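
A minimal sketch of the two job files, shown together here for brevity (untested; names, images, and the `ready` key are placeholders):

```hcl
# --- a.nomad.hcl (sketch, untested) ---
job "a" {
  group "a" {
    task "app" {
      driver = "docker"

      config {
        image = "example/a:1.0" # placeholder
      }
    }

    # Poststart task: once A is running, tag the local node with
    # dynamic metadata. Assumes the nomad binary is available on the
    # client, raw_exec is enabled, and the task can reach the local
    # agent's API.
    task "tag-node" {
      lifecycle {
        hook    = "poststart"
        sidecar = false
      }

      driver = "raw_exec"

      config {
        command = "nomad"
        args    = ["node", "meta", "apply", "ready=1"]
      }
    }
  }
}

# --- b.nomad.hcl (sketch, untested) ---
job "b" {
  # Only place B on a node that A has tagged.
  constraint {
    attribute = "${meta.ready}"
    value     = "1"
  }

  group "b" {
    task "api" {
      driver = "docker"

      config {
        image = "example/b:1.0" # placeholder
      }
    }
  }
}
```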

I believe this could work.

Thank you for further digging into this.

My initial thought on this approach is that it would most probably break canary deployments:

  • A new alloc gets created on any node
  • That node gets the meta (I would have to use a key unique to the job)
  • Here comes the issue: task B could now spin up on either node, because the old allocation is still running and its meta is still set

I opened an issue for this use case but am still thankful for ideas.

Agreed. If “A” crashes and restarts on node2, B will stay on node1.

The following is a yucky hack:

Make job B also add a dynamic meta.

Use an affinity (not a constraint) in job A that targets the meta added by B.
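
Sketched out, assuming B applies a made-up `b_here` key the same way A applies `ready`:

```hcl
# In job A: prefer (but do not require) the node where B runs.
affinity {
  attribute = "${meta.b_here}"
  value     = "1"
  weight    = 100
}
```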

Of course this is not guaranteed, but worth a try! :man_shrugging:

Off topic: can you afford to have both jobs failing for some time, until you manually recreate the node where they launch?

If so, you can preseed the node with job-specific metadata from the agent config.
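
For example, in the client’s agent configuration (`pair_host` is a placeholder key):

```hcl
# Client agent config: static metadata both job files can constrain on.
client {
  enabled = true

  meta {
    pair_host = "true"
  }
}
```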

No, I don’t want to break the rescheduling in case of node failure.
