Anti-affinity doesn't consider the job version, leading to unbalanced load

We use Blue/Green deployments for our applications to keep deployment times as fast as possible (i.e. the value for Canary is equal to the Count on a Job).

It’s quite common for us to have a Count which is more than half of the number of Nodes in the cluster. Unfortunately in this situation Nomad’s Job anti-affinity penalty can lead to the cluster becoming very unbalanced following a deploy.

We use the “Spread” scheduling option, although I think this same problem is relevant to the default binpacking scheduler as well.

Here’s a minimal example of what we’re seeing:

  1. We have a Job (A) which is currently deployed to version 1 (A1). It has Count: 3, and each allocation is on a separate node:

    node allocs
    node1 A1
    node2 A1
    node3 A1
    node4 (empty)
  2. When we deploy version 2 (A2), the anti-affinity penalty means that the first allocation is placed on the empty node. Now every node has an allocation of the Job running, so all nodes are ranked equally: anti-affinity is applied even though the Job’s version is different. The remaining two allocations are therefore placed randomly, and it is possible for a second allocation of A2 to be placed on node4:

    node allocs
    node1 A1
    node2 A1
    node3 A1 A2
    node4 A2 A2
  3. A1 allocations are then retired, leaving things in an unbalanced state:

    node allocs
    node1
    node2
    node3 A2
    node4 A2 A2
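
The tie in step 2 can be reproduced with a small sketch. It looks only at the anti-affinity component (binpack/spread scores are ignored), and the penalty formula `-(collisions + 1) / count` is an assumption chosen because it reproduces the -0.667 and -1 values in the placement metrics in this post. The function names and the `cluster` dict are hypothetical, not Nomad APIs.

```python
def penalty(collisions, count):
    # Assumed formula: no penalty without a collision, otherwise
    # -(collisions + 1) / count. Not taken from Nomad's source.
    return 0.0 if collisions == 0 else -(collisions + 1) / count


def best_nodes(cluster, count):
    """Return the nodes tied for the best (least negative) penalty.

    Every allocation of the job counts as a collision, whatever its version,
    which is the behaviour described in this post.
    """
    scores = {node: penalty(len(allocs), count) for node, allocs in cluster.items()}
    top = max(scores.values())
    return sorted(node for node, score in scores.items() if score == top)


# Step 1 from the example: three A1 allocations and one empty node.
cluster = {"node1": ["A1"], "node2": ["A1"], "node3": ["A1"], "node4": []}

first = best_nodes(cluster, count=3)   # candidates for the first A2 allocation
cluster["node4"].append("A2")          # the first A2 lands on the empty node
second = best_nodes(cluster, count=3)  # candidates for the second A2 allocation

print(first)
print(second)
```

After the first A2 placement every node holds exactly one allocation of the job, so the second call returns all four nodes as equally good candidates, including node4.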

Placement Metrics

Placement metrics for the three A2 allocations look like this:

Placement Metrics
Node                                  binpack  job-anti-affinity  node-affinity  node-reschedule-penalty  final score
a6755999-dfb1-0ce6-920c-338558598b53  0.79     0                  0              0                        0.79
02fea4c3-a505-69e2-58b9-625298c747d7  0.647    -0.667             0              0                        -0.01
8ee5efcd-5ade-2a00-ca25-8e3386e0d8bc  0.647    -0.667             0              0                        -0.01
fa516cb0-7a23-ef2d-2170-c6a0d2c4cb06  0.647    -0.667             0              0                        -0.01
Placement Metrics
Node                                  binpack  job-anti-affinity  node-affinity  node-reschedule-penalty  final score
fa516cb0-7a23-ef2d-2170-c6a0d2c4cb06  0.647    -0.667             0              0                        -0.01
02fea4c3-a505-69e2-58b9-625298c747d7  0.647    -0.667             0              0                        -0.01
8ee5efcd-5ade-2a00-ca25-8e3386e0d8bc  0.647    -0.667             0              0                        -0.01
a6755999-dfb1-0ce6-920c-338558598b53  0.647    -0.667             0              0                        -0.01
Placement Metrics
Node                                  binpack  job-anti-affinity  node-affinity  node-reschedule-penalty  final score
a6755999-dfb1-0ce6-920c-338558598b53  0.647    -0.667             0              0                        -0.01
02fea4c3-a505-69e2-58b9-625298c747d7  0.647    -0.667             0              0                        -0.01
fa516cb0-7a23-ef2d-2170-c6a0d2c4cb06  0.647    -0.667             0              0                        -0.01
8ee5efcd-5ade-2a00-ca25-8e3386e0d8bc  0.547    -1                 0              0                        -0.227
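
As a quick sanity check on these numbers: the final scores above are consistent with taking the mean of the non-zero score components. That is an observation from this output, not a statement about how Nomad computes it, and `mean_of_nonzero` is a hypothetical helper.

```python
def mean_of_nonzero(*components):
    """Average the score components, skipping the ones that are zero."""
    parts = [c for c in components if c != 0]
    return sum(parts) / len(parts)


rows = [
    # (binpack, job-anti-affinity, reported final score), copied from the output above
    (0.79, 0.0, 0.79),
    (0.647, -0.667, -0.01),
    (0.547, -1.0, -0.227),
]

for binpack, anti_affinity, reported in rows:
    computed = mean_of_nonzero(binpack, anti_affinity)
    # The reported values are rounded, so only expect agreement to about 0.001.
    print(f"computed {computed:.4f}, reported {reported}")
```

The only row with a different final score in the third placement is the node that already holds an A2 allocation; the other three nodes tie, even though one of them would be a better choice for balance after A1 is retired.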

Searching the code, it looks like this is the relevant section:

If this statement were updated to check for version equality as well as JobID/taskGroup, then I think things would work much better for us. I may look at submitting a feature request for this on GitHub.
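
To make the proposal concrete, here is a rough sketch of version-aware collision counting. It is written in Python for brevity and is not Nomad's code: the `Alloc` type, its fields, and `count_collisions` are hypothetical stand-ins for the JobID/taskGroup check described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alloc:
    # Hypothetical, simplified view of an allocation on a node.
    job_id: str
    task_group: str
    job_version: int


def count_collisions(allocs, job_id, task_group, job_version=None):
    """Count allocations on a node that collide with the one being placed.

    With job_version=None, any allocation of the same job and task group
    collides (the behaviour described in this post). With a version given,
    only allocations of that same version collide (the proposed change).
    """
    total = 0
    for alloc in allocs:
        if alloc.job_id != job_id or alloc.task_group != task_group:
            continue
        if job_version is not None and alloc.job_version != job_version:
            continue
        total += 1
    return total


# State after the first A2 allocation has landed on node4.
node1 = [Alloc("A", "web", 1)]
node4 = [Alloc("A", "web", 2)]

print(count_collisions(node1, "A", "web"))                 # version ignored
print(count_collisions(node1, "A", "web", job_version=2))  # version-aware
print(count_collisions(node4, "A", "web", job_version=2))  # version-aware
```

With the version check, node1 to node3 would report no collisions for A2 while node4 would report one, so the remaining two A2 allocations would no longer tie with node4.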

In the meantime, has anyone else seen this type of issue? Are there any config options or workarounds which might be able to help us?

Offhand I can’t think of any reason job version shouldn’t be considered, per your recommendation. We use spread a lot, and this might explain some imbalances we’ve noticed but never quite put the time into investigating.

Hi @davidtaylorhq :wave:

Thank you for the detailed report. This does seem like a bug in Nomad. Could you file this as a bug in our repo so we can better track it?

Thank you!