We use Blue/Green deployments for our applications to keep deployment times as fast as possible (i.e. the value for Canary is equal to the Count on a Job). It's quite common for us to have a Count which is more than half the number of Nodes in the cluster. Unfortunately, in this situation Nomad's Job anti-affinity penalty can lead to the cluster becoming very unbalanced following a deploy.
We use the “Spread” scheduling option, although I think this same problem is relevant to the default binpacking scheduler as well.
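For context, here's a minimal sketch of the kind of job spec involved; the job name, image, and values are illustrative, not our real configuration (the "spread" algorithm itself is set cluster-wide via the server's scheduler configuration, not in the job spec):

```hcl
job "app" {
  datacenters = ["dc1"]
  type        = "service"

  group "web" {
    count = 3

    update {
      # Blue/Green: canary equals count, so a full second copy of the
      # group is placed before any old allocations are retired.
      canary       = 3
      max_parallel = 3
      auto_promote = true
    }

    task "server" {
      driver = "docker"
      config {
        image = "example/app:v2"
      }
    }
  }
}
```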
Here's a minimal example of what we're seeing:
- We have a Job (A) which is currently deployed at version 1 (A1). It has Count: 3, and each allocation is on a separate node:

  | node  | allocs  |
  |-------|---------|
  | node1 | A1      |
  | node2 | A1      |
  | node3 | A1      |
  | node4 | (empty) |

- When we deploy version 2 (A2), the anti-affinity penalty means that one allocation is placed on the empty node. Now all nodes have the application running and are equally ranked, because the anti-affinity is applied even though the Job's version is different. The remaining two allocations are therefore placed randomly, and it is possible for a second allocation of A2 to be placed on node4:

  | node  | allocs |
  |-------|--------|
  | node1 | A1     |
  | node2 | A1     |
  | node3 | A1, A2 |
  | node4 | A2, A2 |

- The A1 allocations are then retired, leaving things in an unbalanced state:

  | node  | allocs |
  |-------|--------|
  | node1 |        |
  | node2 |        |
  | node3 | A2     |
  | node4 | A2, A2 |
Placement Metrics
Placement metrics for the three A2 allocations look like this:
First allocation:

| Node | binpack | job-anti-affinity | node-affinity | node-reschedule-penalty | final score |
|------|---------|-------------------|---------------|-------------------------|-------------|
| a6755999-dfb1-0ce6-920c-338558598b53 | 0.79  | 0      | 0 | 0 | 0.79  |
| 02fea4c3-a505-69e2-58b9-625298c747d7 | 0.647 | -0.667 | 0 | 0 | -0.01 |
| 8ee5efcd-5ade-2a00-ca25-8e3386e0d8bc | 0.647 | -0.667 | 0 | 0 | -0.01 |
| fa516cb0-7a23-ef2d-2170-c6a0d2c4cb06 | 0.647 | -0.667 | 0 | 0 | -0.01 |

Second allocation:

| Node | binpack | job-anti-affinity | node-affinity | node-reschedule-penalty | final score |
|------|---------|-------------------|---------------|-------------------------|-------------|
| fa516cb0-7a23-ef2d-2170-c6a0d2c4cb06 | 0.647 | -0.667 | 0 | 0 | -0.01 |
| 02fea4c3-a505-69e2-58b9-625298c747d7 | 0.647 | -0.667 | 0 | 0 | -0.01 |
| 8ee5efcd-5ade-2a00-ca25-8e3386e0d8bc | 0.647 | -0.667 | 0 | 0 | -0.01 |
| a6755999-dfb1-0ce6-920c-338558598b53 | 0.647 | -0.667 | 0 | 0 | -0.01 |

Third allocation:

| Node | binpack | job-anti-affinity | node-affinity | node-reschedule-penalty | final score |
|------|---------|-------------------|---------------|-------------------------|-------------|
| a6755999-dfb1-0ce6-920c-338558598b53 | 0.647 | -0.667 | 0 | 0 | -0.01  |
| 02fea4c3-a505-69e2-58b9-625298c747d7 | 0.647 | -0.667 | 0 | 0 | -0.01  |
| fa516cb0-7a23-ef2d-2170-c6a0d2c4cb06 | 0.647 | -0.667 | 0 | 0 | -0.01  |
| 8ee5efcd-5ade-2a00-ca25-8e3386e0d8bc | 0.547 | -1     | 0 | 0 | -0.227 |
Searching the code, it looks like this is the relevant section:
If this check were updated to compare the Job's version as well as JobID/TaskGroup, then I think things would work much better for us. I may look at submitting a feature request for this on GitHub.
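To make the idea concrete, here's a rough sketch of the proposed behaviour; the struct, field, and function names are my own illustrations, not the actual identifiers in Nomad's rank iterator:

```go
package main

import "fmt"

// Alloc is a stand-in for Nomad's allocation struct; the field names
// here are illustrative.
type Alloc struct {
	JobID      string
	TaskGroup  string
	JobVersion uint64
}

// collisions counts the allocations that would trigger the job
// anti-affinity penalty on a node. The JobVersion comparison is the
// proposed addition: allocations from a *different* version of the
// same job (the outgoing Blue copy) would no longer incur a penalty.
func collisions(proposed []Alloc, jobID, taskGroup string, jobVersion uint64) int {
	n := 0
	for _, a := range proposed {
		if a.JobID == jobID && a.TaskGroup == taskGroup && a.JobVersion == jobVersion {
			n++
		}
	}
	return n
}

func main() {
	// A node already running a version-1 allocation of job A.
	node := []Alloc{{JobID: "A", TaskGroup: "web", JobVersion: 1}}
	// Placing a version-2 allocation: with the version check, the
	// existing A1 allocation no longer counts as a collision.
	fmt.Println(collisions(node, "A", "web", 2)) // prints 0
}
```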
In the meantime, has anyone else seen this type of issue? Are there any config options or workarounds which might be able to help us?