Nomad Autoscaler aws-asg: filtering by node_class and datacenter

I was looking at the documentation for the aws-asg plugin (Autoscaling Plugins: AWS ASG | Nomad by HashiCorp)
and I noticed that when selecting nodes for scale-in events, the autoscaler can filter by node_class or by datacenter, but not by both.

Our cloud setup spans multiple AWS regions, with the same ASG setup terraformed in each. For example, we may have:

datacenter "us-east-1" that exists in AWS region us-east-1
  asg 1: node_class_a
  asg 2: node_class_b

datacenter "us-west-1" that exists in AWS region us-west-1
  asg 1: node_class_a
  asg 2: node_class_b

and so on.

When a policy scales in and selects nodes for the target ASG using node_class, it may select nodes from a different datacenter, fail to find the instance in the ASG, and so fail to scale in, eventually hitting the delivery_limit and failing for good. Likewise, if you filter by datacenter instead, you may end up selecting nodes that belong to a different ASG and fail for the same reason.
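To make the failure mode concrete, here is a minimal Python sketch of what I think is happening. It is only an illustration, not the autoscaler's actual code: the node IDs, instance IDs, and helper names are all made up, and the target ASG is assumed to be the node_class_a ASG in us-east-1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: str
    datacenter: str
    node_class: str
    instance_id: str


# Hypothetical inventory mirroring the layout above (all IDs are invented).
NODES = [
    Node("n1", "us-east-1", "node_class_a", "i-east-a1"),
    Node("n2", "us-east-1", "node_class_b", "i-east-b1"),
    Node("n3", "us-west-1", "node_class_a", "i-west-a1"),
    Node("n4", "us-west-1", "node_class_b", "i-west-b1"),
]

# Instances that belong to the target ASG (assumed: us-east-1, node_class_a).
TARGET_ASG_INSTANCES = {"i-east-a1"}


def select_by_class(nodes, node_class):
    """Select scale-in candidates using node_class only."""
    return [n for n in nodes if n.node_class == node_class]


def not_in_asg(candidates, asg_instances):
    """Return the IDs of candidates whose instance is not in the target ASG."""
    return [n.node_id for n in candidates if n.instance_id not in asg_instances]


candidates = select_by_class(NODES, "node_class_a")
strays = not_in_asg(candidates, TARGET_ASG_INSTANCES)
print("candidates:", [n.node_id for n in candidates])
print("not in target ASG:", strays)
```

Any candidate listed as not in the target ASG is one that a scale-in attempt could pick and then fail on, because the instance lives in a different region's ASG.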

Any insight into this?

  • Is it recommended to always include the datacenter in a given node_class to prevent these kinds of issues? That would be a shame, since it would complicate our alerting and metrics gathering, which are based on node_class labels.
  • Do you see value in allowing filtering by both datacenter and node_class rather than making them mutually exclusive in the config? That way you can be sure you are selecting the correct node for a given ASG.
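As a rough sketch of what the second option would mean in practice, the Python below groups nodes by the pair (datacenter, node_class), so that each group lines up with exactly one ASG. This is a hypothetical illustration of the idea only; the node records and the function name are assumptions, not anything taken from the autoscaler.

```python
from collections import defaultdict

# Hypothetical node records; the IDs are invented for illustration.
nodes = [
    {"id": "n1", "datacenter": "us-east-1", "node_class": "node_class_a"},
    {"id": "n2", "datacenter": "us-east-1", "node_class": "node_class_b"},
    {"id": "n3", "datacenter": "us-west-1", "node_class": "node_class_a"},
    {"id": "n4", "datacenter": "us-west-1", "node_class": "node_class_b"},
]


def group_into_pools(nodes):
    """Group node IDs by (datacenter, node_class) so each pool maps to one ASG."""
    pools = defaultdict(list)
    for node in nodes:
        key = (node["datacenter"], node["node_class"])
        pools[key].append(node["id"])
    return dict(pools)


pools = group_into_pools(nodes)
# Scale-in candidates for the us-east-1 node_class_a ASG only.
east_a = pools[("us-east-1", "node_class_a")]
print(east_a)
```

With both attributes in the key, a policy targeting the us-east-1 node_class_a ASG would never be offered a node from us-west-1 or from node_class_b.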

Hi @peter.lockhart and thanks for raising this discussion.

I don’t see any reason why we can’t allow the use of both datacenter and node_class to group Nomad nodes into scalable pools. The current design is purely because we had not considered or been aware of this use case.

Could you raise this request against the Nomad Autoscaler repository so that other engineers and the community can see it?

jrasell and the Nomad team

Thanks jrasell! I have raised this here: aws-asg: allow filtering by node_class and datacenter · Issue #531 · hashicorp/nomad-autoscaler · GitHub