I was looking at the documentation for aws-asg (Autoscaling Plugins: AWS ASG | Nomad by HashiCorp) and I noticed that, when selecting nodes for scale-in events, the autoscaler can only select by node_class or by datacenter, not both.
Our cloud setup is such that we have multiple AWS regions, with the same ASG setup terraformed in each. For example, we may have:
- datacenter "us-east-1" that exists in AWS region us-east-1
  - asg 1: node_class_a
  - asg 2: node_class_b
- datacenter "us-west-1" that exists in AWS region us-west-1
  - asg 1: node_class_a
  - asg 2: node_class_b
- etc.
When a policy scales in and selects nodes for the target ASG using node_class, it may select nodes from a different datacenter, fail to find the instance in the ASG, and so fail to scale in, eventually hitting the delivery_limit and failing for good. Likewise, if you filter by datacenter instead, you may end up selecting nodes that belong to a different ASG and fail for the same reason.
Any insight into this?
- Is it recommended to always include the datacenter in a given node_class to prevent these kinds of issues? That would be a shame, as it makes our alerting/metrics gathering a bit more complicated as it is based off
- Do you see value in allowing filtering by both datacenter and node_class rather than making them mutually exclusive in the config? That way you can be sure you are selecting the correct node for a given ASG.
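
To make the failure mode concrete, here is a minimal Python sketch of the selection problem. It is not the autoscaler's actual code: the Node type, the select and foreign helpers, the node IDs, and the ASG names are all hypothetical and only mirror the example layout above. It filters a pool of nodes the way I understand the plugin to (by one attribute only) and compares that with a hypothetical combined filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: str
    datacenter: str
    node_class: str
    asg: str  # hypothetical label for the ASG backing this node


# Hypothetical cluster mirroring the layout described in the post.
NODES = [
    Node("n1", "us-east-1", "node_class_a", "us-east-1/asg-1"),
    Node("n2", "us-east-1", "node_class_b", "us-east-1/asg-2"),
    Node("n3", "us-west-1", "node_class_a", "us-west-1/asg-1"),
    Node("n4", "us-west-1", "node_class_b", "us-west-1/asg-2"),
]


def select(nodes, node_class=None, datacenter=None):
    """Return the nodes matching every filter that was supplied."""
    return [
        n for n in nodes
        if (node_class is None or n.node_class == node_class)
        and (datacenter is None or n.datacenter == datacenter)
    ]


def foreign(candidates, target_asg):
    """IDs of candidate nodes whose instance is NOT in the target ASG."""
    return [n.node_id for n in candidates if n.asg != target_asg]


target = "us-east-1/asg-1"

by_class = select(NODES, node_class="node_class_a")
by_dc = select(NODES, datacenter="us-east-1")
by_both = select(NODES, node_class="node_class_a", datacenter="us-east-1")

print(foreign(by_class, target))  # ['n3'] -> node from another datacenter
print(foreign(by_dc, target))     # ['n2'] -> node from another ASG
print(foreign(by_both, target))   # []     -> only nodes in the target ASG
```

With either single filter, the candidate set contains a node whose instance is not in the target ASG, which is the case where the scale-in fails. Only the combined filter narrows the candidates to the target ASG in this layout.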