I was looking at the documentation for aws-asg (Autoscaling Plugins: AWS ASG | Nomad by HashiCorp) and I noticed that when selecting nodes for scale-in events, the autoscaler can only select by node_class or by datacenter, and not both.
Our cloud setup is such that we have multiple AWS regions, with the same ASG setup terraformed in each. For example, we may have:
datacenter "us-east-1" that exists in AWS region us-east-1
  asg 1: node_class_a
  asg 2: node_class_b
datacenter "us-west-1" that exists in AWS region us-west-1
  asg 1: node_class_a
  asg 2: node_class_b
etc.
When a policy scales in and selects nodes for the target ASG using node_class, it may select nodes from a different datacenter. The autoscaler then fails to find those instances in the ASG, so the scale-in fails, eventually hitting the delivery_limit and failing for good. Likewise, if you filter by datacenter instead, you may end up selecting nodes that belong to a different ASG, and the scale-in fails for the same reason.
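To make the mismatch concrete, here is a small Python sketch of the selection problem. It is only a model of the situation described above, not the autoscaler's actual code: the Node type, the select and outside helpers, the node IDs and the ASG labels are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: str      # hypothetical node ID
    datacenter: str
    node_class: str
    asg: str          # hypothetical label for the ASG that owns the instance


# Same ASG layout in each datacenter, as in the setup described above.
NODES = [
    Node("n1", "us-east-1", "node_class_a", "us-east-1/asg-1"),
    Node("n2", "us-east-1", "node_class_b", "us-east-1/asg-2"),
    Node("n3", "us-west-1", "node_class_a", "us-west-1/asg-1"),
    Node("n4", "us-west-1", "node_class_b", "us-west-1/asg-2"),
]


def select(nodes, datacenter=None, node_class=None):
    """Return nodes matching every filter that was supplied."""
    return [
        n for n in nodes
        if (datacenter is None or n.datacenter == datacenter)
        and (node_class is None or n.node_class == node_class)
    ]


def outside(candidates, asg):
    """IDs of candidate nodes that do NOT belong to the target ASG."""
    return [n.node_id for n in candidates if n.asg != asg]


target_asg = "us-east-1/asg-1"

by_class = select(NODES, node_class="node_class_a")
by_dc = select(NODES, datacenter="us-east-1")
by_both = select(NODES, datacenter="us-east-1", node_class="node_class_a")

print(outside(by_class, target_asg))  # ['n3'] -> node from the other datacenter
print(outside(by_dc, target_asg))     # ['n2'] -> node from the other ASG
print(outside(by_both, target_asg))   # []     -> only nodes in the target ASG
```

With a single filter, the candidate set always contains at least one node that is not in the target ASG; only the combination of both keys narrows it to the right pool.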
Any insight into this?
- Is it recommended to always include the datacenter in a given node_class to prevent these kinds of issues? That would be a shame, as it would make our alerting/metrics gathering a bit more complicated, since it is based on node_class labels.
- Do you see value in allowing filtering by both datacenter and node_class, rather than making them mutually exclusive in the config? That way you can be sure you are selecting the correct node for a given ASG.