We’ve been trying to create an Automation task using the AWS-PatchAsgInstance document. When the document is used in a maintenance window, the guidance is not to populate InstanceId, even though it’s a required parameter, and the subsequent task invocation fails because the task isn’t populated properly.
When we pass the following (which is the right value to set for this task in a maintenance window):
parameter {
  name   = "InstanceId"
  values = [""]
}
Terraform actually sends
"InstanceId":[]
to the AWS API, which leaves the SSM task malformed. Is there any way of forcing Terraform to send a list containing a single empty string to the API (which is what the AWS console does)?
This works for me:
# Super weird and not well documented.
# https://docs.aws.amazon.com/systems-manager/latest/userguide/mw-cli-register-tasks-parameters.html
# https://aws.amazon.com/premiumsupport/knowledge-center/ssm-ec2-stop-start-maintenance-window/
# https://discuss.hashicorp.com/t/aws-ssm-maintenance-window-task-parameter-as-list-of-empty-string-sends-empty-list/38484
parameter {
  name   = "InstanceId"
  values = ["{{RESOURCE_ID}}"]
}
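For anyone landing here later, this is roughly how that parameter block sits inside the full resource. It is only a sketch: the resource name, the variables and the targeting by the `aws:autoscaling:groupName` tag are assumptions on my part, not something taken from the posts above, and AWS-PatchAsgInstance may need further parameters (an assume role, for example) that are left out here.

```hcl
# Sketch only: the names and variables below are hypothetical placeholders.
variable "window_id" { type = string }        # ID of an existing maintenance window
variable "service_role_arn" { type = string } # role the window uses to run the task
variable "asg_name" { type = string }         # name of the ASG to patch

resource "aws_ssm_maintenance_window_task" "patch_asg" {
  window_id        = var.window_id
  task_type        = "AUTOMATION"
  task_arn         = "AWS-PatchAsgInstance"
  service_role_arn = var.service_role_arn
  priority         = 1
  max_concurrency  = "1"
  max_errors       = "1"

  # Assumption: select instances via the tag that Auto Scaling adds to them.
  targets {
    key    = "tag:aws:autoscaling:groupName"
    values = [var.asg_name]
  }

  task_invocation_parameters {
    automation_parameters {
      document_version = "$DEFAULT"

      # The pseudo-parameter is resolved per target when the window runs.
      parameter {
        name   = "InstanceId"
        values = ["{{RESOURCE_ID}}"]
      }
    }
  }
}
```

The point of the pseudo-parameter is that the maintenance window substitutes the ID of each targeted instance at run time, so nothing instance-specific has to live in the Terraform config.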
Also, just to add: I’m currently going through the same process of trying to get AWS-PatchAsgInstance to work. Sadly, there’s zero documentation on it.
Currently I’m struggling with a kind of race condition.
- ASG min 1, desired 2, max 3
- Task with concurrency of 50
What happens now is that all instances in the ASG start the task at the same time, and some of them try to enter standby simultaneously. This causes conflicts, and in the end most executions fail with errors like this:
An error occurred (ValidationError) when calling the EnterStandby operation: AutoScalingGroup eks-apps-48c04882-d046-2218-f24d-920bb62ccc26 has min-size=1, max-size=3, and desired-size=1. To place into standby 1 instance, please update the AutoScalingGroup sizes appropriately.
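I guess the task’s concurrency needs to be adjusted to the size of the ASG, i.e. to the number of instances it can put into standby at once without dropping below its minimum size.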
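Going by the error message, a rough way to express that for the ASG above would be the following. This is an untested sketch, and it assumes that each instance entering standby lowers the desired capacity by one.

```hcl
# Fragment of an aws_ssm_maintenance_window_task resource (other arguments omitted).
# Assumption: entering standby decrements the ASG's desired capacity, so with
# min 1 / desired 2 only one instance can be in standby at any moment.
max_concurrency = "1" # desired (2) - min (1) = 1 instance at a time
max_errors      = "0" # stop scheduling further targets after the first failure
```

This would not help when the desired capacity already equals the minimum size, since then not even one instance can enter standby.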
Yeah, it turns out there are pseudo-parameters such as {{RESOURCE_ID}} that need to be put in, which was what was tripping me up. I ended up raising a support ticket and suggested they fix the docs. The original problem isn’t necessarily the real one, though it is behaviour that differs from Amazon’s own implementation of creating the task.
We also ran into the minimum ASG instances problem anywhere there’s a scaling policy etc., and decided that it’s so much worse than I’d first imagined: you need another task to raise the number of instances, then you need a guarantee that they all get patched, and meanwhile you’ve got an extra instance hanging about for some length of time.
We basically decided to give up on this SSM approach and patch by rebuilding the AMI, updating the launch config and refreshing the instances instead.
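In Terraform terms, that approach could look something like the sketch below. All names, the instance type and the variables are hypothetical placeholders rather than our actual config, and the sizes are just the ones mentioned earlier in the thread. As far as I understand the provider, a change to the launch configuration is one of the things that starts an instance refresh when the `instance_refresh` block is present.

```hcl
# Sketch only: names, instance type and variables are hypothetical placeholders.
variable "patched_ami_id" { type = string } # ID of the freshly rebuilt AMI
variable "subnet_ids" { type = list(string) }

resource "aws_launch_configuration" "app" {
  name_prefix   = "app-"
  image_id      = var.patched_ami_id
  instance_type = "t3.medium"

  # Launch configurations are immutable, so create the new one before
  # the old one is destroyed.
  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "app" {
  name_prefix          = "app-"
  launch_configuration = aws_launch_configuration.app.name
  vpc_zone_identifier  = var.subnet_ids
  min_size             = 1
  desired_capacity     = 2
  max_size             = 3

  # Replace instances in a rolling fashion when the launch configuration changes.
  instance_refresh {
    strategy = "Rolling"

    preferences {
      min_healthy_percentage = 50
    }
  }
}
```

With this, patching becomes a matter of building a new AMI and passing its ID in, rather than patching running instances in place.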