Current best practices for zero downtime ASG AMI updates?

I’ve been looking into what it would take to move over from ECS Fargate to operating our own cluster. The main reason is access to BPF tracing tools, which would come in handy on some of our services. We’ve also seen significantly more downtime from Fargate than from pure EC2+ASG.

There are multiple tutorials about this that are about 2 years old and they seem to employ some fairly complicated ideas like using aws_cloudformation_stack. My hope is to be able to make a configuration that would basically just involve changing the AMI ID in the terraform launch template resource, hit terraform apply and possibly run some AWS System Manager playbooks before or after to prepare or rollback changes.

Is there any approach that gets me even close to this? The one problem I see is that if I run terraform apply and it changes the launch template, then the launch template resource has been changed successfully and will not be rolled back even if I have some Lambda listening on ASG events that triggers a failure on the ASG resource update if the tests on the new nodes don’t pass.

How have people approached this issue?