Zero downtime deployment

Given an existing Consul server, when I deploy my application service onto an Amazon EC2 instance along with a Consul client, the client finds the server by its tags using the AWS API (via an IAM role on the instance) and registers with it. Consul then configures the backend pool of a supported proxy via its data plane API.

When I want to deploy a new version of my application service, I provision new EC2 instances in a new autoscaling group (immutable infrastructure). Before I destroy the previous autoscaling group, I would typically test the new target group before adding it to the load balancer. I am aware that Consul has health checks, and so do the backend pools of the configured proxy. There are two scenarios I'm trying to understand:

  1. A bad service application (fails the Consul health checks). I assume Consul will not add the instance to the proxy's backend pool if the health check fails, and that it is up to my pipeline to handle this. I assume I can use the Consul REST API to confirm passing health checks.

  2. The new service application is healthy and I delete the old autoscaling group. Is there a race condition between the machine receiving the shutdown signal when the autoscaling group terminates the instance and the Consul client telling the Consul server that the instance should be removed? I suspect so.
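
For what it's worth, here is a rough Python sketch of how a pipeline might approach both scenarios using Consul's HTTP API. It assumes a Consul agent reachable at `http://127.0.0.1:8500` with no ACL token required; the function names and the `nodes` filter are made up for illustration. For scenario 1 it reads `GET /v1/health/checks/<service>` and only reports success if every check on the new nodes is `passing`. For scenario 2 it deregisters the service from the local agent with `PUT /v1/agent/service/deregister/<service_id>`, which could be run on the instance before it is terminated (for example from a shutdown script or an autoscaling lifecycle hook) so that removal does not depend on the agent noticing the shutdown in time.

```python
import json
import urllib.request

# Assumption: a local Consul agent on the default HTTP port, no ACL token.
CONSUL_ADDR = "http://127.0.0.1:8500"


def all_checks_passing(checks, nodes=None):
    """Return True only if at least one check matches and all matches are 'passing'.

    `checks` is the parsed JSON list from /v1/health/checks/<service>.
    `nodes` optionally restricts the result to the given node names,
    e.g. the instances of the new autoscaling group.
    """
    if nodes is not None:
        checks = [c for c in checks if c.get("Node") in nodes]
    return bool(checks) and all(c.get("Status") == "passing" for c in checks)


def service_checks(service_name, addr=CONSUL_ADDR):
    """Fetch all health checks registered for a service."""
    url = f"{addr}/v1/health/checks/{service_name}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def deregister_service(service_id, addr=CONSUL_ADDR):
    """Deregister a service instance from the local agent before shutdown."""
    url = f"{addr}/v1/agent/service/deregister/{service_id}"
    req = urllib.request.Request(url, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return resp.status == 200
```

A pipeline step could then gate the cutover on something like `all_checks_passing(service_checks("my-app"), nodes=new_nodes)`, where `"my-app"` and `new_nodes` are placeholders for the real service name and the node names of the new instances. Requiring at least one matching check avoids treating "nothing registered yet" as healthy.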


Sounds like a job for a service mesh, especially L7 traffic management and canary deployments:
