I’m just deploying the Python ECS example, but I’m seeing a brief ~20 second outage before the new build comes up, during which 503 Service Temporarily Unavailable is returned.
Is this expected?
Yes, this is from the AWS side and the time it takes to warm up resources, such as load balancers, that Waypoint creates.
The load balancers are already created - it’s just a case of deploying a new image to ECS, no?
If I do something like this in Pulumi I don’t get any downtime. The same is true if I manually update the task definition using the AWS CLI.
I’m also seeing this which is strange - should there still be 2 concurrent services defined?
Hi @dabeeeenster!
We should check this out, I wonder if perhaps the ALB is sending traffic to your new instance before it’s ready to receive traffic. We might be missing a health check that would prevent that from happening, but the other aspect is that we change the weights on the listener to send traffic to the target group for the new version. That might be forcing the ALB to switch all traffic over before it’s ready as well.
By the by, that’s the reason you’re seeing 2 concurrent services: we boot up a separate service for each deployment. This gives you the ability to roll back to the previous one easily.
I presume on Pulumi, you were updating a singular service with the new image?
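To make the idea above concrete, here is a minimal sketch of gating the listener weight change on target health. It is not Waypoint’s actual implementation. It assumes `elbv2` is a boto3 ELBv2 client (`boto3.client("elbv2")`); the function names, ARNs, timeout, and polling interval are all hypothetical.

```python
import time


def all_targets_healthy(elbv2, target_group_arn):
    """Return True when the target group has targets and all report 'healthy'."""
    resp = elbv2.describe_target_health(TargetGroupArn=target_group_arn)
    states = [d["TargetHealth"]["State"] for d in resp["TargetHealthDescriptions"]]
    return bool(states) and all(s == "healthy" for s in states)


def shift_when_healthy(elbv2, listener_arn, old_tg_arn, new_tg_arn,
                       timeout=300, interval=5, sleep=time.sleep):
    """Poll the new target group; move all listener weight to it once healthy.

    Returns False (and leaves the listener untouched) if the targets are
    still not healthy after roughly `timeout` seconds of waiting.
    """
    waited = 0
    while not all_targets_healthy(elbv2, new_tg_arn):
        if waited >= timeout:
            return False  # keep traffic on the old target group
        sleep(interval)
        waited += interval
    # Assumed listener shape: a single weighted forward action.
    elbv2.modify_listener(
        ListenerArn=listener_arn,
        DefaultActions=[{
            "Type": "forward",
            "ForwardConfig": {"TargetGroups": [
                {"TargetGroupArn": old_tg_arn, "Weight": 0},
                {"TargetGroupArn": new_tg_arn, "Weight": 100},
            ]},
        }],
    )
    return True
```

This only helps if the target group has a health check that reflects whether the app can really serve requests; otherwise the targets can report healthy before the container is ready.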
I don’t think there are any health checks defined in that sample project, so maybe that’s it?
On Pulumi I believe (but I’m not 100% sure) it’s just sending a new task definition to ECS and then letting ECS handle the traffic migration once the new container is running and healthy.
Should I create an issue on GitHub?
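For comparison, the in-place approach described above (one service, new task definition, ECS handles the rollover) could look roughly like this with boto3. This is a sketch under assumptions, not what Pulumi does internally: `ecs` is assumed to be `boto3.client("ecs")`, and the function name and all argument values are hypothetical.

```python
def rolling_update(ecs, cluster, service, task_definition):
    """Point an existing ECS service at a new task definition and wait for it to settle.

    ECS starts tasks from the new definition and drains the old ones itself,
    so no second service or listener weight change is involved.
    """
    ecs.update_service(
        cluster=cluster,
        service=service,
        taskDefinition=task_definition,
    )
    # Blocks until the service reports a steady state (or the waiter gives up).
    ecs.get_waiter("services_stable").wait(cluster=cluster, services=[service])
```

Whether this avoids downtime still depends on the service’s deployment settings (minimum healthy percent, maximum percent) and on a working health check.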
Hi, I am experiencing the same issues as reported in the main thread. Is there any flag that must be specified to avoid service disruption?
Here’s the event log excerpt when the service is being deployed:
...
2021-06-29 10:50:07 +0100 | service REDACTED (port 5001) is unhealthy in target-group REDACTED due to (reason Health checks failed).
2021-06-29 10:51:35 +0100 | service REDACTED has reached a steady state.
...