ECS service destroy stuck issue

Hi All,

I am creating an EC2-type ECS cluster and deploying my container images to it using an ECS service. The deployment is successful: all the required resources are created and the container is deployed onto the ECS cluster.

But when I try to destroy the resources, the operation gets stuck at the ECS service destroy and fails after the 20-minute timeout (although the ECS service is deleted in the backend, it remains in the destroying state). When I run terraform destroy again, it deletes the remaining resources.

Please share your suggestions if anyone has faced this kind of issue.

I have also attached a screenshot of the timeout error for reference.


The service can only be deleted if you scale it down to 0 replicas. It’s not possible to remove a service while it still has one or more replicas (desired count). It’s the stupid ECS thing.
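A minimal sketch of doing that scale-down by hand with the AWS CLI, in case it helps. The function name and the AWS_BIN override are hypothetical (AWS_BIN only exists so the commands can be dry-run with echo), and the cluster and service names are placeholders:

```shell
#!/usr/bin/env bash
# Sketch: scale an ECS service to zero, wait for it to settle, then delete it.
# AWS_BIN is a hypothetical override; it defaults to the real `aws` CLI.
scale_down_and_delete() {
  local cluster="$1" service="$2"
  local aws_bin="${AWS_BIN:-aws}"

  # Set the desired count to 0 so ECS starts stopping the running tasks.
  "$aws_bin" ecs update-service --cluster "$cluster" --service "$service" --desired-count 0 || return 1

  # Block until the service reaches a steady state at the new desired count.
  "$aws_bin" ecs wait services-stable --cluster "$cluster" --services "$service" || return 1

  # With no tasks desired, the service can be deleted.
  "$aws_bin" ecs delete-service --cluster "$cluster" --service "$service"
}
```

Usage would be something like `scale_down_and_delete my-cluster my-service`, or `AWS_BIN=echo scale_down_and_delete my-cluster my-service` to print the commands without calling AWS.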


During the stuck phase I can see in the portal that the service is deleted from the ECS cluster and shows the status ‘Draining’. So it’s ECS that is not updating the actual status.

I’m experiencing the same issue. I ended up deleting terraform.tf and removing all deployed resources manually. Are there better ways to handle this kind of issue?

Yes, I came up with a solution. We need to scale the ECS service down to zero once the destroy operation is sent, using a local-exec provisioner.

# ECS service for gitlabrunner

resource "aws_ecs_service" "gitlabrunner_service" {
  name            = var.service_name
  cluster         = aws_ecs_cluster.gitlabrunner_cluster.id
  task_definition = aws_ecs_task_definition.gitlabrunner_td.arn
  desired_count   = var.desired_tasks
  #launch_type = "EC2"

  capacity_provider_strategy {
    capacity_provider = aws_ecs_capacity_provider.gitlabrunner_cp.name
    weight            = 100
  }

  #force_new_deployment = true

  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }

  # Runs at destroy time, before Terraform deletes the service.
  provisioner "local-exec" {
    when    = destroy
    command = <<-EOF
      echo "Update service desired count to 0 before destroy."
      # Get the region out of the cluster ARN (4th colon-separated field)
      REGION=$(echo ${self.cluster} | cut -d':' -f4)
      echo "Region: $REGION"
      # Set the service desired count to 0
      aws ecs update-service --region $REGION --cluster ${self.cluster} --service ${self.name} --desired-count 0 --force-new-deployment
      echo "Update service command executed successfully."
    EOF
  }

  timeouts {
    #create = "10m" # Timeout for resource creation
    delete = "5m" # Timeout for resource deletion
  }
}
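
The region lookup in the provisioner depends on the cluster value being a full ARN, where the region is the fourth colon-separated field. A quick way to check that step on its own, using a made-up ARN (the account ID and cluster name here are placeholders, not values from my setup):

```shell
#!/usr/bin/env bash
# Hypothetical cluster ARN; the account ID and cluster name are made up.
CLUSTER_ARN="arn:aws:ecs:eu-west-1:123456789012:cluster/gitlabrunner"

# ARN fields are colon-separated: arn:partition:service:region:account:resource
REGION=$(echo "$CLUSTER_ARN" | cut -d':' -f4)
echo "Region: $REGION"   # prints "Region: eu-west-1"
```

If the cluster were referenced by its short name instead of the ARN, the cut would return an empty string, so the region would have to come from somewhere else.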

The ECS service destruction process gets stuck and eventually fails after a 20-minute timeout, even though the ECS service is deleted in the backend. Strangely, upon running terraform destroy again, the remaining resources are successfully removed. I have attached a screenshot of the timeout error for reference.
