dependency deadlock issue between aws_vpc_endpoint (of type Interface) and subnets

Hi All,

I’m facing a dependency deadlock issue between aws_vpc_endpoint (of type Interface) and subnets in the same Terraform state.

In my configuration, the VPC endpoint references the subnet IDs, which creates a Terraform dependency from the endpoint to the subnets.
However, when AWS creates the interface endpoint, it automatically provisions ENIs inside those subnets, which introduces an implicit AWS-level dependency from the subnets back to the endpoint.
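
For illustration, the setup described above might look roughly like the following. This is only a sketch: the resource names (`example`), the CIDR ranges, and the region in the service name are all assumptions, not taken from the actual configuration.

```hcl
# Hypothetical VPC; the CIDR range is an assumption for illustration.
resource "aws_vpc" "example" {
  cidr_block = "10.0.0.0/16"
}

# Hypothetical subnet that will host the endpoint's ENIs.
resource "aws_subnet" "example" {
  vpc_id     = aws_vpc.example.id
  cidr_block = "10.0.1.0/24"
}

# Interface endpoint. Referencing aws_subnet.example.id in subnet_ids is what
# makes Terraform record a dependency from the endpoint to the subnet.
resource "aws_vpc_endpoint" "example" {
  vpc_id            = aws_vpc.example.id
  service_name      = "com.amazonaws.us-east-1.ec2" # region is an assumption
  vpc_endpoint_type = "Interface"
  subnet_ids        = [aws_subnet.example.id]
}
```

With this shape, Terraform knows only about the endpoint-to-subnet reference. The ENIs that AWS places in the subnet are not visible in the configuration.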

As a result, during updates or deletions (for example, when I try to delete or replace a subnet), Terraform attempts to delete the subnet first, but AWS blocks the deletion because the ENIs from the VPC endpoint are still attached.
This causes a circular dependency:

Terraform won’t delete the VPC endpoint first, because it depends on the subnet.

AWS won’t delete the subnet first, because it depends on the VPC endpoint’s ENIs.

Is there a recommended way to handle or break this deadlock between Terraform’s dependency graph and AWS’s internal resource dependencies for interface VPC endpoints?

Hi @aakash-acquia,

Can you express the configuration you want directly in Terraform’s HCL without the CDK?
Since the CDK can’t do anything that Terraform itself cannot do, the problem might be easier to see in plain Terraform terms.

If Terraform thinks that the VPC endpoint depends on the subnets, then it should try to destroy the VPC endpoint first: Terraform destroys objects in reverse dependency order.

Therefore I would expect both Terraform and EC2 to agree on the dependency ordering here, but I have a different theory about what’s happening:

I don’t have experience with VPC endpoints in particular, but I have used other services that implicitly create ENIs inside a subnet. A common problem with all of them is that, after the object is deleted, there is often a delay before its implicit ENIs are actually deleted in EC2. The deletion of the subnet then gets stuck waiting for the ENIs to be gone, which can take several minutes.

When I encountered that problem, I found that if I waited 5 to 10 minutes after the first terraform destroy failed, a second attempt would succeed because the ENIs had been freed by then. Of course, that’s not a practical long-term solution, but I suggest it in case it helps you determine whether delayed ENI cleanup is the real problem you are facing.

If that is the problem then I’m afraid I don’t have a great suggestion for how to solve it. I was able to make things work by increasing the destroy timeout on aws_subnet to be long enough for the ENIs to actually be deleted, but that solves it only by making terraform destroy keep retrying long enough for the subnet deletion to succeed. :confounded_face:
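
A minimal sketch of that timeout workaround, assuming a hypothetical subnet named `example` and an input variable `vpc_id`. The `45m` value is an arbitrary illustration rather than a recommendation, so pick something longer than the ENI cleanup delay you actually observe.

```hcl
# Hypothetical input; in a real configuration this would likely be a
# reference to an aws_vpc resource instead.
variable "vpc_id" {
  type = string
}

resource "aws_subnet" "example" {
  vpc_id     = var.vpc_id
  cidr_block = "10.0.1.0/24" # assumed CIDR for illustration

  # Give the subnet deletion more time, so it can keep being retried
  # until the lingering ENIs have been cleaned up.
  timeouts {
    delete = "45m" # assumed value; tune to the delay you actually see
  }
}
```

This does not remove the delay. It only lets a single terraform destroy run wait it out instead of failing and needing a second attempt.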