Terraform race condition "couldn't find resource"

While creating ECS task definitions, I run into the error below. I tried adding a depends_on as well as a null_resource with a 10-second delay, but I still cannot get past it.

The ECS task is created successfully, but the Terraform code is executed via a GitHub Actions workflow, so every time the error occurs we get exit code 1 and the workflow terminates.

Additionally, setting TF_LOG to DEBUG does not give any additional information besides the error below.

Error: reading ECS Task Definition (arn:aws:ecs:***:************:task-definition/ecs-task:17): couldn't find resource
# Introducing a small local-exec sleep for ECS Task to Register
resource "null_resource" "ecs_propagation_delay" {
  provisioner "local-exec" {
    command = "sleep 10"
  }
  triggers = {
    always_run = timestamp()
  }
}

resource "aws_ecs_task_definition" "ecs_task" {
  family                   = "${var.resource_prefix}-task"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 256
  memory                   = 512

  container_definitions = jsonencode([
    {
      .....
  ])
}

resource "aws_ecs_service" "ecs_service" {
  name            = "${var.resource_prefix}-service"
  cluster         = aws_ecs_cluster.cluster.id
  task_definition = aws_ecs_task_definition.task.arn
  desired_count   = 1
  launch_type     = "FARGATE"

  network_configuration {
    subnets = var.subnet_ids

    assign_public_ip = true
    security_groups  = [aws_security_group.sg.id]
  }

  depends_on = [
    null_resource.ecs_propagation_delay,
    aws_ecs_task_definition.ecs_task
  ]

}

Hi @sagar.joshi,

I see 2 possible problems here, but it’s not clear what is really causing an issue because this is not a complete configuration, and we don’t have the actual diagnostic output.

If the aws_ecs_service.ecs_service is where the failure is happening, its task_definition is coming from aws_ecs_task_definition.task.arn, but you are adding a manual dependency for aws_ecs_task_definition.ecs_task. Now both of these must be evaluated before the aws_ecs_service.ecs_service, but it would help to know which one is the problematic dependency.

You are also trying to use null_resource.ecs_propagation_delay to create an artificial delay between the resources. However, null_resource.ecs_propagation_delay doesn’t reference any other resource, so it will be executed as early as possible and is probably not adding any delay between the resources you are concerned with.

A resource apply operation should not return until the resource is available for use, but providers or the remote services may fail to fulfill that guarantee. If you need to add a delay for this reason, there is already a time_sleep resource, which might be easier to manage and more full-featured than a null_resource with a provisioner.
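To illustrate how a delay can actually sit between two resources, here is a minimal sketch using time_sleep from the hashicorp/time provider. It passes the task definition ARN through the time_sleep triggers map, so the service only receives the ARN after the sleep has finished. The variables cluster_id and subnet_ids, the names example-task and example-service, and the nginx container are placeholders I made up for illustration, not part of the original configuration.

```hcl
terraform {
  required_providers {
    aws  = { source = "hashicorp/aws" }
    time = { source = "hashicorp/time" }
  }
}

# Hypothetical inputs, not part of the original post.
variable "cluster_id" { type = string }
variable "subnet_ids" { type = list(string) }

resource "aws_ecs_task_definition" "ecs_task" {
  family                   = "example-task" # placeholder name
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 256
  memory                   = 512

  # Placeholder container; the real definition is elided in the original post.
  container_definitions = jsonencode([
    { name = "app", image = "nginx:latest", essential = true }
  ])
}

# The reference inside triggers makes the sleep start only after the
# task definition has been created, and replaces the sleep (so it waits
# again) whenever the ARN changes.
resource "time_sleep" "ecs_delay" {
  create_duration = "10s"

  triggers = {
    task_definition_arn = aws_ecs_task_definition.ecs_task.arn
  }
}

resource "aws_ecs_service" "ecs_service" {
  name            = "example-service" # placeholder name
  cluster         = var.cluster_id
  desired_count   = 1
  launch_type     = "FARGATE"

  # Reading the ARN back out of the sleep resource orders the service
  # after the delay without any depends_on.
  task_definition = time_sleep.ecs_delay.triggers["task_definition_arn"]

  network_configuration {
    subnets = var.subnet_ids
  }
}
```

Note that this only delays resources downstream of the sleep. It cannot help if the error is raised while the task definition itself is being created or read.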

Hi @jbardin, thank you for your response. I tried your suggestions, but I am still seeing the same error.

##[debug]│ Error: reading ECS Task Definition (arn:aws:ecs:***:************:task-definition/ecs-task:24): couldn't find resource
##[debug]│ 
##[debug]│   with aws_ecs_task_definition.ecs_task,
##[debug]│   on main.tf line 155, in resource "aws_ecs_task_definition" "ecs_task":
##[debug]│  155: resource "aws_ecs_task_definition" "ecs_task" {

Based on the error I see in the logs, the failure is happening at the aws_ecs_task_definition level. The dependency on the task definition is declared at the aws_ecs_service level:

resource "aws_ecs_task_definition" "ecs_task" { ... }

resource "aws_ecs_service" "ecs_service" {
  task_definition = aws_ecs_task_definition.ecs_task.arn
  depends_on      = [aws_ecs_task_definition.ecs_task]
}

Then, as suggested, I switched from null_resource to time_sleep, but it didn’t work.

resource "time_sleep" "ecs_delay" {
  depends_on      = [aws_ecs_task_definition.ecs_task]
  create_duration = "10s"
}

resource "aws_ecs_service" "ecs_service" {
  depends_on = [aws_ecs_task_definition.ecs_task, time_sleep.ecs_delay]
}

I even tried the data source approach to force a reload of the latest task definition:

data "aws_ecs_task_definition" "current_task" {
  task_definition = aws_ecs_task_definition.ecs_task.family
}

resource "aws_ecs_service" "ecs_service" {
  task_definition = data.aws_ecs_task_definition.current_task.arn
  depends_on      = [aws_ecs_task_definition.ecs_task]
}

However, I am still seeing the error. The task definitions are confirmed to exist in AWS, and the ARN matches. Please advise further.

I’m not familiar with the AWS resources involved here, so I don’t know exactly what should be expected, and can only really comment on the Terraform config itself.

The error message is coming from aws_ecs_task_definition.ecs_task,

│   with aws_ecs_task_definition.ecs_task,
│   on main.tf line 155, in resource "aws_ecs_task_definition" "ecs_task":

but you’re not showing the config for that. I don’t think delaying aws_ecs_service.ecs_service can affect the error you’ve shown here, because the error has already happened before the delay begins.

A minor note for your understanding, which may or may not make a difference elsewhere in the config: using depends_on like this is redundant.

  task_definition = aws_ecs_task_definition.ecs_task.arn
  depends_on      = [aws_ecs_task_definition.ecs_task]

There is already a reference to aws_ecs_task_definition.ecs_task, so it already depends on that resource and adding it to depends_on doesn’t change anything.
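As a rough sketch of that point, the configuration below relies only on the reference in task_definition to order the two resources. The variables cluster_id and subnet_ids, the resource names, and the nginx container are placeholders I am assuming for illustration, since the full config was not posted.

```hcl
# Hypothetical inputs, not part of the original post.
variable "cluster_id" { type = string }
variable "subnet_ids" { type = list(string) }

resource "aws_ecs_task_definition" "ecs_task" {
  family                   = "example-task" # placeholder name
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 256
  memory                   = 512

  # Placeholder container; the real definition is elided in the original post.
  container_definitions = jsonencode([
    { name = "app", image = "nginx:latest", essential = true }
  ])
}

resource "aws_ecs_service" "ecs_service" {
  name          = "example-service" # placeholder name
  cluster       = var.cluster_id
  desired_count = 1
  launch_type   = "FARGATE"

  # This expression refers to aws_ecs_task_definition.ecs_task, so
  # Terraform already creates the task definition first. No depends_on
  # entry for it is needed.
  task_definition = aws_ecs_task_definition.ecs_task.arn

  network_configuration {
    subnets = var.subnet_ids
  }
}
```

depends_on is meant for dependencies that Terraform cannot see through a reference, so it adds nothing when an attribute of the other resource is already used in the block.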