ElasticSearch domain ValidationException on Cloudwatch Logs Resource Policy

I opened this as an issue because I think it's a bug, but I'm running into something odd when trying to create an Elasticsearch domain, specifically with the CloudWatch Logs resource policy.

I have configured the resource policy to allow Elasticsearch to write only to the log groups I created for that domain, rather than to every log group in CloudWatch. However, the apply fails on the first run and succeeds when I apply again. It looks like some sort of race condition, but I cannot figure out what is going on. The GitHub issue is #14497 if you would like to view the full details, but here is a quick summary:

The cloudwatch logs resource policy definition looks like this:

data "aws_iam_policy_document" "cloudwatch" {
  statement {
    actions = [
      "logs:PutLogEvents",
      "logs:PutLogEventsBatch",
      "logs:CreateLogStream",
    ]
    effect = "Allow"
    principals {
      type        = "Service"
      identifiers = ["es.amazonaws.com"]
    }
    resources = [
      # for k, v in aws_cloudwatch_log_group.es_logs : "${v.arn}:*" This fails
      for k, v in var.log_publishing_options : "arn:aws:logs:us-east-1:${data.aws_caller_identity.current.account_id}:log-group:/aws/aes/${var.domain_name}/${k}:*" # This fails too
      # "arn:aws:logs:us-east-1:${data.aws_caller_identity.current.account_id}:log-group:*" # This works
    ]
  }
}

The log publishing options variable is:

log_publishing_options = {
  index = {
    enabled           = true
    log_type          = "INDEX_SLOW_LOGS"
    retention_in_days = 7
  },
  search = {
    enabled           = true
    log_type          = "SEARCH_SLOW_LOGS"
    retention_in_days = 14
  },
  application = {
    enabled           = true
    log_type          = "ES_APPLICATION_LOGS"
    retention_in_days = 14
  }
}

And the log group config is:

resource "aws_cloudwatch_log_group" "es_logs" {
  for_each          = { for k, v in var.log_publishing_options : k => v if lookup(v, "enabled", false) == true }
  name              = "/aws/aes/${var.domain_name}/${each.key}"
  retention_in_days = lookup(each.value, "retention_in_days", 14)

  tags = merge(
    var.tags,
    {
      Name    = "/aws/aes/${var.domain_name}/${each.key}"
      service = var.service,
      team    = var.team,
      phi     = var.phi
    },
  )
}

For some reason, when you run it the first time Terraform complains with:

Error: Error creating ElasticSearch domain: ValidationException: The Resource Access Policy specified for the CloudWatch Logs log group /aws/aes/example-domain/search does not grant sufficient permissions for Amazon Elasticsearch Service to create a log stream. Please check the Resource Access Policy.

However, running it a second time without changing any code does not produce the error.

I have also found that a resource policy with broader permissions avoids the error as well; that is the commented-out line marked "This works" above.

If anyone has figured this out I would be eternally grateful.


I am having the same issue: it fails the first time, and when you run it a second time it passes.

Same here. In the terraform apply logs it looks like the Elasticsearch domain is updated in parallel with the CloudWatch Logs resource policy. The domain depends on the log group but not explicitly on the log resource policy, so the policy may not exist yet when the domain is updated. Could it help to add

depends_on = [
  aws_cloudwatch_log_resource_policy.{your policy resource name here}
]
?
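
To make the idea concrete, here is a rough sketch of where that depends_on might go. Only the data source and the log group resource come from the original post. The resource labels "es_logs" (for the policy) and "this" (for the domain), the policy_name value, and the dynamic log_publishing_options block are my assumptions, and the other domain settings are left out. I have not verified that this removes the error on the first apply.

```hcl
# Assumed resource: attaches the policy document from the original post
# as a CloudWatch Logs resource policy. The label and policy_name are hypothetical.
resource "aws_cloudwatch_log_resource_policy" "es_logs" {
  policy_name     = "${var.domain_name}-es-logs"
  policy_document = data.aws_iam_policy_document.cloudwatch.json
}

# Assumed domain definition; version, cluster, and storage settings are omitted.
resource "aws_elasticsearch_domain" "this" {
  domain_name = var.domain_name

  # One log_publishing_options block per log group created above.
  dynamic "log_publishing_options" {
    for_each = aws_cloudwatch_log_group.es_logs
    content {
      cloudwatch_log_group_arn = log_publishing_options.value.arn
      log_type                 = var.log_publishing_options[log_publishing_options.key].log_type
    }
  }

  # The domain references only the log groups, so Terraform has no implicit
  # ordering against the resource policy. This makes the ordering explicit.
  depends_on = [aws_cloudwatch_log_resource_policy.es_logs]
}
```

With the explicit dependency, Terraform should finish creating the resource policy before it starts creating or updating the domain. Whether the policy has fully propagated on the AWS side by then is a separate question.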

Even worse, the ValidationException happened after the state had been changed (using app.terraform.io), so Terraform thinks the log_publishing_options have been applied, but they have not (the AWS console shows no logs set up for the ES domain).

Has anyone else experienced the same?