Terraform breaking IAM role by removing and then re-adding policies

I’ve recently found, when working with policy attachments, that I have to run terraform apply twice whenever a role has multiple attachments.

For example, with a Lambda function in account A:

  • Run a terraform apply - Terraform removes the policy attachments.
  • Still in account A, run a terraform apply again - Terraform adds back
    the policies it just removed.

Sample output:

Terraform detected the following changes made outside of Terraform since the last "terraform apply":

  # aws_iam_role.Lambda_Backup_Role has changed
  ~ resource "aws_iam_role" "Lambda_Backup_Role" {
        id                    = "Lambda_Backup_Role20231227181807446200000001"
      ~ managed_policy_arns   = [
          + "arn:aws:iam::xxxxxxxxxxxx:policy/Allow_Lambda_Backup_from_sns",
          + "arn:aws:iam::xxxxxxxxxxxx:policy/shared_services_cross_account_sqs_access",
          + "arn:aws:iam::xxxxxxxxxxxx:policy/ssm_account_parameter_store_policy",
            # (1 unchanged element hidden)
        ]
        name                  = "Lambda_Backup_Role20231227181807446200000001"
        tags                  = {}
        # (9 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }

  # aws_iam_role_policy_attachment.shared_services_sqs_queue_access has been deleted
  - resource "aws_iam_role_policy_attachment" "shared_services_sqs_queue_access" {
      - id         = "Lambda_Backup_Role20231227181807446200000001-20231227184719256300000001" -> null
      - policy_arn = "arn:aws:iam::xxxxxxxxxxxx:policy/shared_services_sqs_queue_access" -> null
      - role       = "Lambda_Backup_Role20231227181807446200000001" -> null
    }
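
For context, the configuration shape behind that output is roughly the following (a simplified reconstruction for illustration only - the trust policy here is a placeholder, not my actual code):

resource "aws_iam_role" "Lambda_Backup_Role" {
  name_prefix = "Lambda_Backup_Role"

  # Placeholder trust policy - not shown in the plan output above.
  assume_role_policy = jsonencode({
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "lambda.amazonaws.com" }
    }]
    Version = "2012-10-17"
  })

  # Policies listed directly on the role (plus one more hidden in the output)...
  managed_policy_arns = [
    "arn:aws:iam::xxxxxxxxxxxx:policy/Allow_Lambda_Backup_from_sns",
    "arn:aws:iam::xxxxxxxxxxxx:policy/shared_services_cross_account_sqs_access",
    "arn:aws:iam::xxxxxxxxxxxx:policy/ssm_account_parameter_store_policy",
  ]
}

# ...plus a standalone attachment targeting the same role.
resource "aws_iam_role_policy_attachment" "shared_services_sqs_queue_access" {
  role       = aws_iam_role.Lambda_Backup_Role.name
  policy_arn = "arn:aws:iam::xxxxxxxxxxxx:policy/shared_services_sqs_queue_access"
}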

I am running into what I believe is the same issue. I made a really simple proof-of-concept to demonstrate this as well:

main.tf:

resource "aws_iam_role" "iam_role_test" {
  name = "jdc-tf-test-iam-role"

  assume_role_policy = jsonencode({
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "eks-fargate-pods.amazonaws.com"
      }
    }]
    Version = "2012-10-17"
  })

  managed_policy_arns = ["arn:aws:iam::aws:policy/AmazonEKSFargatePodExecutionRolePolicy"]
}

resource "aws_iam_policy" "logging_iam_policy" {
  name        = "jdc-tf-test-iam-policy"

  policy = jsonencode({
    Statement = [{
      Action = ["logs:PutLogEvents"]
      Effect   = "Allow"
      Resource = "*"
    }]
    Version = "2012-10-17"
  })
}

resource "aws_iam_role_policy_attachment" "permissions_policy_role_attachment" {
  role       = aws_iam_role.iam_role_test.name
  policy_arn = aws_iam_policy.logging_iam_policy.arn

  depends_on = [
    aws_iam_policy.logging_iam_policy
  ]
}

Deploy this with terraform apply - the first time it will successfully deploy the IAM role, IAM policy, and the role policy attachment:

aws_iam_policy.logging_iam_policy: Creating...
aws_iam_role.iam_role_test: Creating...
aws_iam_policy.logging_iam_policy: Creation complete after 1s [id=arn:aws:iam::XXXXXXXXXXXX:policy/jdc-tf-test-iam-policy]
aws_iam_role.iam_role_test: Creation complete after 1s [id=jdc-tf-test-iam-role]
aws_iam_role_policy_attachment.permissions_policy_role_attachment: Creating...
aws_iam_role_policy_attachment.permissions_policy_role_attachment: Creation complete after 0s [id=jdc-tf-test-iam-role-XXXXXXXXXXXXXXXXXXXXXXXXXX]

Apply complete! Resources: 3 added, 0 changed, 0 destroyed.

Now do nothing else, just run terraform apply again. You should get results like this:

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # aws_iam_role.iam_role_test will be updated in-place
  ~ resource "aws_iam_role" "iam_role_test" {
        id                    = "jdc-tf-test-iam-role"
      ~ managed_policy_arns   = [
          - "arn:aws:iam::XXXXXXXXXXXX:policy/jdc-tf-test-iam-policy",
            # (1 unchanged element hidden)
        ]
        name                  = "jdc-tf-test-iam-role"
        tags                  = {}
        # (8 unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

aws_iam_role.iam_role_test: Modifying... [id=jdc-tf-test-iam-role]
aws_iam_role.iam_role_test: Modifications complete after 1s [id=jdc-tf-test-iam-role]

Apply complete! Resources: 0 added, 1 changed, 0 destroyed.

If you go check the IAM role, you will see that “jdc-tf-test-iam-policy” has indeed been removed from the role.

Run terraform apply again - and it’ll add it back for you!

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # aws_iam_role_policy_attachment.permissions_policy_role_attachment will be created
  + resource "aws_iam_role_policy_attachment" "permissions_policy_role_attachment" {
      + id         = (known after apply)
      + policy_arn = "arn:aws:iam::XXXXXXXXXXXX:policy/jdc-tf-test-iam-policy"
      + role       = "jdc-tf-test-iam-role"
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

aws_iam_role_policy_attachment.permissions_policy_role_attachment: Creating...
aws_iam_role_policy_attachment.permissions_policy_role_attachment: Creation complete after 1s [id=jdc-tf-test-iam-role-XXXXXXXXXXXXXXXXXXXXXXXXXX]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

This will go on forever: every apply removes the policy attachment, and the next apply adds it back again.

I was able to find a way to get my code working, but it involved no longer using the aws_iam_role_policy_attachment resource in my Terraform. This code performs functionally the same deployment.

Updated main.tf:

resource "aws_iam_policy" "logging_iam_policy" {
  name        = "jdc-tf-test-iam-policy"

  policy = jsonencode({
    Statement = [{
      Action = ["logs:PutLogEvents"]
      Effect   = "Allow"
      Resource = "*"
    }]
    Version = "2012-10-17"
  })
}

resource "aws_iam_role" "iam_role_test" {
  name = "jdc-tf-test-iam-role"

  assume_role_policy = jsonencode({
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "eks-fargate-pods.amazonaws.com"
      }
    }]
    Version = "2012-10-17"
  })

  managed_policy_arns = [
    "arn:aws:iam::aws:policy/AmazonEKSFargatePodExecutionRolePolicy",
    aws_iam_policy.logging_iam_policy.arn,
  ]

  # Note: this depends_on is redundant - the arn reference above already
  # creates an implicit dependency on the policy.
  depends_on = [
    aws_iam_policy.logging_iam_policy
  ]
}

Now when I apply it, everything deploys as expected, and immediately running terraform apply again works as I would expect it to:

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration and found no differences, so no changes are needed.

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Not sure if this is a Terraform issue or an AWS issue, but something doesn’t seem right here.

Hello - the issue here is that using the managed_policy_arns argument causes the aws_iam_role resource to take exclusive management of ALL identity policies attached to the role. On each apply, the aws_iam_role will attach any policies in the managed_policy_arns array that are not currently attached, and detach everything else. Here is a section from the aws_iam_role documentation describing this situation:

If you use this resource’s managed_policy_arns argument or inline_policy configuration blocks, this resource will take over exclusive management of the role’s respective policy types (e.g., both policy types if both arguments are used). These arguments are incompatible with other ways of managing a role’s policies, such as aws_iam_policy_attachment, aws_iam_role_policy_attachment, and aws_iam_role_policy. If you attempt to manage a role’s policies by multiple means, you will get resource cycling and/or errors.

In this case you can either:

  • Remove the managed_policy_arns argument and instead use distinct aws_iam_role_policy_attachment resources for each required policy (see the sketch below).
  • Remove the aws_iam_role_policy_attachment resource, and instead pass the ARN of the policy created with Terraform into the managed_policy_arns array.

The former option is (in my opinion) more readable and better represents the underlying AWS APIs. The latter option is great for situations in which you need exclusive management of policy attachments to ensure only the expected permissions are attached, and nothing else.
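
To sketch the former option using the proof-of-concept code from earlier in this thread (an illustrative rewrite, not tested output): the role drops managed_policy_arns entirely, and each policy gets its own attachment resource.

resource "aws_iam_role" "iam_role_test" {
  name = "jdc-tf-test-iam-role"

  assume_role_policy = jsonencode({
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "eks-fargate-pods.amazonaws.com"
      }
    }]
    Version = "2012-10-17"
  })

  # No managed_policy_arns here - the role resource no longer claims
  # exclusive ownership of the attached policies.
}

resource "aws_iam_role_policy_attachment" "fargate_pod_execution" {
  role       = aws_iam_role.iam_role_test.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSFargatePodExecutionRolePolicy"
}

resource "aws_iam_role_policy_attachment" "logging" {
  role       = aws_iam_role.iam_role_test.name
  policy_arn = aws_iam_policy.logging_iam_policy.arn
}

Each attachment is then an independent resource, so adding or removing a policy later only touches that one attachment. The latter option is essentially what the updated main.tf earlier in the thread demonstrates.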

Hope this helps to clarify the provider behavior here!


Thank you for the reply.

I find this extremely frustrating, as I just rolled out a bunch of Lambdas and Terraform is causing issues with the roles. I appreciate the help, though.

I really wish that Terraform’s documentation was a little clearer about the effects of using something like managed_policy_arns during the apply process.

This has been really helpful.

Wondering why Terraform still allows this “managed_policy_arns” argument when it causes such havoc.