Using a role for backend so that all backend states are in a single bucket

I have 8 AWS accounts that I am bringing under Terraform. It is a slow process of importing each existing resource into the state file, then completing its definition until the plan is empty. Long-winded, but the process is working.

One of the recommended practices is to keep the state file out of the repo, so I am holding the state in S3 (with the lock in DynamoDB).

With my ~/.aws/credentials file set up for all accounts, I can run terraform locally and all the states go to the bucket. But this only works because I am using profile="management" in my backend.tf file.

I want to move the terraforming to our pipeline, so having multiple shared credentials floating around isn’t great. We can supply AWS_ACCESS_KEY, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION as environment variables in the pipeline. For the management account, this is working fine, so we now have pipelines doing the deployment and we’re happy.

We can’t use multiple envvars as Terraform doesn’t allow any variables in the backend definition.
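As an aside on that restriction: interpolation is not evaluated inside a backend block, but Terraform does support partial backend configuration, where settings omitted from the file are passed to terraform init with -backend-config. A minimal sketch, assuming the bucket and table names used later in this post; which settings to leave out is only an illustration, not something I have settled on:

```hcl
# backend.tf: settings that are identical for every account stay in the file.
terraform {
  backend "s3" {
    bucket         = "management-state-bucket"
    encrypt        = true
    region         = "eu-west-1"
    dynamodb_table = "terraform_locks"

    # "key" is deliberately omitted here and supplied per account at init
    # time, for example:
    #
    #   terraform init -backend-config="key=devops.tfstate"
  }
}
```

This keeps per-pipeline values out of the file, but it does not by itself solve which credentials the backend uses.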

So I’m trying to work out how to “share” access to the state file and lock.

I think (and I’m probably wrong) that a ‘terraforming role’ in the management account, which nominated users can assume in order to interact with the state, would be a sensible way to go. But I’m not getting it.

So can you please review what I’ve got, point me in the right direction or, pretty please, help me fix this.

So, the management state.tf file is (the S3 bucket and DynamoDB table resources snipped) …

resource "aws_s3_bucket" "backend_remote" {
...
}

resource "aws_dynamodb_table" "backend_locks" {
...
}

data "aws_iam_policy_document" "terraforming_policy_document" {
  statement {
    sid       = "TerraformingS3ListBucket"
    effect    = "Allow"
    actions   = [
      "s3:ListBucket",
    ]
    resources = [
      aws_s3_bucket.backend_remote.arn,
    ]
  }
  statement {
    sid       = "TerraformingS3AccessObjects"
    effect    = "Allow"
    actions   = [
      "s3:GetObject",
      "s3:PutObject",
    ]
    resources = [
      "${aws_s3_bucket.backend_remote.arn}/*",
    ]
  }
  statement {
    sid       = "TerraformingHandleLocks"
    effect    = "Allow"
    actions   = [
      "dynamodb:GetItem",
      "dynamodb:PutItem",
      "dynamodb:DeleteItem",
    ]
    resources = [
      aws_dynamodb_table.backend_locks.arn,
    ]
  }
}

data "aws_iam_policy_document" "terraforming_assume_role_policy" {
  statement {
    sid     = "TerraformingAssumeRolePolicy"
    effect  = "Allow"
    actions = [
      "sts:AssumeRole",
    ]
    principals {
      type        = "AWS"
      identifiers = [
        "arn:aws:iam::4[SNIPPED]:root"
      ]
    }
  }
}

resource "aws_iam_policy" "terraforming_policy" {
  description = "Policy to allow terraform state files to be stored in S3 with locking managed in DynamoDB"
  name        = "TerraformingPolicy"
  policy      = data.aws_iam_policy_document.terraforming_policy_document.json
}

resource "aws_iam_role" "terraforming" {
  assume_role_policy = data.aws_iam_policy_document.terraforming_assume_role_policy.json
  description        = "Role to allow terraform state files to be stored in S3 with locking managed in DynamoDB"
  name               = "Terraforming"
  path               = "/"
}

resource "aws_iam_role_policy_attachment" "terraforming_attachment_policy" {
  policy_arn = aws_iam_policy.terraforming_policy.arn
  role       = aws_iam_role.terraforming.name
}

With this applied, the management AWS console shows the role and it all looks good: the attached policy passes policy validation, and the trusted account is listed.

It all looks OK.

So, now to the devops account (runs our pipelines and other non production tasks).

The working backend.tf (using shared credentials in ~/.aws/credentials) looks like this …

terraform {
  backend "s3" {
    bucket         = "management-state-bucket"
    acl            = "private"
    encrypt        = true
    region         = "eu-west-1"
    dynamodb_table = "terraform_locks"
    key            = "devops.tfstate"
    profile        = "management"
  }
}

So, to get this working the way it would in the pipeline, I hide my ~/.aws/credentials entries for the management and devops accounts and pass the credentials as envvars: AWS_ACCESS_KEY=AKIAIN... AWS_SECRET_ACCESS_KEY=... AWS_DEFAULT_REGION=eu-west-1 terraform ...

If I also add TF_LOG=trace I can see that the right credential provider is in play : 2020/04/07 10:40:07 [INFO] AWS Auth provider used: "EnvProvider"

I’ve tried using the management credentials in the devops.tf …

   access_key     = "AKIAJ5..."
   secret_key     = "..."

And that worked, so now to get rid of the credentials.

I had thought that I could just add the role_arn entry for the terraforming role, but that’s not working.

    role_arn       = "arn:aws:iam::5[SNIPPED]:role/Terraforming"

2020/04/07 11:31:20 [INFO] AWS Auth provider used: "EnvProvider"
2020/04/07 11:31:20 [INFO] Attempting to AssumeRole arn:aws:iam::5[SNIPPED]:role/Terraforming (SessionName: "", ExternalId: "", Policy: "")

Error: The role "arn:aws:iam::5[SNIPPED]:role/Terraforming" cannot be assumed.

  There are a number of possible causes of this - the most common are:
    * The credentials used in order to assume the role are invalid
    * The credentials do not have appropriate permission to assume the role
    * The role ARN is not valid
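
To make the failing attempt concrete, the full backend block would look something like the sketch below. It assumes the profile line is dropped in favour of role_arn, so that the base credentials come from the environment variables and the role is assumed on top of them; every other setting is copied from the working backend.tf above.

```hcl
terraform {
  backend "s3" {
    bucket         = "management-state-bucket"
    acl            = "private"
    encrypt        = true
    region         = "eu-west-1"
    dynamodb_table = "terraform_locks"
    key            = "devops.tfstate"

    # Assumption: this replaces profile = "management". The devops
    # credentials from the envvars are used to call sts:AssumeRole.
    role_arn = "arn:aws:iam::5[SNIPPED]:role/Terraforming"
  }
}
```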

The ARN for the role is the one displayed in the management console for the role.
The role says the devops account (4xxxxx) is allowed.
So I’m stumped.
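
One thing I have not ruled out: because the trust policy names the devops account root rather than a specific user, the trust only delegates the decision to the devops account. The identity whose keys are in the environment variables would then also need its own IAM permission to call sts:AssumeRole on the role. A minimal sketch of what that might look like on the devops side; the resource names and the user name "pipeline-user" are hypothetical, and the account ID stays snipped:

```hcl
# Lives in the devops account configuration, not in management.
data "aws_iam_policy_document" "assume_terraforming" {
  statement {
    sid     = "AllowAssumeTerraforming"
    effect  = "Allow"
    actions = ["sts:AssumeRole"]
    resources = [
      "arn:aws:iam::5[SNIPPED]:role/Terraforming",
    ]
  }
}

# Attach the permission to the IAM user that owns the pipeline access keys.
resource "aws_iam_user_policy" "pipeline_assume_terraforming" {
  name   = "AssumeTerraforming"
  user   = "pipeline-user" # hypothetical user name
  policy = data.aws_iam_policy_document.assume_terraforming.json
}
```

If the calling user already has a broad policy that covers sts:AssumeRole, this would not be the cause.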

Any help would be appreciated.

I’ve also asked the same question on Stack Overflow, in case that is a better place to get an answer: https://stackoverflow.com/questions/61078028/attempting-to-centralise-terraform-states-and-getting-very-confused-with-assume