Terraform updates the state file with wrong information after a failed apply

Terraform updates the state file even when “terraform apply” fails. The update leaves the state out of sync with the deployed resource and causes further problems on later runs.

Steps to reproduce the issue:

  1. Create a simple AWS Lambda function (aws_lambda_function) without vpc_config. (S3 is the backend.)
  2. Run “terraform apply”.
  3. Ensure the function's execution role does not have the ec2:DescribeNetworkInterfaces permission.
  4. Update the Lambda function with a vpc_config block and new source code (zip file).
  5. Run “terraform apply” again.
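The steps above could be reproduced with a configuration along these lines. This is only a sketch, not the configuration from the failing run: the resource address, function name, description, memory size, and output names are taken from the plan and state shown in this report, while the bucket, key, region, handler, runtime, zip path, and all variables are placeholders I made up for illustration.

```hcl
terraform {
  backend "s3" {
    bucket = "example-tfstate-bucket"       # hypothetical bucket
    key    = "sync-issue/terraform.tfstate" # hypothetical key
    region = "us-west-2"                    # hypothetical region
  }
}

# Hypothetical inputs; the role must NOT allow ec2:DescribeNetworkInterfaces (step 3).
variable "lambda_role_arn" {
  type = string
}

variable "subnet_ids" {
  type    = list(string)
  default = []
}

variable "security_group_ids" {
  type    = list(string)
  default = []
}

resource "aws_lambda_function" "lambda" {
  function_name = "sync-issue-03222024"
  description   = "Simulate the problem of state files being out of sync"
  role          = var.lambda_role_arn
  handler       = "index.handler" # assumed handler
  runtime       = "python3.9"     # assumed runtime
  memory_size   = 256

  # Replace lambda.zip with a new build before step 5 so the hash changes.
  filename         = "${path.module}/lambda.zip"
  source_code_hash = filebase64sha256("${path.module}/lambda.zip")

  # Omit this block for steps 1 and 2; add it in step 4.
  vpc_config {
    subnet_ids         = var.subnet_ids
    security_group_ids = var.security_group_ids
  }
}

output "lambda_function_source_code_hash" {
  value = aws_lambda_function.lambda.source_code_hash
}

output "lambda_function_version" {
  value = aws_lambda_function.lambda.version
}
```

With this layout, the second apply has to change both the function code and its configuration in one run, which is the combination that fails in the log further down.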

The last step fails, as expected. However, Terraform still updates the state file with the new source code hash, so the state no longer matches the deployed function. This causes serious follow-on problems. For example, after the role’s permission issue is fixed, a subsequent “terraform apply” “succeeds” but does not update the function code, because Terraform no longer sees a source code change.

Here is a simplified log.

terraform apply -var-file=tfparams.tfvars -no-color -input=false -auto-approve=true

............some process............
............some process............
............some process............
............some process............
............some process............


Terraform will perform the following actions:

  # aws_lambda_function.lambda will be updated in-place
  ~ resource "aws_lambda_function" "lambda" {
        id                             = "sync-issue-03222024"
      ~ source_code_hash               = "P8vi6asyZdCdBAWaatT6sGffkM1lHS0Z2UkSdwgpMNw=" -> "NzX5EyD71u1Ej8H0Yp+y8JXHENhp5HvJ/k9WzwTYl2s="
      ~ version                        = "1" -> (known after apply)
        # (18 unchanged attributes hidden)

      + vpc_config {
          + ipv6_allowed_for_dual_stack = false
          + security_group_ids          = [
              + ".....some ids......",
            ]
          + subnet_ids                  = [
              + ".....some ids......",
              + ".....some ids......",
              + ".....some ids......",
            ]
        }

        # (3 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

............some process............
............some process............
............some process............


2024-03-22T14:43:28.739-0700 [WARN]  provider.terraform-provider-aws_v5.35.0_x5: [WARN] WaitForState timeout after 5m0s
2024-03-22T14:43:28.739-0700 [WARN]  provider.terraform-provider-aws_v5.35.0_x5: [WARN] WaitForState starting 30s refresh grace period
2024-03-22T14:43:28.740-0700 [ERROR] provider.terraform-provider-aws_v5.35.0_x5: Response contains error diagnostic: @caller=github.com/hashicorp/terraform-plugin-go@v0.20.0/tfprotov5/internal/diag/diagnostics.go:62 diagnostic_severity=ERROR diagnostic_summary="updating Lambda Function (sync-issue-03222024) configuration: operation error Lambda: UpdateFunctionConfiguration, https response error StatusCode: 400, RequestID: d5e2884b-c808-4a2f-b772-a9564d1e6ac6, InvalidParameterValueException: The provided execution role does not have permissions to call DescribeNetworkInterfaces on EC2" @module=sdk.proto diagnostic_detail= tf_proto_version=5.4 tf_provider_addr=registry.terraform.io/hashicorp/aws tf_req_id=a5eed880-c5ee-32ec-ae47-c6921c981270 tf_resource_type=aws_lambda_function tf_rpc=ApplyResourceChange timestamp=2024-03-22T14:43:28.740-0700
2024-03-22T14:43:28.742-0700 [DEBUG] State storage *remote.State declined to persist a state snapshot
2024-03-22T14:43:28.742-0700 [ERROR] vertex "aws_lambda_function.lambda" error: updating Lambda Function (sync-issue-03222024) configuration: operation error Lambda: UpdateFunctionConfiguration, https response error StatusCode: 400, RequestID: d5e2884b-c808-4a2f-b772-a9564d1e6ac6, InvalidParameterValueException: The provided execution role does not have permissions to call DescribeNetworkInterfaces on EC2
2024-03-22T14:43:28.742-0700 [DEBUG] states/remote: state read serial is: 7; serial is: 7
2024-03-22T14:43:28.742-0700 [DEBUG] states/remote: state read lineage is: 9d16b7ea-b200-8c79-7131-17ab3d555dec; lineage is: 9d16b7ea-b200-8c79-7131-17ab3d555dec
2024-03-22T14:43:28.743-0700 [DEBUG] Uploading remote state to S3: {
}
2024-03-22T14:43:28.743-0700 [DEBUG] [aws-sdk-go] DEBUG: Request s3/PutObject Details:
---[ REQUEST POST-SIGN ]-----------------------------
PUT ............some message............
............some message............
............some message............
............some message............
............some message............
{
  "version": 4,
  "terraform_version": "1.5.5",
  "serial": 8,
  "lineage": "9d16b7ea-b200-8c79-7131-17ab3d555dec",
  "outputs": {
    "lambda_function_source_code_hash": {
      "value": "NzX5EyD71u1Ej8H0Yp+y8JXHENhp5HvJ/k9WzwTYl2s=",
      "type": "string"
    },
    "lambda_function_version": {
      "value": "1",
      "type": "string"
    }
  },
  "resources": [
    {
      "mode": "managed",
      "type": "aws_lambda_function",
      "name": "lambda",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "architectures": [
              "x86_64"
            ],
            "description": "Simulate the problem of state files being out of sync",
            "memory_size": 256,
            "source_code_hash": "NzX5EyD71u1Ej8H0Yp+y8JXHENhp5HvJ/k9WzwTYl2s=",
            "source_code_size": 1168,
            "version": "1",
            "vpc_config": [
              {
                "ipv6_allowed_for_dual_stack": false,
                "security_group_ids": [
                  ".....some ids......"
                ],
                "subnet_ids": [
                  ".....some ids......",
                  ".....some ids......",
                  ".....some ids......"
                ],
                "vpc_id": ""
              }
            ]
          },
        }
      ]
    }
  ],
  "check_results": null
}

Several combinations of the following versions produce the same issue:

  • Terraform CLI version: 1.5.5 and 1.6.6
  • hashicorp/aws provider version: 5.40.0 and 4.65.0

S3 is the backend.