How to handle resources deleted outside of TFC

I am trying to create a policy that requires all S3 buckets in my org to be private. The policy seems to work well when TFC fully knows about the S3 bucket. However, I have a situation where a bucket was deleted outside of TFC, and the Sentinel mocks show “no-op” as the action on the resource. Is this a bug? How can I handle this scenario, as we will most likely encounter it again in the future?

The TFC workspace is using 0.15.5.

For reference, here is the policy:

# This policy uses the Sentinel tfplan/v2 import to require that all S3 buckets do not have a public ACL

# Import common-functions/tfplan-functions/tfplan-functions.sentinel
# with alias "plan"
import "tfplan-functions" as plan

# Get all S3 buckets
allS3Buckets = plan.find_resources("aws_s3_bucket")

# Filter S3 buckets that have an ACL not set to private
# Warnings will be printed for all violations since the last parameter is true
checkForPrivateAcl = plan.filter_attribute_is_not_value(allS3Buckets, "acl", "private", true)

# Main rule
main = rule {
    length(checkForPrivateAcl["resources"]) is 0
}

Here is a snippet from the TFC plan, which shows up under the section “Terraform detected the following changes made outside of Terraform since the last "terraform apply"”:

# module.redacted has been deleted
  - resource "aws_s3_bucket" "redacted" {
      - acl                         = "private" -> null
      - arn                         = "redacted" -> null
      - bucket                      = "redacted" -> null
      - bucket_domain_name          = "redacted" -> null
      - bucket_regional_domain_name = "redacted" -> null
      - force_destroy               = false -> null
      - hosted_zone_id              = "redacted" -> null
      - id                          = "redacted" -> null
      - region                      = "redacted" -> null
      - request_payer               = "BucketOwner" -> null
      - tags                        = {redacted} -> null
      - tags_all                    = {redacted} -> null

      - server_side_encryption_configuration {
          - rule {
              - bucket_key_enabled = false -> null

              - apply_server_side_encryption_by_default {
                  - kms_master_key_id = "redacted" -> null
                  - sse_algorithm     = "aws:kms" -> null
                }
            }
        }

      - versioning {
          - enabled    = false -> null
          - mfa_delete = false -> null
        }
    }

Here’s a snippet from the Sentinel mocks:

"module.redacted": {
	"address": "module.redacted",
	"change": {
		"actions": [
			"no-op",
		],
		"after":         null,
		"after_unknown": {},
		"before":        null,
	},
	"deposed":        "",
	"index":          null,
	"mode":           "managed",
	"module_address": "module.redacted",
	"name":           "redacted",
	"provider_name":  "registry.terraform.io/hashicorp/aws",
	"type":           "aws_s3_bucket",
},

And finally, here’s the error I’m getting while running sentinel apply --trace:

module.redacted has acl that is null or undefined. It is supposed to be private

@pshamus the problem you are experiencing comes from how the plan.find_resources function decides which resources to return. Currently, it also returns resources whose action is "no-op".

I’m not sure why this is the case, but if I had to hazard a guess it is probably a catch-all so that any resource that has been provisioned in the past that has a known bad configuration will cause a violation :man_shrugging:

The undefined values then break everything else because you are checking the value of an attribute that does not exist.

It may be worth raising an issue on the third-generation repository to see if there are any possible workarounds.

@pshamus and @hcrhall : If the version of the tfplan-functions module being used is based on terraform-guides/tfplan-functions.sentinel at master · hashicorp/terraform-guides · GitHub, then it is correct that the find_resources function does include resources of the given type that have change actions create, update, read, and no-op. I wrote it that way for exactly the reason @hcrhall mentioned which is to flag violations in existing resources that were created before the policy was first applied to the workspace.

I see from the mock that in your case, change.after is null. The evaluate_attribute function called by the filter function is able to convert the undefined data to null so that the filter function and the policy do not give a hard Sentinel error.

I’m a bit confused by your mock having the no-op action instead of the delete action. If the action had been delete, then the find function would have excluded your S3 bucket that was deleted outside of Terraform.

tfplan/v2 - Imports - Sentinel - Terraform Cloud and Terraform Enterprise - Terraform by HashiCorp says of the delete action, “The action will remove the associated entity, deleting any applicable state and associated real resources or infrastructure.” Given that the S3 bucket was removed outside of Terraform, it seems to me that the plan does have to remove the state associated with it. But perhaps this had already happened during the refresh that the plan performs. That would explain the no-op action.

@hcrhall : Can you investigate whether the Sentinel mock should actually have had the delete action instead of the no-op action? If the answer is “yes”, please raise this with the Terraform core team, since the problem would then be in the output of the terraform show -json command that Sentinel gets the tfplan/v2 data from.

If the Sentinel and Terraform Core teams do feel that no-op is the correct action in this case, then I can investigate what could be done with the common functions to avoid the problem in this scenario. I could obviously remove no-op from the find function, but that would prevent applying policies to existing resources that are not changing. And I would rather not lose that capability. And I don’t think there is any other way for Sentinel to know that the resource was just deleted.

However, I think a solution would be to check whether after is null when the action is no-op and interpret that as indicating that the resource was deleted outside of Terraform.

I went ahead and made the changes I suggested at the end and submitted this PR: improve no-op processing by rberlind · Pull Request #307 · hashicorp/terraform-guides · GitHub

If you want to try the modified find_resources() function yourself, @pshamus , use this:

# Find managed resources of the given type that are being created, updated,
# or read, or that are unchanged (no-op) and still exist (after is not null)
find_resources = func(type) {
  resources = filter tfplan.resource_changes as address, rc {
    rc.type is type and
    rc.mode is "managed" and
    (rc.change.actions contains "create" or
     rc.change.actions contains "update" or
     rc.change.actions contains "read" or
     (rc.change.actions contains "no-op" and rc.change.after is not null))
  }

  return resources
}
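If you would rather not patch the shared module, the same guard can probably be applied inside the policy itself. The following is only a sketch, assuming the unmodified tfplan-functions module from the original post; the variable name existingS3Buckets is made up for illustration. It drops any bucket whose change.after is null before the ACL check runs:

```sentinel
import "tfplan-functions" as plan

# Get all S3 buckets (may include no-op entries for buckets deleted outside Terraform)
allS3Buckets = plan.find_resources("aws_s3_bucket")

# Hypothetical extra step: keep only buckets that still have a planned state
existingS3Buckets = filter allS3Buckets as address, rc {
  rc.change.after is not null
}

# Filter the remaining buckets that have an ACL not set to private
checkForPrivateAcl = plan.filter_attribute_is_not_value(existingS3Buckets, "acl", "private", true)

# Main rule
main = rule {
  length(checkForPrivateAcl["resources"]) is 0
}
```

With the mock shown earlier, the deleted bucket has "after": null, so it would be filtered out and no longer reported as a violation.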

@hcrhall / @rberlind Thanks for looking into this and putting in a fix. I am very new to Sentinel and the learning curve is pretty steep. It was very helpful to have the detailed information about what’s going on.

I ended up doing a refresh plan to bring the state up-to-date and the policy works great now. Good to know your fix should handle other situations where a refresh plan hasn’t been run yet.

The "no-op" action is expected in this scenario. The delete operation occurred outside of Terraform, and therefore the change that will occur is an update of the Terraform state to bring it back in line with what has been defined in the configuration.

@pshamus : I’m curious about the exact sequence by which the S3 bucket was deleted. You said it was deleted outside of Terraform. But I’m wondering whether the Terraform code for the S3 bucket had also been removed from the configuration before you ran the plan you shared, and whether you had run the terraform state rm command against it to remove it from your state before the plan. I plan to do a few experiments to see what the mocks look like in different scenarios so that I can improve my understanding of what happens in them and make sure my common functions handle all of them well.
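For anyone who wants to run the same kind of experiment, a throwaway policy along these lines should show what each scenario produces. This is a rough sketch, not part of the common functions: it only prints the change actions and whether after is null for each S3 bucket in the plan, and it always passes.

```sentinel
import "tfplan/v2" as tfplan

# Print the planned actions for every managed S3 bucket in the plan
for tfplan.resource_changes as address, rc {
  if rc.type is "aws_s3_bucket" and rc.mode is "managed" {
    print(address, "actions:", rc.change.actions, "after is null:", rc.change.after is null)
  }
}

# Diagnostic only: never fail the run
main = rule { true }
```

Running it against mocks generated from different sequences (bucket deleted outside Terraform, code removed from the configuration, terraform state rm run first) would make it easy to compare the resulting actions side by side.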

Also, I was just able to merge my PR after getting approval for it.

Also, it seems to me that if you did not remove the S3 bucket from the Terraform configuration, then the Terraform plan should have been trying to recreate the S3 bucket and the operation should have been create.