Continuous diff in aws_s3_bucket_object content

Hello,
Either I misunderstand the concept and the implementation, or the problem described in this issue is not actually resolved: https://github.com/hashicorp/terraform/issues/15594

I have to upload a ZIP file and track changes using the ETag. The problem is that the ETag is regenerated on every run, even though I do not change the ZIP file.

I cannot use content_base64, because my file is more than 5 MB, and with base64 encoding it grows to 7.5 MB.

What is the recommended way to upload ZIP files that are too large for the content_base64 argument, while also stopping the constant ETag regeneration?

Thank you.

Hi @vasilij-icabbi,

base64 is the only supported way to represent raw binary data in-memory in Terraform. A better answer for larger files is to avoid loading them into memory at all and to instead ask the provider to read directly from disk, using the source argument.
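
As a minimal sketch of that approach (the bucket name and object key here are placeholders):

resource "aws_s3_bucket_object" "package" {
  bucket = "example-bucket" # placeholder bucket name
  key    = "function.zip"
  source = "function.zip" # the provider reads this file from disk, rather than Terraform holding it in memory
}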

Terraform has an archive_file data source that was added as a temporary workaround when AWS Lambda was first released because there was otherwise no way to send values from Terraform into a Lambda function. However, that is no longer needed because AWS Lambda functions support environment variables.

Therefore archive_file should no longer be used, and you should instead construct your zip file outside of Terraform and pass it in. Terraform is not a build tool, so we strongly recommend separating the build and deploy steps in your system so that some other more appropriate tool is responsible for constructing the .zip file and Terraform is responsible only for uploading it.

Hello @apparentlymart,
Thank you for your reply. I do not use Terraform to build the archive. I use Terraform only to upload the archive to S3; the ZIP file itself is built outside of Terraform.
So I have something like this:

resource "aws_s3_bucket_object" "example" {

  bucket = "some_bucket"
  source = "function.zip"

  acl = "private"

  etag = filemd5("function.zip")

}

The problem is that with every run the ETag gets regenerated, even though I do not create a new ZIP file.

The etag for an S3 bucket object is, unfortunately, not always just an MD5 hash of the file contents. In certain cases it takes on other forms, and it may not be possible to provide a suitable etag value in Terraform in some of those cases. For example, I believe having encryption enabled for the S3 bucket is one situation where the etag is different than the MD5 hash.

When you see the proposed change in the Terraform plan output, does it look like it’s just a change between two MD5 hashes, or does it seem like the old value is in some other syntax entirely? If the “old” value isn’t an MD5 hash then you are in a situation where etag is something different.

If it does seem like a changed MD5 hash, I’d suggest verifying that the new hash Terraform is proposing agrees with the output of the command md5sum function.zip at a shell prompt, assuming you have the md5sum tool installed.

OK, I found the cause: my S3 bucket uses SSE-KMS encryption, which means the ETag I get back is the ETag of the encrypted object.

So the question now is: if I have an encrypted bucket, how can I use aws_s3_bucket_object and still trigger updates? As I understand it, etag is the only argument that triggers a re-upload?

Yes, unfortunately in the remote encryption mode the etag is not useful as an update trigger because your Terraform configuration does not have access to the encryption configuration and thus cannot predict what the ETag will be after the object has been encrypted.

In that case, you’ll need to arrange for some other argument to change so that the AWS provider can see that an update is needed. For example, perhaps you could make the system that is generating function.zip rename that file to include a hash of the contents as part of the name, which will then cause the source argument to be different whenever the file changes and allow the provider to see that it ought to update the remote object.
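
As a rough sketch of that idea (the variable name and the hashed-filename convention here are hypothetical; the build system would be responsible for producing them):

variable "package_file" {
  type        = string
  description = "Path to the built archive, e.g. function-3f2a9c1d.zip (hypothetical naming scheme)"
}

resource "aws_s3_bucket_object" "example" {
  bucket = "some_bucket"
  key    = basename(var.package_file) # the object key changes whenever the file contents change
  source = var.package_file
  acl    = "private"
}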


Yes, that's an idea. Thank you.

@apparentlymart I do not want to open a new topic, so I will leave this here: now kms_key_id causes a continuous diff. As per the documentation I provide the ARN of the key, but the state seems to always save the key's ID-based ARN, so I constantly get this diff:

kms_key_id = "arn:aws:kms:eu-west-1:0000:key/81c18d95-a078-4c92-a748-3d4844320224" -> "arn:aws:kms:eu-west-1:0000:alias/backend-lambda-packages-s3"

I think the problem is again that the bucket is encrypted: AWS may record the key's ID-based ARN rather than the alias ARN, so we get a mismatch.

I guess a temporary workaround is to configure encryption at the object level instead and disable bucket-level encryption.
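
A sketch of that object-level configuration (using the resource's server_side_encryption argument; the bucket and key names are placeholders):

resource "aws_s3_bucket_object" "example" {
  bucket                 = "some_bucket"
  key                    = "function.zip"
  source                 = "function.zip"
  server_side_encryption = "aws:kms" # encrypt only this object with SSE-KMS
  kms_key_id             = "arn:aws:kms:eu-west-1:0000:key/81c18d95-a078-4c92-a748-3d4844320224"
}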

Hi @vasilij-icabbi!

Indeed, it seems like the provider is accepting an alias ARN but then transforming it into a direct id ARN, and thus the configuration can never converge with the remote object.

With the current implementation of the provider, I think the only answer would be to populate kms_key_id using the id yourself, rather than using the alias. If the key id isn’t already known in your configuration some other way, then you could perhaps use the aws_kms_key data source to translate it first:

data "aws_kms_key" "foo" {
  key_id = "arn:aws:kms:eu-west-1:0000:alias/backend-lambda-packages-s3"
}

I think (though I’m not totally sure) that then you could use data.aws_kms_key.foo.arn to get the canonical id-based ARN, rather than the alias ARN.
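
If that does work, the reference might look something like this (continuing the data source above; bucket and key are placeholders):

resource "aws_s3_bucket_object" "example" {
  bucket     = "some_bucket"
  key        = "function.zip"
  source     = "function.zip"
  kms_key_id = data.aws_kms_key.foo.arn # canonical ID-based ARN resolved from the alias
}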

This kms_key_id permanent diff seems like a provider bug though, so you may wish to see if there’s already an issue open for it in the AWS provider repository. If a provider accepts multiple different ways to specify the same thing then usually the provider will implement some logic to consider them to be equivalent on future runs, so that it’ll converge on a stable result whichever way you specify it.

If you find an existing GitHub issue or open a new one for this, please leave a note here with a link to it, so that other folks who find this thread in the future can easily check on its status.

@apparentlymart Thank you for your reply. I could not find any existing mention of this issue, so I created a ticket:
