Enumerate zip file content

obones · December 6, 2021, 4:55pm

Hello,

Using the hashicorp/dir/template module, I can enumerate the content of a folder and in turn create the S3 objects to upload:

module "template_files" {
  source = "hashicorp/dir/template"

  base_dir = "${path.root}/doc"
}

resource "aws_s3_bucket_object" "doc_files" {
  for_each = module.template_files.files

  bucket       = aws_s3_bucket.doc.id
  key          = each.key
  content_type = each.value.content_type

  source  = each.value.source_path
  content = each.value.content

  etag = each.value.digests.md5
}

This works just fine, except that my CI/CD environment generates a zip file with all the needed content and places it in an S3 bucket.
Right now, I’m solving this with a local-exec provisioner that retrieves the zip file and unzips it in the expected location.
It works, but it does not feel elegant because it forces developers to have the unzip command-line utility, which is not available by default on every operating system (Windows, for instance).

Hence my search for an equivalent of hashicorp/dir/template that could directly enumerate the content of a zip file retrieved from an S3 bucket.

apparentlymart · December 6, 2021, 6:20pm

Hi @obones,

I’m not aware of any existing Terraform provider that offers a data source for reading contents from a zip file, so I don’t think there would be any Terraform-Configuration-only solution which meets your requirements today. In principle it would be possible to write a Terraform provider which offers a data source for enumerating the index of files inside a zip file (analogous to the fileset function) and another data source for reading the content of a particular file inside a zip file (analogous to the file function).

I think some missing pieces relative to what hashicorp/dir/template achieves would be:

That module avoids loading potentially large binary files into memory by returning the path to the file (in source_path) rather than its content (in content). There is no direct filesystem path to a file embedded inside an archive, so I think there would be no alternative to loading everything into memory, unless aws_s3_bucket_object and all of the comparable resource types in other providers were to add support for reading from inside an archive.
Terraform doesn’t have a built-in function for rendering a template stored as a file inside an archive, so such a provider could only practically support returning static files and not template rendering in the way that hashicorp/dir/template does.

A different way to address this would be to decide that, architecturally, only the syncing into S3 is Terraform’s job, and that obtaining the zip file and unzipping it is a separate step that happens beforehand. In that case, your Terraform configuration could have an input variable that takes the path where the zip file was already extracted and passes it through to hashicorp/dir/template.
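
As a minimal sketch of that arrangement, assuming the input variable is named source_dir (a name chosen here for illustration, along with its description text), the module call might look roughly like this. The aws_s3_bucket_object resource from the original question would stay unchanged:

```hcl
# Assumption: the variable name "source_dir" is illustrative only.
# It holds the directory where the zip file was extracted before
# Terraform runs.
variable "source_dir" {
  type        = string
  description = "Directory where the zip file was already extracted"
}

module "template_files" {
  source = "hashicorp/dir/template"

  # Previously "${path.root}/doc"; now supplied by the caller.
  base_dir = var.source_dir
}
```

With this shape, Terraform never needs to know that a zip file existed; it only sees a directory path.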

You could then potentially use some scripting around this process:

1. Retrieve the zip file from the CI system.
2. Unzip the zip file to some temporary directory.
3. Run terraform apply -var="source_dir=..." where ... is the temporary directory where you unzipped the file.

This moves the unzipping problem outside of Terraform itself and thus allows you some more flexibility in how to get it done. For example, perhaps whatever scripting you build around this would use a different strategy on Windows than it does on a Linux system.

obones · December 7, 2021, 9:48am

Well, in my own case those missing pieces are not an issue, because all the files are very small and I don’t use the template part of hashicorp/dir/template.
But I see how that could become a problem in the general case.

Yes, moving the unzip step outside Terraform is a possibility, but I like it when most if not all things are done in one place instead of being scattered across multiple systems.
Right now, I have this resource:

resource "null_resource" "sources_deployment" {
  provisioner "local-exec" {
    command = <<-EOF
      rm -fr ${local.s3_source_sync_dir} &&
      mkdir -p ${local.s3_source_sync_dir} &&
      aws s3 cp s3://${local.s3_source_bucket}/${local.s3_source_key} ${local.s3_source_sync_dir} &&
      unzip -o -q ${local.s3_source_local_zip} -d ${local.s3_source_sync_dir}/ &&
      rm -f ${local.s3_source_local_zip} &&
      unzip -o -q ${local.doc_zip_filename} -d ${local.doc_dest_dir}/ &&
      rm -f ${local.doc_zip_filename}
    EOF
  }

  triggers = {
    version = aws_s3_bucket_object.main.content
  }

  depends_on = [ aws_s3_bucket.source, aws_s3_bucket.doc ]
}

It works, but as already noted, it relies on unzip being available.

Well, I guess I’ll have to live with it for the time being. I’ll create a feature request and see how it evolves.

Edit: see "Introduce a resource that enumerates files inside a zip archive", issue #30098 in hashicorp/terraform on GitHub.
