How to parse and download module source formats?

Is there a tool for downloading module sources per the syntaxes listed here?

My use case: I have a complex terragrunt structure, lots of hcl files which reference lots of modules. I am auditing them to:

  1. Pull out the various module source lists (and summarize them by depth of directory tree)
  2. Download the specific modules listed (after reducing them to one copy per module/version)

I can then audit modules used, licenses, versions of terraform required for specific modules, etc.

It is fairly easy for me to use a simple find and awk statement to get the source, but actually parsing the unique formats is messy and redundant, since some code in Terraform already does it.
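To make the "messy" part concrete, here is a minimal sketch of splitting a source string into package, subdirectory and ref, then reducing a list to one entry per package/ref pair. It is not Terraform's or go-getter's parser: the ModuleRef struct, the function names and the example.com addresses are all made up for illustration, and it only understands the "//" subdirectory separator and the "?ref=" query parameter.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// ModuleRef is a hypothetical struct for this sketch: one module source
// split into the parts that matter for deduplication.
type ModuleRef struct {
	Package string // address without subdirectory or query string
	Subdir  string // path after the "//" separator, if any
	Ref     string // value of the "ref" query parameter, if any
}

// splitSource separates a source address into package, subdirectory and ref.
func splitSource(src string) ModuleRef {
	query := ""
	if i := strings.Index(src, "?"); i >= 0 {
		src, query = src[:i], src[i+1:]
	}
	// Skip past "://" so the scheme is not mistaken for the "//" separator.
	offset := 0
	if i := strings.Index(src, "://"); i >= 0 {
		offset = i + 3
	}
	m := ModuleRef{Package: src}
	if i := strings.Index(src[offset:], "//"); i >= 0 {
		m.Package = src[:offset+i]
		m.Subdir = src[offset+i+2:]
	}
	if vals, err := url.ParseQuery(query); err == nil {
		m.Ref = vals.Get("ref")
	}
	return m
}

// uniquePackages keeps one entry per package/ref pair, in first-seen order.
// The subdirectory is dropped because one download covers every subdirectory.
func uniquePackages(sources []string) []ModuleRef {
	seen := map[string]bool{}
	var out []ModuleRef
	for _, s := range sources {
		m := splitSource(s)
		key := m.Package + "@" + m.Ref
		if !seen[key] {
			seen[key] = true
			out = append(out, ModuleRef{Package: m.Package, Ref: m.Ref})
		}
	}
	return out
}

func main() {
	sources := []string{
		"git::https://example.com/infra.git//modules/vpc?ref=v1.2.0",
		"git::https://example.com/infra.git//modules/dns?ref=v1.2.0",
		"git::https://example.com/infra.git//modules/vpc?ref=v1.3.0",
	}
	for _, m := range uniquePackages(sources) {
		fmt.Printf("%s ref=%s\n", m.Package, m.Ref)
	}
}
```

Registry addresses, local paths and the other source formats would each need their own handling, which is exactly the part that is better borrowed from existing code than rewritten.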

I thought about a tf init, but that won’t work either. There could be conflicting versions, and I don’t have a single valid Terraform configuration in which all of the modules are listed.

I don’t mind doing this in go, rather than a simple script, if there are good libs for it.

Thanks in advance

I managed to get somewhere by using some of the OSS Terraform code and the gohcl library, at least to parse the module source ref. Unfortunately, almost all of this is inside internal/ in hashicorp/terraform, so there is no library surface.

I am at a point where:

  1. I do not quite know how to use that to download the desired module.
  2. It is not clear how to determine which versions of Terraform a module is compatible with. Some modules have a version constraint string in the terraform { } block, but even that is not enough.

How do I download the module, and how do I check compatible terraform versions?

Hi @deitch,

Due to the history and compatibility constraints present in Terraform, CLI itself is the reference implementation for how it will download modules. For non-registry modules however, the special source formats are for the most part handled by the github.com/hashicorp/go-getter package.

The module installation behavior in Terraform CLI has a long legacy at this point and isn’t really well defined aside from its current implementation, and so it isn’t really a suitable interface for third-party integration.

Depending on what your goal is, you might find the newer sourcebundle package interesting. This came as part of laying the foundations for the forthcoming Terraform Stacks features, where we are taking the opportunity to slightly “reboot” some older details of Terraform, including the problem of discovering and fetching remote source packages.

The plan is for stacks to support a subset of the source package types that traditional Terraform CLI offers, and this (currently-experimental) package is an implementation of that subset. Terraform CLI isn’t using this at all yet, but once there is CLI support for stacks then it will.

I strongly doubt your goal has anything to do with Stacks since that feature is not yet ready for broad use, but you might still be able to use this library as a part of what you are doing, as long as your modules use only the subset of source address types that are implemented there, which is currently:

  • Terraform module registry addresses
  • Git repository source addresses
  • HTTPS URLs that refer to gzipped tar archives

Others may follow later, but that’s what it supports at the time I’m writing this message.

Because this library is just for orchestrating the install process, it still needs the caller to provide a concrete implementation of both a Terraform Module Registry Protocol client and a remote package fetching client. Because the addresses accepted by this library are a subset of those that Terraform CLI currently handles via go-getter, you can use go-getter as the implementation of fetching remote packages. There isn’t yet an open source implementation of the module registry protocol implementing the interface in this package, but I expect one will follow at some later point when Terraform CLI is ready to start using this for installing source code for stacks.

This stuff is still very early so there will be some missing bits you will need to plug if you use this today. It should become more complete (by releasing other libraries that connect with this one) as the Terraform Stacks functionality starts to solidify.

Hi @jbardin, thanks for the response :slight_smile:

Due to the history and compatibility constraints present in Terraform, CLI itself is the reference implementation for how it will download modules.

Is there a command that says, “download this module to this dir”? I know that tf init does that under the covers in an opinionated way (from within .tf files, to a specific .terraform dir, etc.), but is there a way to say, “I already extracted all of these source URLs, I want to get them all”?

For non-registry modules however, the special source formats are for the most part handled by the github.com/hashicorp/go-getter package

So are you saying I could use Get() and it would work? Something like:

getter.Get("my/download/path", "git::ssh::git@somegit.com/....")

And it is that simple?

Hi @apparentlymart ; I always appreciate your detailed responses.

I thought I had put it in the original question, but I might have missed some of it.

I am auditing dependencies. That means:

  1. Going through a source tree and finding .hcl files and finding source clauses - fs.WalkDir() gets me the files, gohcl is pretty good for parsing them, now I have all of my source URLs
  2. Downloading all of those - hence this question
  3. Analyzing the downloaded modules

By the end of stage 2, I have a good start. I actually know every module and version on which my configs depend, which is pretty good.

Because this library is just for orchestrating the install process, it still needs the caller to provide a concrete implementation of both a Terraform Module Registry Protocol client and a remote package fetching client

Is there an example of how to use it? It sounds like it handles a lot of what I am building bespoke (and would rather not). As I parse the various hcl (auto-correct keeps changing it to “tcl”; these definitely are not tcl files! :laughing:) files, I am loading them up into a custom struct, then passing them to another process that goes to download them.

Somewhat relatedly, I notice that each download from git protocols clones the entire target repo, then checks out the specific ref. In the case of a large repo, that can be very expensive.

It looks like a regular tf init does something similar in the .terraform/modules/ dir. Is that correct?

Thinking more about what you shared before and what you’ve added here, unfortunately I think the main thing you’re looking for here lives in the gaps that the sourcebundle package delegates to its calling application. Specifically:

  • DependencyFinder delegates the problem of analyzing an already-fetched directory to see what other packages it depends on.

    That is, this package expects its caller to provide logic like “find all of the .tf files, find the module blocks inside them using HCL, and then take the source and version arguments”.

  • PackageFetcher delegates the problem of actually retrieving a remote source package and placing its content into a designated directory on local disk for further analysis.

  • RegistryClient delegates the problem of speaking the Terraform Module Registry Protocol to allow translating a registry-style source address and version constraints into a remote package address that the PackageFetcher can retrieve.

sourcebundle.Builder contains the glue logic to drive the interactions between implementations of those three interfaces, but it doesn’t provide any implementations itself because real implementations of these interfaces tend to require relatively “heavy” dependencies, like HCL itself, go-getter and all of its transitive dependencies, etc.
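To illustrate that division of labour, here is a rough sketch with three simplified interfaces and a small glue loop. These are stand-ins written for this post, not the real sourcebundle signatures: the method names, the Dep struct, the fetchAll function and the fakes are all hypothetical, and the sketch assumes that a dependency with a version constraint is a registry address.

```go
package main

import (
	"errors"
	"fmt"
)

// Dep is one dependency: a source address plus an optional version constraint.
type Dep struct {
	Source  string
	Version string
}

// DependencyFinder inspects an already-fetched directory.
type DependencyFinder interface {
	FindDependencies(dir string) ([]Dep, error)
}

// PackageFetcher places a remote package's content under targetDir.
type PackageFetcher interface {
	FetchPackage(remoteAddr, targetDir string) error
}

// RegistryClient turns a registry address and version constraint into a
// remote package address that the fetcher can retrieve.
type RegistryClient interface {
	Resolve(registryAddr, constraint string) (string, error)
}

// fetchAll is the glue: resolve, fetch, then scan each fetched package for
// further dependencies until nothing new turns up.
func fetchAll(start []Dep, f DependencyFinder, p PackageFetcher, r RegistryClient) ([]string, error) {
	var fetched []string
	seen := map[string]bool{}
	queue := append([]Dep(nil), start...)
	for len(queue) > 0 {
		dep := queue[0]
		queue = queue[1:]
		addr := dep.Source
		if dep.Version != "" { // assumption: a version marks a registry address
			resolved, err := r.Resolve(dep.Source, dep.Version)
			if err != nil {
				return nil, err
			}
			addr = resolved
		}
		if seen[addr] {
			continue
		}
		seen[addr] = true
		dir := fmt.Sprintf("pkg-%d", len(fetched))
		if err := p.FetchPackage(addr, dir); err != nil {
			return nil, err
		}
		fetched = append(fetched, addr)
		more, err := f.FindDependencies(dir)
		if err != nil {
			return nil, err
		}
		queue = append(queue, more...)
	}
	return fetched, nil
}

// The fakes below exist only so the sketch runs without network access.
type fakeFetcher struct{}

func (fakeFetcher) FetchPackage(addr, dir string) error { return nil }

type fakeFinder map[string][]Dep // directory -> dependencies found there

func (f fakeFinder) FindDependencies(dir string) ([]Dep, error) { return f[dir], nil }

type fakeRegistry map[string]string // "address@constraint" -> remote address

func (r fakeRegistry) Resolve(addr, constraint string) (string, error) {
	if remote, ok := r[addr+"@"+constraint]; ok {
		return remote, nil
	}
	return "", errors.New("no matching version for " + addr)
}

func main() {
	finder := fakeFinder{"pkg-0": {{Source: "example/vpc/aws", Version: "1.0.0"}}}
	registry := fakeRegistry{"example/vpc/aws@1.0.0": "git::https://example.com/vpc.git?ref=v1.0.0"}
	start := []Dep{{Source: "git::https://example.com/app.git"}}

	fetched, err := fetchAll(start, finder, fakeFetcher{}, registry)
	if err != nil {
		panic(err)
	}
	for _, addr := range fetched {
		fmt.Println(addr)
	}
}
```

The real interfaces carry more detail (contexts, typed addresses, richer responses), so treat this only as a picture of which component is responsible for what.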

It sounds like you already wrote something that could be turned into an implementation of DependencyFinder.

A PackageFetcher that directly wraps go-getter’s “getters” (selecting one based on sourceType) ought to be relatively straightforward to build given that sourceaddrs currently implements a subset of the go-getter address syntaxes. (This is also the place where you could, if you wanted to, optimize how exactly to fetch git repositories. Terraform Cloud’s implementation of this interface makes a shallow clone, for example, because we know that in that context there will never be any further Git operations run against the resulting work tree.)

For RegistryClient there isn’t really much reasonable variation in the implementation (it’s basically always going to be wrapping an HTTPS client making calls to two of the operations in the Terraform Module Registry protocol), but so far there isn’t a ready-to-call example of that in any open source Go library.
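To show what those two operations look like on the wire, here is a minimal sketch of a client for the "list versions" and "download" endpoints of the module registry protocol. It does not implement the sourcebundle interface: the Client type and its method names are hypothetical, the base URL is passed in directly instead of being found through service discovery, and a local httptest server with canned responses stands in for a real registry.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Client talks to a module registry. BaseURL is the modules.v1 endpoint,
// which a real client would obtain through service discovery.
type Client struct {
	BaseURL string // for example "https://registry.example.com/v1/modules/"
	HTTP    *http.Client
}

// Versions lists the available versions of module "namespace/name/system".
func (c Client) Versions(module string) ([]string, error) {
	resp, err := c.HTTP.Get(c.BaseURL + module + "/versions")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("listing versions: %s", resp.Status)
	}
	var body struct {
		Modules []struct {
			Versions []struct {
				Version string `json:"version"`
			} `json:"versions"`
		} `json:"modules"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return nil, err
	}
	var out []string
	for _, m := range body.Modules {
		for _, v := range m.Versions {
			out = append(out, v.Version)
		}
	}
	return out, nil
}

// SourceAddr returns the download location for one version, which the
// registry reports in the X-Terraform-Get response header.
func (c Client) SourceAddr(module, version string) (string, error) {
	resp, err := c.HTTP.Get(c.BaseURL + module + "/" + version + "/download")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusNoContent && resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("finding download location: %s", resp.Status)
	}
	loc := resp.Header.Get("X-Terraform-Get")
	if loc == "" {
		return "", errors.New("registry response had no X-Terraform-Get header")
	}
	return loc, nil
}

// newFakeRegistry serves canned responses so the sketch runs offline.
func newFakeRegistry() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/v1/modules/example/vpc/aws/versions", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"modules":[{"versions":[{"version":"1.0.0"},{"version":"1.1.0"}]}]}`)
	})
	mux.HandleFunc("/v1/modules/example/vpc/aws/1.1.0/download", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("X-Terraform-Get", "git::https://example.com/vpc.git?ref=v1.1.0")
		w.WriteHeader(http.StatusNoContent)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newFakeRegistry()
	defer srv.Close()
	c := Client{BaseURL: srv.URL + "/v1/modules/", HTTP: srv.Client()}

	versions, err := c.Versions("example/vpc/aws")
	if err != nil {
		panic(err)
	}
	fmt.Println("versions:", versions)

	addr, err := c.SourceAddr("example/vpc/aws", "1.1.0")
	if err != nil {
		panic(err)
	}
	fmt.Println("source:", addr)
}
```

Service discovery, authentication and matching a version constraint against the returned list are left out; the address that comes back is what would then be handed to the package fetcher.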

I do expect that at some point we’ll have open-source libraries that either directly implement these interfaces or are at least similar enough in their level of abstraction for it to be trivial to write a wrapper implementation, but I also don’t expect that to come in the very near future because we’re currently focused on using this thing as a vehicle for the early private preview of Stacks, and so the flexibility to change it in response to feedback is more important.

If you’re willing to implement all three of these interfaces then sourcebundle could work for you, but with one big caveat: it intentionally supports only a subset of the remote source address types that Terraform CLI supports today. Specifically, it supports Git repositories and HTTPS URLs that refer to gzipped tarballs, as previously mentioned. If you’d like your tool to support other source types then this implementation strategy would not be suitable.

It might grow to support other source types in future, but I don’t expect it to ever reach 100% parity with Terraform CLI because several of its supported source types are essentially technical debt at this point, and the Stacks execution model (which is what this library was written in support of) is providing an opportunity for some carefully-considered breaking changes.

heh, you give my crude work too much credit. But, yes, in a very limited sense, that is what I did.

So there isn’t anything nowadays that is the equivalent of go-getter for registry, i.e. registry-getter?

So there isn’t anything nowadays that is the equivalent of go-getter for registry, i.e. registry-getter?

The code we’re using in the only-currently-existing caller of sourcebundle inside Terraform Cloud is more-or-less a copy of the registry.Client implementation in Terraform CLI, with its API tweaked a little to conform to this interface.

Until now there has been no real need for anything other than Terraform CLI to be a client for this API, and so there was no cause to prioritize a separate library with this functionality in it. Introducing a new library requires considerably more care for the durability of API design than an internal package does, and so we typically don’t do it until there’s a strong reason to do it, as there was for e.g. terraform-config-inspect previously.

I expect that once Terraform Cloud and Terraform CLI both need an implementation of that interface, that will be sufficient motivation to factor it out into a shared spot where both can depend on it, and that shared spot is likely to be open source. I don’t know when exactly that will happen, though, since our focus is currently on gathering feedback about the Stacks user experience in Terraform Cloud in its private preview phase, and not on integrating it into Terraform CLI yet.

Got it, ok. Thanks for explaining, Martin.

I don’t really know Stacks. I used to have access to TFC back when I consulted to companies that used it. I don’t think any of my current ones have access, so no way to look at it now.

I will start a separate thread for the terraform versions question.

@apparentlymart I just watched the brief video intro to stacks, as well as read the blog post. This looks really interesting. I have always found that tf excels at weaving together dependency and modularity and reproducibility, but hits limits of scalability. So people limit their config sizes, and have to weave things together using their own tooling, or OSS like terragrunt.

That always struck me as a pity. The dependency graph is almost the heart of terraform; to have to leave it at the highest level is a waste.

It sounds like stacks is starting to square that circle? Provide the ability to use terraform’s graphing capabilities (dependency resolution, ordering, etc.) with its native language built for it, but without triggering the scalability issues?

How can I get a preview of it? I am an independent consultant, not an enterprise, so while I have worked at companies with TFE and TFC, I personally do not have access to either.