Best Method for Creating Private Mirror

Hello:

One of the organizations I do work for prefers to install software from private repositories. For open-source projects, this has typically meant:

  1. Identify a public repository to pull contents from
  2. Identify best method to create a local/internal duplicate (mirror) of that public repository
  3. Point internal security-scanners against the local/internal mirror
  4. Allow internal clients to access only the contents that the security-scanners have “blessed”

For other projects, I’ve frequently been able to use a (cron-enabled) rclone or rsync job to replicate the upstream project-contents while minimizing the amount of duplicative fetches (i.e., no re-downloading content I already have locally). When I was trying to pull from the download site, I noticed that, if you give it an invalid URL specification, you get what looks like an S3 error message:

# 403 Forbidden

* Code: AccessDenied
* Message: Access Denied
* RequestId: <REQIDSTRING>
* HostId: <LONGALPHANUMSTRING>

Which made me think, “ah ha! I can probably rclone this”. Unfortunately, I could not figure out the requisite rclone pull-config to do so, and my Google searches weren’t terribly fruitful either. Mostly, they pointed me at how to use the network provider rather than how to use rclone (or any tool, really) to set up a mirror.

As a kludge, I wrote a quick-n-dirty Bash scraper script to do things for me, but it’s sub-ideal (“brittle”, and not very good at avoiding duplicate downloads).

So, figured I’d post here to see if anyone has done similar and could give me some tips.

Thanks in advance!

You can install a repository manager like Nexus from Sonatype and use that as an internal source for your users, applying some procurement rules for approval.

Sonatype moved the functionality to Nexus Firewall - Application Security | Sonatype

Hi @ferricoxide!

If you’re talking about the terraform executable itself then Terraform doesn’t really interact with its own download service at all, and so you can mirror it in whatever way you want as long as the terraform executable ends up installed somewhere where it’ll be executable. With that said, the fact that it’s currently an S3 bucket behind a CDN is an implementation detail and not something I would suggest relying on. No problem with treating it just like any other static website and mirroring it as you wish, though.
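
For what it is worth, one way to keep a static-site mirror job from re-fetching files it already has is to compare the local copies against a checksum list before downloading anything. The sketch below is only illustrative: it assumes the upstream publishes a SHA256SUMS-style file (lines of `<hex digest>  <filename>`) alongside each release, and the function names `parse_sha256sums` and `needs_download` are made up for this example rather than part of any tool.

```python
import hashlib
from pathlib import Path


def parse_sha256sums(text: str) -> dict[str, str]:
    """Parse lines of '<hex digest>  <filename>' into {filename: digest}."""
    sums = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            digest, name = parts
            # Some checksum tools prefix the filename with '*' (binary mode).
            sums[name.lstrip("*")] = digest.lower()
    return sums


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large archives are not read into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_download(mirror_dir: Path, name: str, expected: str) -> bool:
    """True if the file is missing locally or its digest differs from the list."""
    local = mirror_dir / name
    return not local.is_file() or sha256_of(local) != expected.lower()
```

A mirror job could fetch only the (small) checksum file on each run and then download just the entries for which `needs_download` returns true; the actual fetching is left out here because it depends on the tool you prefer.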

Once you have the Terraform CLI executable installed I expect your next area of interest would be provider plugins, which are separate packages that Terraform downloads during terraform init. Those are a trickier proposition because Terraform CLI does directly interact with those servers and so it expects a particular protocol.

If your goal is to have a HTTP server under your control which contains copies of the provider packages then that’s exactly the situation network mirrors are aimed at: you can configure Terraform to skip its usual behavior of contacting the origin registry for each provider and instead try to resolve all provider requirements against a specific network mirror:

# This setting belongs in the CLI Configuration, which is
# separate from the .tf files you use to describe infrastructure
# for a particular module. See the page I linked above for
# more information.
provider_installation {
  network_mirror {
    url = "https://example.com/terraform-providers/"
  }
}

A catch here, though, is that network mirrors talk a different protocol than the origin registries they mirror: the network mirror protocol. This is needed because a provider registry only serves providers for its own hostname, whereas a network mirror can serve providers for any hostname; the hostname in that case just creates a separate namespace, rather than being a network location to install providers from as it is by default.

That means you can’t just mirror the original registry exactly, and must instead construct JSON index files so that Terraform can find out which packages the mirror has and where exactly those packages are hosted. The terraform providers mirror command knows how to construct indexes that you can serve over HTTP to implement the protocol, but it’s designed to mirror the providers for one particular configuration, so it might not work so well if you need to create a single mirror with the packages needed across many different Terraform configurations with non-overlapping provider requirements. If terraform providers mirror isn’t sufficient for what you need then you’d need to build something to generate a similar result.
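
If you do end up building something yourself, the index generation could look roughly like the sketch below. It is not what terraform providers mirror does internally, just an approximation under stated assumptions: archives are already laid out as `<hostname>/<namespace>/<type>/terraform-provider-<type>_<version>_<os>_<arch>.zip`, each provider directory gets an `index.json` listing versions plus one `<version>.json` listing archives per platform, and the `hashes` property (optional, to my understanding) is left out. The function name `write_mirror_indexes` is hypothetical.

```python
import json
import re
from collections import defaultdict
from pathlib import Path

# Assumed archive naming: terraform-provider-<type>_<version>_<os>_<arch>.zip
ARCHIVE_RE = re.compile(
    r"^terraform-provider-(?P<type>[^_]+)_(?P<version>[^_]+)"
    r"_(?P<os>[^_]+)_(?P<arch>[^_]+)\.zip$"
)


def write_mirror_indexes(mirror_root: Path) -> None:
    """Write index.json and <version>.json next to the archives found under
    mirror_root/<hostname>/<namespace>/<type>/."""
    # provider directory -> version -> platform -> archive description
    by_dir = defaultdict(lambda: defaultdict(dict))
    for archive in sorted(mirror_root.glob("*/*/*/terraform-provider-*.zip")):
        m = ARCHIVE_RE.match(archive.name)
        if m is None:
            continue  # skip files that do not follow the assumed naming
        platform = f"{m['os']}_{m['arch']}"
        # The URL is written relative to the JSON file that mentions it.
        by_dir[archive.parent][m["version"]][platform] = {"url": archive.name}

    for provider_dir, versions in by_dir.items():
        index = {"versions": {version: {} for version in versions}}
        (provider_dir / "index.json").write_text(json.dumps(index, indent=2))
        for version, archives in versions.items():
            doc = {"archives": archives}
            (provider_dir / f"{version}.json").write_text(json.dumps(doc, indent=2))
```

Running this over a directory that accumulates packages from many configurations, and then serving that directory over HTTPS at the `url` given in the `network_mirror` block, is one way to get a single shared mirror. Check the generated files against the network mirror protocol documentation before relying on them.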

Ok. I was hoping someone knew off the top of their head the appropriate rclone method, but, I guess using wget in spider/mirror mode will suffice.

At any rate, thanks for the detailed reply. It raises the question, “how does one submit an RFE for making it easier to set up private and/or disconnected mirrors?” Given the strictures around the production-environments I serve (which, as far as I can tell, are getting more common across industries concerned about data-protection), it would probably help with Terraform’s market-penetration.

-tom