The problem is that with this cache setting, terraform itself has no trouble using the local provider binaries, but terragrunt needs the lock file .terraform.lock.hcl to be present and up to date.
I’m hesitant to check hundreds of .terraform.lock.hcl files into the git repo alongside the terragrunt.hcl files. So I tried to use the terragrunt generate block to create the lock file automatically.
The lock file is created as expected with a generate block like the one below:
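(A minimal sketch only; the exact snippet wasn’t preserved, and the dot_terraform_lock.hcl source path is assumed from the hook shown later in this thread.)

generate "lock_file" {
  # Write the pre-built lock file into the working directory before init.
  path      = ".terraform.lock.hcl"
  if_exists = "overwrite"
  # Hypothetical template path, matching the one used in the hook below.
  contents  = file("${get_repo_root()}/terragrunt/dot_terraform_lock.hcl")
}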
The command `terragrunt init` does create the .terraform.lock.hcl without issue. The problem is that after generating the lock file, terragrunt still downloads the provider binaries (aws, kubernetes, …), which is exactly what I’m trying to avoid.
What confuses me is why terragrunt ignores the generated lock file. Thanks,
I put the following in Terragrunt’s terraform block:
terraform {
  …
  before_hook "create_lock_file" {
    commands = [
      "init-from-module",
      "init",
      "plan",
      "apply",
      "destroy",
      "output",
      "state",
      "console",
      "import"
    ]
    execute = ["cp", "${get_repo_root()}/terragrunt/dot_terraform_lock.hcl", "${get_original_terragrunt_dir()}/.terraform.lock.hcl"]
  }
}
And this time it works: I can see that the .terraform.lock.hcl lock file is created, and the provider binaries are copied from the terraform provider cache directory on local disk, with no downloads from the internet.
I think you’re going to need a Terragrunt expert to get to the bottom of this, and I’m not one. Though it might simply be due to the order in which Terragrunt processes different parts of the configuration.
I’ll just point out what I can - starting from the top…
I’m pretty sure nothing uses PLUGIN_CACHE_DIR; I checked the Terraform source code. (The environment variable Terraform actually reads is TF_PLUGIN_CACHE_DIR, note the TF_ prefix.)
/usr/local/terraform_plugins is a weird directory to use as a plugin cache dir, because a plugin cache dir is expected to be writeable, and /usr/local is usually writeable only by root.
A quick search through Terragrunt GitHub issues reveals multiple people attempting to use TF_PLUGIN_CACHE_DIR, and getting errors because Terraform doesn’t support multiple instances of terraform trying to write to the same plugin cache dir at the same time.
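For reference, the supported ways to enable the cache are the plugin_cache_dir setting in the CLI configuration file, or the equivalent TF_PLUGIN_CACHE_DIR environment variable, both pointing at a user-writable directory:

# ~/.terraformrc (the CLI configuration file)
plugin_cache_dir = "$HOME/.terraform.d/plugin-cache"

# Or, equivalently, as an environment variable:
# export TF_PLUGIN_CACHE_DIR="$HOME/.terraform.d/plugin-cache"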
By manually copying a .terraform.lock.hcl file into place, you’re going to be fighting with Terraform’s built-in behaviour of automatically updating it. Be aware you probably will find it irritatingly messy to update plugin versions in the future, if you go with this solution.
The logic behind using a cache under /usr/local/terraform_plugins and creating the .terraform.lock.hcl files manually is that this small environment runs in a place with very low internet bandwidth.
The low bandwidth means it’s impossible to download the provider binaries without timeouts/rejections, although the lightweight version-listing GET requests do manage to return responses in time.
What are the recommended proxy/mirror solutions for such slow-internet cases? Thanks,
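(For reference, Terraform’s built-in mechanism for this situation is the provider_installation block in the CLI configuration. A minimal sketch of a filesystem mirror, assuming the mirror directory has been pre-populated over a better connection with `terraform providers mirror <dir>`:)

provider_installation {
  # Serve registry.terraform.io providers from a local directory...
  filesystem_mirror {
    path    = "/usr/share/terraform/providers"
    include = ["registry.terraform.io/*/*"]
  }
  # ...and never contact the origin registry for them.
  direct {
    exclude = ["registry.terraform.io/*/*"]
  }
}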
The entries in the plugin cache directory are eligible for use only if they match the checksums recorded in the dependency lock file. If your dependency lock file has mismatching checksums then terraform init will assume that the cache directory has become corrupted somehow and so it will “repair” it by re-downloading the package from its origin registry and replacing the cache entry with it.
Of course that alone doesn’t explain why your system didn’t behave as you expected, but it does suggest something to check: does the synthetic .terraform.lock.hcl file you are generating contain full and complete entries for all of the providers you are using, with suitable h1:-prefixed checksums? (The zh: variant of checksum is not sufficient for this purpose because that can only verify the original release .zip files, not the extracted form used in the cache directory.)
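For illustration, a full lock-file entry looks roughly like this (the provider, version, constraint, and hash values here are placeholders, not real checksums):

provider "registry.terraform.io/hashicorp/aws" {
  version     = "4.67.0"
  constraints = "~> 4.0"
  hashes = [
    # h1: checksum of the unpacked package, needed for cache reuse
    "h1:PLACEHOLDER0000000000000000000000000000000=",
    # zh: checksum of the original release .zip, not sufficient on its own
    "zh:0000000000000000000000000000000000000000000000000000000000000000",
  ]
}

The supported way to produce entries like this, including h1: checksums for platforms other than the one you run on, is terraform providers lock, with one -platform=… flag per target platform.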
Generating these lock files outside of Terraform is not an intended way to use Terraform, so there isn’t any built-in way to handle this case. I would typically recommend checking the lock files into your version control as Terraform suggests, since then you will be using these Terraform features in the way they are designed to be used, but if you do want to use it in this unusual way then I would suggest first getting it working in the “normal” way with a single test configuration and then adapting that working simple case into the more complicated design with code generation.