Provider lock hash list value changed

I have a terraform config directory and I use atlantis to run terraform commands. Today, someone submitted a PR to add some new config to the directory. The terraform init came back with:

Initializing the backend...
Initializing modules...

Initializing provider plugins...
- terraform.io/builtin/terraform is built in to Terraform
- Reusing previous version of confluentinc/confluent from the dependency lock file
- Reusing previous version of hashicorp/aws from the dependency lock file
- Using confluentinc/confluent v1.37.0 from the shared cache directory
- Using hashicorp/aws v4.61.0 from the shared cache directory

Error: Failed to install provider from shared cache

Error while importing confluentinc/confluent v1.37.0 from the shared cache
directory: the provider cache at .terraform/providers has a copy of 1.37.0 that doesn't match any of
the checksums recorded in the dependency lock file.

Error: Failed to install provider from shared cache

Error while importing hashicorp/aws v4.61.0 from the shared cache directory:
the provider cache at .terraform/providers has a copy of 4.61.0 that doesn't match any of the
checksums recorded in the dependency lock file.

This directory has planned and applied with this lockfile before, so this seems strange. However, I tried something: I pinned the provider versions in required_providers to exactly what is in the lockfile and ran an init -upgrade. The lockfile came back with a diff. For both the confluent and aws providers, the h1 hash at the top of the hashes list changed:
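For reference, pinning to the exact versions from the lockfile looked roughly like this (a sketch; the looser ~> constraints visible in the diff come from modules in the configuration):

```hcl
terraform {
  required_providers {
    confluent = {
      source  = "confluentinc/confluent"
      version = "1.37.0" # exact version from .terraform.lock.hcl
    }
    aws = {
      source  = "hashicorp/aws"
      version = "4.61.0" # exact version from .terraform.lock.hcl
    }
  }
}
```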

 provider "registry.terraform.io/confluentinc/confluent" {
   version     = "1.37.0"
-  constraints = "~> 1.4"
+  constraints = "~> 1.4, 1.37.0"
   hashes = [
-    "h1:XWZM/a8WQl7YvzlI2l1STgtJMTCJNZoBkba8VzPRMsE=",
+    "h1:o+8M0B5QS24xGUhJYaz/9PD5pKbFbnKgpIX+ea+VqgU=",
@@ -25,9 +25,9 @@ provider "registry.terraform.io/confluentinc/confluent" {
 provider "registry.terraform.io/hashicorp/aws" {
   version     = "4.61.0"
-  constraints = ">= 3.0.0, >= 3.40.0, >= 3.50.0, >= 3.60.0, >= 4.0.0, ~> 4.0, >= 4.9.0, ~> 4.9, ~> 4.52"
+  constraints = ">= 3.0.0, >= 3.40.0, >= 3.50.0, >= 4.0.0, ~> 4.0, >= 4.9.0, ~> 4.9, ~> 4.52, 4.61.0"
   hashes = [
-    "h1:qyBawxoNN6EpiiX5h5ZG5P2dHsBeA5Z67xESl2c1HRk=",
+    "h1:mJSchOA6VkYwEsi+tuspadRmyyE+FGZGYJFUt5kHV+M=",

It would seem that this should not be possible, unless I misunderstand the whole point of the lockfile.

I tested a separate PR against the same directory: first without changing the lockfile (same error), then with the lockfile updated as shown in the diff above, and the init succeeded.

Hi @grimm26,

You are right that this is odd behavior. If you are running just terraform init (without -upgrade) then Terraform can potentially add new checksums, but should not remove any that are already present.

On the other hand, if you did run with -upgrade, then Terraform ignores the lock file and takes the latest matching release from the installation source. If the selected release happens to exactly match the version you previously selected, then an outcome like this would be reasonable if you are using a different operating system or CPU architecture than the person or system that generated the h1: checksum that was removed. Those checksums are calculated locally from the content of the extracted package, so by default Terraform can only include the one for the platform you are currently using when you run terraform init -upgrade.

I think a difference in OS or CPU architecture might also explain the original error you saw. The cache directory can only be verified with h1: checksums because it’s stored in the already-expanded format (instead of as a zip file), so if your lock file only includes a checksum for some other platform then it won’t match on your platform.

The latest version of Terraform (v1.4) has some better treatment of the cache directory where it is no longer a hard error if the cache directory doesn’t match a checksum, and instead Terraform will cross-check with the provider registry to confirm a checksum for your current platform. So if you aren’t already using v1.4 you will hopefully find this works better after upgrading. If you are already using a v1.4.x release then please let me know and I can try to figure out what went wrong here.

I’m running 1.4.4 and did not use the -upgrade flag.

I don’t quite understand this. If someone runs terraform init on a mac and commits that .terraform.lock.hcl file and then I check it out on my linux machine and run terraform init it will fail because there are checksums for darwin in the hashes list? I thought the whole point was to populate the hashes list with all available checksums so that any platform would work.

As it is, I originally committed that lockfile from my linux_amd64 system and we run atlantis on a linux_amd64 container.

The specific problem for the original error message is that your plugin cache directory contains an unpacked copy of the provider, which can only be verified with a h1: checksum. The zh: checksums are of the original .zip file that the provider developer published, but that zip file isn’t available in a local cache directory and so those checksums are not usable when installing from cache.

The behavior I would have expected with Terraform v1.4 is that it would notice that the cache directory does not contain anything that matches any of the checksums in your lock file and so it would contact the registry to find the zip file for that version, verify that against your zh: checksums, and then add a new h1: checksum to the lock file so that Terraform can use your cache directory directly in future, rather than cross-checking with the registry.
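One way to avoid relying on that cross-check at all is to record h1: checksums for every platform you care about up front, using the terraform providers lock command. Assuming the usual Linux/macOS targets (adjust the -platform list to whatever your team and CI actually run), something like this, run from the configuration directory, should add an h1: entry per platform to .terraform.lock.hcl:

```shell
terraform providers lock \
  -platform=linux_amd64 \
  -platform=darwin_amd64 \
  -platform=darwin_arm64
```

After committing the updated lock file, installs from the cache can be verified on any of those platforms without contacting the registry.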

I don’t know why Terraform didn’t behave as I described above. If you’d like to open a bug report about it then the Terraform team can investigate further to try to understand what happened here.

I started filling out a bug report and in trying to reproduce the error I found out the cause.

We have TF_PLUGIN_CACHE_MAY_BREAK_DEPENDENCY_LOCK_FILE set, which causes the error above. If I unset that, terraform adds a new h1: hash for the current platform to the lockfile and proceeds.

This is not how I thought that option worked. I assumed it would just let me use what was in the cache.
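For anyone hitting the same thing: the fix on our side was simply to drop that variable from the Atlantis environment while keeping the shared cache itself, roughly like this (the cache directory path here is just an example):

```shell
# Stop opting back in to the pre-1.4 cache behavior:
unset TF_PLUGIN_CACHE_MAY_BREAK_DEPENDENCY_LOCK_FILE

# The shared plugin cache can stay enabled:
export TF_PLUGIN_CACHE_DIR=/home/atlantis/.terraform.d/plugin-cache

terraform init
```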

Hmm indeed, with that option set you’ve effectively told Terraform to use the not-quite-right behavior from v1.3 and earlier, which was fixed in v1.4. That explains why you still saw the old error message despite the fact that you are using v1.4.