Terraform 0.13+: How to ensure consistent provider versions

I’m currently managing many environments with Terragrunt. I’ve got a generate block in my top-level terragrunt.hcl that looks like this:

generate "providers" {
  path      = "tg-gen_providers.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
terraform {
    required_providers {
        aws = "${local.aws_provider_constraint}"
        null = "${local.null_provider_constraint}"
        random = "${local.random_provider_constraint}"
        archive = "${local.archive_provider_constraint}"
        external = "${local.external_provider_constraint}"
    }
}

provider "aws" {
    profile = "${local.aws_profile}"
    region = "${local.aws_region}"
    version = "${local.aws_provider_constraint}"
}
EOF
}

Elided here is the code that sets those locals, which supports account-based defaults, region-based overrides, and environment-based overrides. This has enabled a consistent version constraint for an entire environment, and in many cases is used to pin to a specific version for the environment.

Due to the way that terragrunt merges terraform code, and the Terraform 0.13 change to require a single required_providers block instead of merging constraints, upgrading to 0.13 is not possible when an upstream module already contains a required_providers block.

It seems like the only way to accomplish this version management with terraform >=0.13 is to make use of https://github.com/hashicorp/terraform/tree/master/tools/terraform-bundle. Currently, terragrunt/terraform is being applied from local laptops, so it was very convenient to rely on the merged constraints to automatically download plugins while still being informed if our constraint didn’t overlap with an upstream module we were upgrading. Adding terraform-bundle into the local workflow seems possible, but it feels like a worse user experience than merged constraints and the local plugin cache.

It’s very possible this is just a corner we painted ourselves into, but I thought it would be worth discussing, as others may have similar usability issues, and there may be a better solution than terraform-bundle that I’m not aware of.

Another option would be to make a small wrapper module that just maps its inputs through to a module block for the upstream module, but that would require duplicating all of the variables and outputs that are used from the upstream module.

I think if you arrange for your generated filename to have the suffix _override.tf then Terraform will consider it to be an Override File and thus allow it to reset specific provider requirements as described in the merging rules.
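As a sketch of that idea, adapting the generate block from the original post (same locals assumed; the only change is the _override.tf suffix on the generated filename):

```hcl
generate "providers" {
  path      = "tg-gen_providers_override.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
terraform {
  required_providers {
    aws    = "${local.aws_provider_constraint}"
    null   = "${local.null_provider_constraint}"
    random = "${local.random_provider_constraint}"
  }
}
EOF
}
```

Per the override merging rules, required_providers entries in an override file replace the upstream module’s constraints on a per-provider basis, rather than being intersected with them as the pre-0.13 merging behavior did.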

Note that the version argument inside provider blocks is redundant with the required_providers setting and has been deprecated since Terraform 0.12. You should be able to drop it from your configuration with no change in behavior, and thus avoid a deprecation warning in newer Terraform versions.
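In other words, the generated provider block can shrink to just its configuration arguments, with the constraint living only in required_providers:

```hcl
provider "aws" {
  profile = "${local.aws_profile}"
  region  = "${local.aws_region}"
  # no version argument here; the constraint in required_providers covers it
}
```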

Thanks for the note on the deprecation, I meant to remove that before posting but I forgot :slight_smile:

Also thanks for the nudge towards the override file documentation. I was under the mistaken impression the merging logic worked on a filename matching basis, rather than on a top-level-block basis (e.g. $file-to-override_override.tf). I haven’t tried it just yet but this looks like the correct solution.

That said, if there is a breaking change that doesn’t manifest in the plan phase, this could also override provider constraints from upstream modules, exposing us to that breaking change even though the maintainer did the right thing by constraining the version.

I think the lock pinning in TF14 solves a chunk of why we started doing this in the first place, and terraform-bundle is also an option. This should unblock TF13 for now while we evaluate a longer-term solution.

Thanks again

Indeed, my hope with the new 0.14 dependency locking mechanism is that we can move away from overloading the root module’s required_providers to represent both “what range of versions do we consider this configuration to be compatible with” and “what exact version are we currently using”.

I will say that locking a single set of dependencies across many configurations was not an explicit design goal, but I assume you could in principle use this same Terragrunt feature to generate a dependency lock file before it runs terraform init, instead of having terraform init itself create the file separately for each directory.
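For reference, the 0.14 dependency lock file (.terraform.lock.hcl) is itself HCL, so a Terragrunt generate block could in principle emit it. The shape is roughly the following (version numbers are placeholders); the hashes list is the part that genuinely needs tooling, since those checksums are computed from the provider packages and can’t be hand-written:

```hcl
# .terraform.lock.hcl (shape sketch only)
provider "registry.terraform.io/hashicorp/aws" {
  version     = "3.22.0"
  constraints = ">= 3.22.0"
  hashes = [
    # "zh:..." entries normally recorded by terraform init;
    # a generator would have to compute these from the provider packages
  ]
}
```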

Potentially, though it would require some tool that understands how to generate that lock file (e.g. something like terraform-bundle that only generates the lock instead of vendoring binaries). I’m not sure if the TF14 plans include something that would do that.

I agree with your hope that having a lock file goes a long way towards eliminating the “I need to use an explicit version across all modules in an environment” paranoia. I think the next chunk is deciding how paranoid we want to be about upper-bounding provider versions in our internal modules, as that currently sounds like a lot of toil compared to managing a single version.

Of course specific situations may lead to different approaches, but I think my general advice would be to use >= constraints everywhere once you’re using 0.14-style dependency locking, and let the lock file serve as the “memory” for an exact version currently in use.

I would not proactively add upper limits (either ~> or <) unless you specifically know that a configuration is not compatible with later versions for some reason, such as if you’re intentionally maintaining a maintenance branch of a module that targets an earlier Terraform version.
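Concretely, under that approach a module’s requirements would look something like this (provider and version numbers are just placeholders):

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 3.0.0" # lower bound only; the lock file records the exact version in use
    }
  }
}
```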

We typically can’t predict what features will be available in future major versions of a provider, so I think instead we should typically wait until we’re ready to do some testing and then run terraform init -upgrade to get the newest version, test how it behaves using terraform plan, and then commit the lock file if the new version is working as expected.

If you find that a new provider version includes a breaking change and you aren’t yet ready to change your configuration to be compatible with it, that could be a good time to document that with a temporary < version constraint in the affected module, along with a source code comment explaining what you learned, as a note to someone else who might try the upgrade again in the future.
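For example (placeholder versions and a hypothetical incompatibility, just to show the shape):

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      # TEMPORARY: 4.0.0 includes a breaking change we haven't adapted to yet.
      # Remove the upper bound once this module is updated for it.
      version = ">= 3.0.0, < 4.0.0"
    }
  }
}
```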

The key in all of this is that upgrading providers will now always be an intentional change made explicitly, rather than something that can happen unexpectedly as part of normal Terraform usage. I expect different teams will build different workflow approaches around that new behavior, but I’m hopeful that manually managing upper bounds on version constraints across many configurations/modules is something that at least most teams can leave behind once they are using Terraform 0.14.