Depends_on in providers

Motivation

There are a lot of inherent issues caused by the lack of “depends_on” support in both “old style” and “new style” module providers. As far as I know, “new style” means the module has no provider definitions whatsoever and relies on the caller to pass the respective provider, optionally with an alias, while “old style” means providers are initialized with a provider block right inside the module.

As a result, there are numerous provisioning scenarios where we are forced to wait until a provider’s dependencies exist before it can be initialized, for example:

  1. You can’t initialize the Kubernetes provider when there’s no Kubernetes cluster yet (a sketch of this case follows below)
  2. You can’t plan CRs with the Kubernetes provider when the CRDs to validate them against don’t exist yet
  3. You can’t initialize the Vault or Boundary providers when Vault or Boundary haven’t been deployed yet; the same goes for PostgreSQL

And so on, and so forth…
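To make case 1 concrete, here is a minimal sketch of the chicken-and-egg situation, assuming a hypothetical `eks` module whose name, source, and outputs are purely illustrative: the Kubernetes provider is configured from the outputs of a cluster module that hasn’t been applied yet, so the provider can’t be initialized within the same plan that is supposed to create the cluster.

# Hypothetical single-configuration attempt; module and output names are examples.
module "eks" {
  source = "./eks"
}

provider "kubernetes" {
  # These values are only known after module.eks has been applied,
  # so the provider can't be initialized during the initial plan.
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_ca_certificate)
  token                  = module.eks.cluster_token
}

module "workloads_on_k8s" {
  # Everything in here needs a working kubernetes provider,
  # i.e. a cluster that doesn't exist yet.
  source = "./workloads_on_k8s"
}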

The intention to have a “single plan” is understandable, even though it essentially prevents every kind of multi-stage deployment. It is, after all, the main selling point of Terraform: a refresh phase and a planning phase that produce a single atomic diff.

There are numerous issues in that regard, and numerous Terraform wrappers have been developed to work around it (Terragrunt, Terraspace, etc.). It really reminds me of the situation with Docker before multi-stage builds existed, when multiple companies developed their own CLI tools to overcome that limitation (like Grammarly’s rocker and rocker-compose).

In this regard, a “consolidated plan” for everything has negative product value for the Terraform community itself, because it makes multi-stage deployments impossible; an atomic “consolidated infrastructure state”, on the other hand, has positive product value. It’s really important to differentiate between these two and not mix them up; there’s already too much confirmation bias around this.

First and foremost, there are a couple of important things to note:

  1. A provider should be able to build a plan for the resources it manages without being fully initialized, with a “known after apply” state for every underlying resource
  2. We assume that an uninitialized provider has not created any underlying resources, and that its refresh phase is deferred until all of its dependencies (modules, variables, etc.) become available
  3. After the deferred provider initialization and the follow-up refresh, if some underlying resource needs changes, the plan has to be updated and additional user approval requested for that newly “clarified” part of it

The “known after apply” situation is already tolerable for all existing resources, and it should be applicable to the relevant providers as well. Splitting the plan into multiple stages, or accepting everything by default, should be an explicit user choice.
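As a concrete instance of scenario 2 from the motivation list (the resource names and manifest paths below are assumptions for illustration), a kubernetes_manifest custom resource currently can’t even be planned until the CRD it is validated against already exists on the cluster, because the provider needs live API access at plan time; this is exactly where a deferred refresh and a “known after apply” plan would apply.

resource "kubernetes_manifest" "vault_crd" {
  manifest = yamldecode(file("${path.module}/crds/vault-secret-crd.yaml"))
}

resource "kubernetes_manifest" "vault_secret" {
  # A custom resource of the kind defined by the CRD above. Planning this
  # fails today while the CRD doesn't exist yet; under the proposal it
  # would plan as "known after apply" and its refresh would be deferred.
  manifest = yamldecode(file("${path.module}/crs/vault-secret.yaml"))

  depends_on = [kubernetes_manifest.vault_crd]
}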

I’d like to contribute a somewhat more viable design and open further discussion on this.

Terraform should be able to handle the situation where one provider’s plan depends on the output and execution of another provider’s plan, and of the respective modules.

Having a consolidated Terraform state (and cross-provider depends_on) also brings new possibilities: it would be possible to design KUDO-like Kubernetes operators based on consul-terraform-sync, for instance. There’s a political benefit as well, because some cloud providers refuse to do postmortem support for Kubernetes clusters with custom operators installed; it’s just too hard to prove that your operator is not the root cause (even when it’s obvious that it isn’t).

Complaints

This matter has become a massive source of detraction for HashiCorp Terraform adopters and has forced other people to develop new tools to overcome the existing Terraform design limitations: both Pulumi and Crossplane are essentially parasitic on the current Terraform provider infrastructure, and any more advanced Terraform setup is basically impossible without Terragrunt. Having depends_on in providers, and being able to pass variables, would resolve a lot of the associated issues with multi-stage and multi-environment deployments.

In my subjective opinion, this is a result of deficient developer advocacy and an overall bad development experience for the OSS community, possibly a case of internal workplace deviance (employee silence or similar). There are a couple of ways to sort out the kitchen sink of GitHub issues, but every single one of them depends on reaching out to the developers, which can be automated, and “being understaffed” is not a valid excuse.

I do get that HashiCorp is deaf to people’s lamenting and often goes against common sense, or just against concise design, due to the “state of business”. It would be pointless for me to enumerate all the possible workplace deviance factors involved, but at least I can drop a line or two here and hope it gets the necessary attention. HashiCorp is losing traction because of this.

This is something I’ve been struggling with since 2015, and it’s very frustrating and disappointing that no one has been able to address this design issue, or that the discussion around it was deliberately silenced and opinions manipulated for personal gain; I can only speculate at this point.

This is a really long story, and a heavily demanded feature:

#31520 #30910 #29990 #29182
#22036 #16200 #2430

Short example

# Stage 1: install the Vault CRDs.
provider "kubernetes" {
  alias = "vault_crd_provisioning"
}

module "vault_on_k8s_crds" {
  providers = {
    kubernetes = kubernetes.vault_crd_provisioning
  }

  source = "./vault_on_k8s/crds"
  ...
}

# Stage 2: deploy Vault itself. The proposed depends_on defers this
# provider's initialization until the CRDs from stage 1 exist.
provider "kubernetes" {
  alias = "vault_provisioning"

  depends_on = [
    module.vault_on_k8s_crds
  ]
}

module "vault_on_k8s" {
  providers = {
    kubernetes = kubernetes.vault_provisioning
  }

  source = "./vault_on_k8s"
  ...
}

# Stage 3: configure the vault provider from the address of the
# freshly deployed Vault instance.
provider "vault" {
  address = module.vault_on_k8s.address
}
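Under the proposal, only the first stage would be fully resolved in the initial plan: the kubernetes.vault_provisioning provider, the vault_on_k8s module, and the vault provider (whose address is only known once Vault is deployed) would have their initialization and refresh deferred, with their resources planned as “known after apply” until the earlier stages have been applied.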