Creating a two-way (or dynamic) dependency between resources

Hi, I’m writing a custom Terraform provider and I have a dilemma. The API we are writing this provider for follows a design-then-apply-changes flow:

infrastructure_create
instance_array_create (depends on infrastructure’s id)
instance_array_create (depends on infrastructure’s id)
drive_array_create (depends on infrastructure’s id and instance_array’s id)

infrastructure_deploy (depends on infrastructure’s id)

The update flow is about the same: the deploy step must be executed whenever any of the above resources change.

The current implementation uses a single big resource: the infrastructure is the top-level resource, and the instance arrays and drive arrays are “flattened” into it as nested blocks. This allows the create and update code of the infrastructure resource to execute the “deploy changes” step when there are differences to be applied.
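As a rough illustration of that flow (stage the whole design, then deploy once if anything differs), here is a minimal sketch in plain Go. It does not use the Terraform SDK or the real client library; `API`, `Stage`, `Deploy` and `Apply` are hypothetical names, and the in-memory `API` type only stands in for the remote service.

```go
package main

import (
	"fmt"
	"reflect"
)

// DriveArray and InstanceArray are hypothetical, trimmed-down stand-ins for
// the nested blocks of the flattened infrastructure resource.
type DriveArray struct {
	StorageType string
}

type InstanceArray struct {
	Label       string
	DriveArrays []DriveArray
}

// Infrastructure holds everything the single big resource manages.
type Infrastructure struct {
	Label          string
	InstanceArrays []InstanceArray
}

// API is a hypothetical in-memory stand-in for the remote API: edits are
// staged first and only take effect when Deploy is called.
type API struct {
	staged      Infrastructure
	deployed    Infrastructure
	DeployCalls int
}

// Stage records the desired design without applying it.
func (a *API) Stage(desired Infrastructure) { a.staged = desired }

// Deploy applies whatever is currently staged.
func (a *API) Deploy() {
	a.deployed = a.staged
	a.DeployCalls++
}

// Apply mimics the create/update function of the flattened resource: it
// stages the whole design, then deploys once, and only if something changed.
func Apply(a *API, desired Infrastructure) bool {
	a.Stage(desired)
	if reflect.DeepEqual(a.staged, a.deployed) {
		return false // nothing to deploy
	}
	a.Deploy()
	return true
}

func main() {
	api := &API{}
	infra := Infrastructure{
		Label: "my-terraform-infra97",
		InstanceArrays: []InstanceArray{{
			Label:       "my-instance-array",
			DriveArrays: []DriveArray{{StorageType: "iscsi_hdd"}},
		}},
	}
	fmt.Println(Apply(api, infra)) // first apply: changes exist, deploy runs
	fmt.Println(Apply(api, infra)) // second apply: no changes, deploy is skipped
	fmt.Println(api.DeployCalls)   // number of deploys performed so far
}
```

Because every nested element lives in one value, a single comparison is enough to decide whether the deploy step is needed, which is the property that is lost once the elements become separate resources.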

The code is available here: https://github.com/bigstepinc/terraform-provider-metalcloud

The current resource definition is something like this:

resource "metalcloud_infrastructure" "my-infra97" {

  infrastructure_label = "my-terraform-infra97"
  datacenter_name      = "us-santaclara"

  instance_array {
    instance_array_label = "my-instance-array"
    ...

    drive_array {
      drive_array_storage_type = "iscsi_hdd"
      ...
    }

    ...
  }
}

While this works just fine now, there may be better options that involve splitting it into separate resources. However, that would require somehow creating a two-way relationship between the “infrastructure” resource and all other elements, or finding some other way to call the deploy function when needed.

An option that I’m considering is to create a ‘deployer’ resource that will be explicitly dependent on all the others. However, this feels a bit “hackish” to me. Is this the right approach?

resource "metalcloud_infrastructure" "my-infra" {
  
  infrastructure_label = "my-terraform-infra104"
  datacenter_name = var.datacenter

}

resource "metalcloud_instance_array" "my-infra" {

  infrastructure_id = metalcloud_infrastructure.my-infra.id
  ...
}

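resource "metalcloud_deployer" "my-infra" {
  # depends_on takes explicit resource references (no wildcards); list every
  # infrastructure, instance array and drive array here
  depends_on = [metalcloud_infrastructure.my-infra, metalcloud_instance_array.my-infra]
}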

Any hints would be much appreciated.

Hi @alexandrubordei,
This pattern looks similar to patterns I have seen in some other providers/APIs, specifically those managing critical pieces of infrastructure, such as CDN (e.g. Akamai or Fastly) or DNS (e.g. DynDNS / now Oracle).

Unfortunately, we do not have any better way to manage this in the SDK yet, but the problem is tracked in this issue: https://github.com/hashicorp/terraform-plugin-sdk/issues/63


(1) What I would recommend for the time being is to commit & apply changes in every resource, instead of forcing users to chain and rely on some sort of “deploying” resource which deploys changes from the other ones.

(2) Alternatively you can go down the route of one giant resource that manages everything, which can be a bit more difficult to manage depending on the size and cadence of changes on the API. This is basically what Fastly does today: https://www.terraform.io/docs/providers/fastly/r/service_v1.html (hence the long documentation page)

(3) Another alternative, if you can model it that way at all, is to model all “no-op” resources as data sources instead, which would output some serialized data that can be fed into one simpler “deploying” resource. I would only recommend this if these no-op resources don’t require any write operations on the API, as data sources should not perform write operations on any API. I’m not aware of any example in this area though.

(4) Akamai decided to go with a separate resource for activation/deployment of changes. Unless you work very closely with your users and educate them thoroughly about this, they may struggle to understand the concept.

(5) PanOS decided to leave committing entirely outside of Terraform: https://www.terraform.io/docs/providers/panos/index.html#commits This may leave your users even more frustrated. They will likely expect Terraform to manage their entire infrastructure from end to end (including deployment) and it may not be clear to them they have to download/build/manage yet another tool to complete the deployment sequence.

From another perspective, the Akamai and PanOS approaches are safe: in the worst-case scenario (when a user forgets to define and apply akamai_property_activation), changes remain undeployed, so at worst it causes frustration while the environment effectively remains unchanged. Users’ expectations and frustrations should not be underestimated, though.


I would generally try to stick to 1-3.
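Since no real example of option (3) is known, the following is only a rough sketch of the idea in plain Go, without the Terraform SDK: the parts that would be data sources just serialize their inputs, and a single deployer is the only place where writes happen. `InstanceArraySpec`, `Render`, `Deployer` and its `Apply` method are hypothetical names, and the writes are recorded in memory rather than sent to an API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// InstanceArraySpec is a hypothetical description of one instance array.
type InstanceArraySpec struct {
	Label         string `json:"label"`
	InstanceCount int    `json:"instance_count"`
}

// Render plays the role of a data source read: it only serializes its inputs
// and never calls the API.
func (s InstanceArraySpec) Render() (string, error) {
	b, err := json.Marshal(s)
	return string(b), err
}

// Deployer plays the role of the single "deploying" resource, the only place
// where writes happen. Writes are recorded in memory here instead of being
// sent to a real API.
type Deployer struct {
	Writes []InstanceArraySpec
}

// Apply decodes every rendered spec first and records the writes only if all
// of them decode successfully.
func (d *Deployer) Apply(rendered []string) error {
	specs := make([]InstanceArraySpec, 0, len(rendered))
	for _, r := range rendered {
		var s InstanceArraySpec
		if err := json.Unmarshal([]byte(r), &s); err != nil {
			return fmt.Errorf("decoding spec: %w", err)
		}
		specs = append(specs, s)
	}
	d.Writes = append(d.Writes, specs...)
	return nil
}

func main() {
	web, _ := InstanceArraySpec{Label: "web", InstanceCount: 2}.Render()
	db, _ := InstanceArraySpec{Label: "db", InstanceCount: 1}.Render()

	d := &Deployer{}
	if err := d.Apply([]string{web, db}); err != nil {
		panic(err)
	}
	fmt.Println(web)           // the serialized spec handed to the deployer
	fmt.Println(len(d.Writes)) // number of specs written in this single pass
}
```

The trade-off is that everything the deployer needs must fit into the serialized form, which is where more complex infrastructures may become awkward to express.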


Please keep in mind that my description of the approaches mentioned is not meant to say anything about any individual developer or company. It is possible that they each picked the best solution for them at the time of implementation. While they diverged from what we consider best practice, they provide vital data which we can use when designing a solution for https://github.com/hashicorp/terraform-plugin-sdk/issues/63

Thanks @radeksimko for the detailed reply.

It seems there is no easy way out. We will probably stick to the (2) Fastly route which we already have implemented. The current resource definition is big but not that big and it seems to work just fine and feels ‘natural’ to our users who are used to thinking in terms of a complete infrastructure.

Option (1) is unfortunately a no-go for us: provisioning anything would take a very long time, as every step could involve rebooting servers and waiting for them to come back up. These are physical servers, which take between 1 and 7 minutes to boot.

Option (3) could work, but I believe it would be too convoluted to express a complex infrastructure that way, and we would probably run into corner cases that cannot be implemented.

Option (4): we actually tried this and ended up abandoning it. While deploy works if you set the dependencies right, delete would not happen in the right (reversed) order, with the apply step starting before the other steps. The definition was also more complex, with a lot of explicit dependencies, which doesn’t look elegant at all to the user. So we just stashed the changes, at least for now.

Option (5): we thought about this, but as you mentioned, it seems to defeat the point of Terraform if you have to use the CLI.
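The delete-ordering problem mentioned under option (4) can be sketched with a small dependency graph. Terraform destroys dependents before the things they depend on, so a deployer that depends on everything is the first to be destroyed. The code below is a plain Go illustration, not Terraform's actual graph walker; `createOrder`, `destroyOrder` and the resource names are hypothetical, and the graph is assumed to have no cycles.

```go
package main

import (
	"fmt"
	"sort"
)

// createOrder returns one valid creation order for a dependency map
// (resource -> resources it depends on), visiting names alphabetically so the
// result is deterministic. It assumes the graph has no cycles.
func createOrder(deps map[string][]string) []string {
	names := make([]string, 0, len(deps))
	for name := range deps {
		names = append(names, name)
	}
	sort.Strings(names)

	visited := make(map[string]bool)
	var order []string
	var visit func(name string)
	visit = func(name string) {
		if visited[name] {
			return
		}
		visited[name] = true
		for _, dep := range deps[name] {
			visit(dep) // dependencies are created first
		}
		order = append(order, name)
	}
	for _, name := range names {
		visit(name)
	}
	return order
}

// destroyOrder is the creation order reversed: dependents go first.
func destroyOrder(deps map[string][]string) []string {
	order := createOrder(deps)
	for i, j := 0, len(order)-1; i < j; i, j = i+1, j-1 {
		order[i], order[j] = order[j], order[i]
	}
	return order
}

func main() {
	deps := map[string][]string{
		"infrastructure": {},
		"instance_array": {"infrastructure"},
		"drive_array":    {"infrastructure", "instance_array"},
		"deployer":       {"infrastructure", "instance_array", "drive_array"},
	}
	fmt.Println(createOrder(deps))  // [infrastructure instance_array drive_array deployer]
	fmt.Println(destroyOrder(deps)) // [deployer drive_array instance_array infrastructure]
}
```

On create, the deployer runs last, which is what is wanted. On destroy, it runs first, before any of the other deletions have been staged, so nothing is left to trigger a deploy once they are.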

We’ll follow issue #63 and if anything changes we’ll be the first to switch to that approach as it’s obviously more natural.

Fortunately, all other resources from the API fall outside this pattern, which will allow us to build regular resources, so I think we’re fine for now.

We submitted the provider for review but we have yet to hear from the review team. I hope they will accept the current design, at least temporarily.

Many thanks,
Alex