Hi @maxb
So I absolutely can provide more details about the provider. The reason I've been cagey about providing details is that it can end up being a lot of 'textual noise' versus just asking a more direct question.
But in this case I think, yeah, it may well help to elaborate.
So, the API I’m dealing with has lots of endpoints for different resources. One of these resources is a ‘service’ resource.
Now the way a ‘service’ works is that when you create a service it is ‘inactive’, meaning as a consumer of the service you can’t access it, not until the service has been ‘activated’.
A service has different ‘versions’. If you activate your service, and you then want to make changes to it, you have to ‘clone’ the service which produces a new version and that new version is inactive. You make your changes to the new version, then you activate it.
All other resources (there are about 43 separate resources, such as a 'backend' resource, a 'domain' resource etc.) must be created within an 'inactive' service version (each resource consists of its own separate CRUD API endpoints).
So in the old provider implementation the developers had hacked in the concept of 'nested resources' (e.g. the 'service' resource would have a 'backend' block, a 'domain' block, etc.). Although it looks like a 'block' to a user of the provider, internally the provider was doing CRUD/resource-based handling of each block, and after every block is processed the service resource triggers its 'activation'.
So when it came time for me to rewrite the provider to use the new Terraform framework I started implementing the same ‘nested resource’ design to keep things consistent and not break the interface too much for users.
But ultimately the logic required to implement such a design is gnarly and complex and I’d rather avoid it by ‘flattening’ the design.
So instead of…
```hcl
resource "service" "example" {
  backend {
    # ...
  }

  domain {
    # ...
  }
}
```
I’d prefer something like…
```hcl
resource "service" "example" {
  # ...
}

resource "domain" "example" {
  service_version = service.example.version
  name            = "demo.example.com"
}

resource "backend" "example" {
  service_version = service.example.version
  # ...
}
```
Now for this design to work I need a way to ‘activate’ the service once all the resources are complete (e.g. once domain has been created inside the inactive service version, and backend has been created inside the inactive service version, and repeat for the other 40+ resources a user might define in their config).
So initially I was thinking…
```hcl
resource "service_activation" "example" {
  service_id = service.example.id
  activate   = true
}
```
So we know that we're implying a dependency graph on the `service` resource by referencing its output variable, but that isn't going to cause the `service_activation` resource to wait for the `domain` and `backend` resources to have been completed.
The initial way we were looking to solve this was to just use `depends_on` in the final `service_activation` resource. One reason we were considering avoiding that was the Terraform documentation, which states:

> You should use depends_on as a last resort because it can cause Terraform to create more conservative plans that replace more resources than necessary. For example, Terraform may treat more values as unknown "(known after apply)" because it is uncertain what changes will occur on the upstream object. – Terraform docs

The other reason was I wanted to avoid users having to define a potentially very long `depends_on` list (like I say, there are 43 or more resources they could end up needing to list).
But considering I was unable to figure out a way to avoid this with just internal logic (see below for ideas I had for how to do this), I might have to rely on `depends_on` and just hope that it doesn't mess up any diffs to the point where they're full of "known after apply".
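For reference, the `depends_on` fallback would look something like this (resource names are just the placeholders from the examples above):

```hcl
resource "service_activation" "example" {
  service_id = service.example.id
  activate   = true

  # Every resource created inside the inactive service version must be
  # listed here, which is what makes this unwieldy with 40+ resource types.
  depends_on = [
    domain.example,
    backend.example,
    # ...and so on for every other resource in the user's config
  ]
}
```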
So to avoid `depends_on` I started thinking about using Go's `sync.WaitGroup`, giving each resource access to the WaitGroup so it can call `wg.Add(1)` and then `defer wg.Done()`. But I discovered that Terraform processes each resource in parallel, so we have a race condition: maybe the `domain` resource calls `wg.Add(1)` before the `service_activation` resource gets a chance to call its `wg.Wait()`, but maybe the Wait method is reached before the `backend` resource got processed and before it called its `wg.Add(1)`.
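To make that race concrete, here's a minimal standalone Go sketch (not provider code, just an illustration with made-up event names): `sync.WaitGroup.Wait` returns immediately when the counter is zero, so if the activation step wins the race, it proceeds before any sibling resource has registered itself via `Add(1)`.

```go
package main

import (
	"fmt"
	"sync"
)

// activateEarly simulates the losing schedule: the service_activation
// step reaches Wait() before any sibling resource has called Add(1).
func activateEarly() []string {
	var wg sync.WaitGroup
	var events []string

	// Counter is still zero here, so Wait returns immediately instead
	// of blocking -- activation happens before the siblings are done.
	wg.Wait()
	events = append(events, "activated")

	// The domain resource's Add/Done arrive too late to be waited on.
	wg.Add(1)
	func() {
		defer wg.Done()
		events = append(events, "domain created")
	}()
	wg.Wait()

	return events
}

func main() {
	fmt.Println(activateEarly())
}
```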
So now I'm trying to figure out: how can I have this better, 'flatter' design that's more idiomatic to Terraform, but have the `service_activation` resource wait for a non-deterministic number of resources to complete? (I don't know what the user's TF config looks like, nor, if I did, would I want to try and parse it, because ultimately I need to know what the 'plan' looks like: if the `domain` resource was edited or a new `backend` resource was added, then I know there are two resources that are going to be processed and need to be waited on before trying to activate the service.)
Hopefully this all made sense. Would be interested in your thoughts.
Thanks!