Data source side effects

Hello everyone,

I’m developing the Juju Terraform provider using the Plugin Framework and I encountered an unexpected behavior that might be interesting to get insight on, or that might suggest a different design.

Summary of the issue: I planned to use a data source and its Read method to add a config to an in-memory map stored in the shared client, and then fetch the same value from the map during another resource’s Create.

The resulting configuration looks like this:

data "juju_external_controller" "external_controller" {
  controller_name      = local.external_controller_name
  ...auth_details I need in the other resource to perform some operations.
}

resource "juju_other_resource" "example" {
  external_controller = data.juju_external_controller.external_controller.controller_name
}

In all of our resources and data sources, we fetch and share a common JujuClient in this way (which I believe is very common):

func (d *dataSource) Configure(ctx context.Context, req datasource.ConfigureRequest, resp *datasource.ConfigureResponse) {
	...
	provider, ok := req.ProviderData.(juju.ProviderData)
	...
	d.client = provider.Client
}

In the Read method of this new data source, I was adding a config to a map stored in the provider.Client struct:

func (d *dataSource) Read(ctx context.Context, req datasource.ReadRequest, resp *datasource.ReadResponse) {
	...
	d.client.configMap[<controller_name>] = <auth_values>
}

This is working fine.

However, when I then try to access the same map from a resource’s Create method, the map is empty.

func (r *resource) Create(ctx context.Context, req resource.CreateRequest, resp *resource.CreateResponse) {
	value, ok := r.client.configMap[<external_controller_name>]
	// ok is false
}

From what I’ve understood while debugging the provider, the client instance is not shared between the data source’s Read and the resource’s Create.
It seems the map is filled by the data source’s Read during the “plan” phase, while the resource’s Create runs during the “apply” phase, and the two phases don’t share the same client struct.

For now, a solution I found is to use an ephemeral resource, because its Open (or equivalent) method shares the same client with the resource’s Create, so it fills the map and the other resource can use it. However, I’d really like to use the data source, because the ephemeral resource has limitations on how its values can be used and I can’t leverage Terraform’s dependency graph; I must set depends_on manually.

Example with the ephemeral resource:

ephemeral "juju_external_controller" "external_controller" {
  controller_name      = local.external_controller_name
  ...auth_details
}

resource "juju_other_resource" "example" {
  external_controller = local.external_controller_name
  depends_on = [ephemeral.juju_external_controller.external_controller]
}

Is there a way to still use the data source, or is the ephemeral resource the only option? Is there another possible solution?

Consider the case of saving a plan with the -out=<file> Terraform CLI argument.

In that case, of course the in-memory data will be lost between the planning phase and the apply phase, because both Terraform CLI and your provider binary will have exited and been restarted. It’s two different processes.
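To make the lifecycle concrete, here is a minimal sketch in plain Go that models the two runs without the Plugin Framework. The Client type, the newClient helper and the "external"/"token" values are hypothetical stand-ins, not the Juju provider's actual code; the only assumption is that each provider process starts with a fresh, empty map.

```go
package main

import "fmt"

// Client stands in for the provider's shared client (hypothetical shape).
type Client struct {
	configMap map[string]string
}

// newClient models provider start-up: every provider process begins
// with its own empty in-memory map.
func newClient() *Client {
	return &Client{configMap: map[string]string{}}
}

// readDataSource models a data source Read that stores auth details
// in the client as a side effect.
func readDataSource(c *Client, controller, auth string) {
	c.configMap[controller] = auth
}

// createResource models a resource Create that looks the value up.
func createResource(c *Client, controller string) (string, bool) {
	v, ok := c.configMap[controller]
	return v, ok
}

func main() {
	// First run (plan, possibly saved with -out): one provider process.
	planClient := newClient()
	readDataSource(planClient, "external", "token")

	// Second run (apply of the saved plan): a new process, a new client.
	applyClient := newClient()
	_, ok := createResource(applyClient, "external")
	fmt.Println("found during apply:", ok)
}
```

The lookup in the second client fails because nothing ever wrote to its map; the write happened in a process that no longer exists.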

Consider also that it is possible to run the two resources with entirely different provider plugins (different releases of the same provider) which also won’t see each other’s memory.

This doesn’t sound like a safe road to travel.

As a general rule, Terraform expects data resources to not have any side effects. Data sources are read once, usually during plan, but sometimes deferred until apply if the configuration is not known. This can cause unexpected changes between plan and apply, and in the case of long-lasting side-effects, it can cause unexpected changes in the next plan as well. In your example it will also mean that the in-memory change will only be present during plan or apply, but never both.
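One way to stay within that rule is to have the data source only return values and let Terraform carry them to the resource as ordinary attributes. The sketch below models that data flow in plain Go, without the Plugin Framework; ControllerData, ResourceConfig and the AuthToken/ExternalAuthToken fields are hypothetical names chosen for illustration, not the provider's schema.

```go
package main

import (
	"errors"
	"fmt"
)

// ControllerData models what a side-effect-free Read returns
// (hypothetical fields).
type ControllerData struct {
	ControllerName string
	AuthToken      string
}

// readController models Read: it computes and returns values and
// writes nothing to shared memory.
func readController(name string) ControllerData {
	return ControllerData{ControllerName: name, AuthToken: "token-for-" + name}
}

// ResourceConfig models the resource's own arguments, which the
// configuration fills from references to the data source's attributes.
type ResourceConfig struct {
	ExternalController string
	ExternalAuthToken  string
}

// createResource models Create: everything it needs arrives through
// its configuration, so it does not depend on who ran Read or when.
func createResource(cfg ResourceConfig) (string, error) {
	if cfg.ExternalAuthToken == "" {
		return "", errors.New("missing auth details for " + cfg.ExternalController)
	}
	return "connected to " + cfg.ExternalController, nil
}

func main() {
	data := readController("external")
	msg, err := createResource(ResourceConfig{
		ExternalController: data.ControllerName,
		ExternalAuthToken:  data.AuthToken,
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(msg)
}
```

Values passed this way are recorded in the plan and in state, so any sensitive attributes would need to be marked as such in the schema.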

Ephemeral resources are unique in that they have a “close” method indicating the end of their lifespan so that anything they did can be cleaned up. They also cannot generate any data which is stored between plan and apply, and will be read anew in both phases of execution.

If you need to manage a long lasting side-effect, then a managed resource is most appropriate.

Thanks a lot for the replies. I think we will go with adding a new block to the juju provider block to hold these additional connection details.

That seems the safest and most intuitive place to put connection details.
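A rough sketch of why the provider block works where the data source did not: the provider is configured again in every run, so a map built from its configuration exists in both plan and apply. This is modeled in plain Go without the Plugin Framework; ProviderConfig, ExternalController and its fields are hypothetical, since the actual block schema is not decided here.

```go
package main

import "fmt"

// ExternalController models one nested block in the provider
// configuration (hypothetical fields).
type ExternalController struct {
	Name  string
	Token string
}

// ProviderConfig models the decoded provider block.
type ProviderConfig struct {
	ExternalControllers []ExternalController
}

// Client stands in for the shared client handed to resources.
type Client struct {
	configMap map[string]ExternalController
}

// configure models the provider's Configure step: it rebuilds the map
// from configuration each time a provider process starts.
func configure(cfg ProviderConfig) (*Client, error) {
	c := &Client{configMap: make(map[string]ExternalController, len(cfg.ExternalControllers))}
	for _, ec := range cfg.ExternalControllers {
		if _, dup := c.configMap[ec.Name]; dup {
			return nil, fmt.Errorf("duplicate external controller %q", ec.Name)
		}
		c.configMap[ec.Name] = ec
	}
	return c, nil
}

func main() {
	cfg := ProviderConfig{ExternalControllers: []ExternalController{
		{Name: "external", Token: "token"},
	}}

	// Plan and apply each configure their own provider process.
	for _, phase := range []string{"plan", "apply"} {
		client, err := configure(cfg)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		_, ok := client.configMap["external"]
		fmt.Println(phase, "sees external controller:", ok)
	}
}
```

Resources can then look the controller up by name in Create, and the lookup no longer depends on anything that happened in an earlier run.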