Custom provider background execution until the end of "terraform apply"?

I’m trying to develop a custom provider to create an SSH tunnel as a Data source.

My current issue is that once the Data source has been “fetched”, Terraform Core kills the custom provider process (and the SSH tunnel as a consequence).

$ terraform apply
data.jumphost_ssh.jumphost: Reading...
data.jumphost_ssh.jumphost: Read complete after 2s
data.http.example: Reading...
╷
│ Error: Error making request: Get "http://localhost:62759/": dial tcp [::1]:62759: connect: connection refused
│
│   with data.http.example,
│   on main.tf line 25, in data "http" "example":
│   25: data "http" "example" {

The SSH tunnel is created while the data source data.jumphost_ssh.jumphost is being read:

data.jumphost_ssh.jumphost: Reading...
data.jumphost_ssh.jumphost: Read complete after 2s

but the custom provider process has been terminated, so when the next data source data.http.example tries to use the tunnel, it is no longer there :frowning:
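
For context, the configuration looks roughly like this (the jumphost_ssh attributes below are simplified placeholders rather than the provider’s real schema):

```hcl
# Simplified sketch of the setup; attribute names are placeholders.
data "jumphost_ssh" "jumphost" {
  # Opens a local SSH tunnel through the jump host to a private endpoint.
  jumphost    = "jumphost.example.com"
  remote_host = "private-service.internal"
  remote_port = 80
}

data "http" "example" {
  # Talks to the private endpoint through the tunnel's local port
  # (62759 in the output above), which only works while the tunnel
  # process is still alive.
  url = "http://localhost:${data.jumphost_ssh.jumphost.local_port}/"
}
```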

Is there a way to get Terraform Core to keep the custom provider alive until the end of the terraform apply?

This is difficult to accomplish, but not impossible. The problem is that Terraform is not intended for this kind of workflow. What I mean is that the process of the plugin (provider) is managed by Terraform, and you have no control over the lifecycle of the process that represents your plugin/data source/resource. You also have to know that Terraform starts the plugin multiple times (for example, the plugin that implements the external data source). So when you create a resource for the first time, the plugin gets started 4 times:

# Create

## Get

[01:39.03] GetProviderSchema

## Validate

[01:39.03] GetProviderSchema
[01:39.03] ValidateResourceConfig

## Plan

[01:39.03] GetProviderSchema
[01:39.03] ValidateProviderConfig
[01:39.03] ConfigureProvider
[01:39.03] ValidateResourceConfig
[01:39.03] PlanResourceChange

## Apply

[01:39.05] GetProviderSchema
[01:39.05] ValidateProviderConfig
[01:39.05] ConfigureProvider
[01:39.05] ValidateResourceConfig
[01:39.05] PlanResourceChange
[01:39.05] ApplyResourceChange

So these calls (Plan, Apply, Read, …) each represent just a “moment” for one resource and therefore do not cover the lifecycles of multiple resources or data sources. The same applies to data sources (except that Read is called instead of Plan and Apply).

But with a lot of creativity you could make it work… Conceptually, what you can try is the following: you wrap the resources that depend on an active SSH session between a beginning data source and an ending data source. The beginning data source opens an SSH connection and the ending data source closes it. The resources that need SSH depend on the beginning data source, and the ending data source must depend on everything that depends on the beginning data source.
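
As a sketch, with hypothetical ssh_tunnel_open / ssh_tunnel_close data sources (placeholder names, not an existing provider), the dependency chain would look like this:

```hcl
# Hypothetical data sources; names and attributes are placeholders.
data "ssh_tunnel_open" "begin" {
  jumphost    = "jumphost.example.com"
  remote_host = "private-service.internal"
  remote_port = 80
}

# Everything that needs the tunnel depends on the "begin" data source...
data "http" "through_tunnel" {
  url = "http://localhost:${data.ssh_tunnel_open.begin.local_port}/"
}

# ...and the "end" data source depends on everything that used the tunnel.
data "ssh_tunnel_close" "end" {
  pid        = data.ssh_tunnel_open.begin.pid
  depends_on = [data.http.through_tunnel]
}
```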

But this has many, many drawbacks and things to consider. To start with:

  • You need to create an SSH session without any required user input.
  • You need to spawn a completely independent SSH process that is not a child process of your external data source, and you need to keep the connection open, maybe with a loop as the command when opening the SSH connection.
  • You need to track the process ID so that the ending data source can kill it.

And those are not even all the drawbacks; I cannot list them all at once:

  • Your data source is read multiple times, at least once at plan time and at least once at apply time. Keep that in mind.
  • Your plan or apply can fail, leaving you with an open connection, so you need to track existing connections on re-plan/re-apply. Not so easy.
  • You need at least one fully successful plan or apply for the connection that was opened to also be closed again.

You could maybe shift this SSH logic to resources instead: with a lot of effort you can make it work so that the SSH connection is only opened and closed during the apply phase, using local-exec provisioners inside a beginning and an ending resource, but that leaves you with similar drawbacks as stated before.
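
A minimal sketch of that resource-based variant, assuming passwordless SSH from the machine running Terraform and using made-up host names:

```hcl
# Opens the tunnel at apply time; "ssh -f -N" backgrounds itself, so the
# provisioner command returns while the tunnel process stays alive.
resource "null_resource" "tunnel_open" {
  provisioner "local-exec" {
    command = "ssh -f -N -L 8443:private-endpoint.internal:443 user@jumphost.example.com"
  }
}

# ...resources that need the tunnel go here, each with
# depends_on = [null_resource.tunnel_open]...

# Closes the tunnel once everything that used it has been applied.
resource "null_resource" "tunnel_close" {
  # In a real configuration this list would contain the resources that
  # actually use the tunnel, not just the opener.
  depends_on = [null_resource.tunnel_open]

  provisioner "local-exec" {
    command = "pkill -f 'ssh -f -N -L 8443:'"
  }
}
```

Keep in mind that local-exec provisioners only run when these null_resources are created, and a failed apply in between still leaves the tunnel open, so the drawbacks above apply here as well.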

So I would rather not recommend this approach. :smiley: Try to set up the SSH tunnel before running terraform itself. Or maybe a remote-exec provisioner is an option? Or another idea: bring your Terraform project to your remote machine and run terraform there?


Thanks @teneko for taking the time to reply in such a detailed manner! :pray:

I’ve indeed discovered that controlling the lifecycle of the plugins (providers) is not easy, and I perfectly understand that providers are not designed to create persistent background resources, so what I’m trying to achieve is a hack.

I understand that it’d be possible to create the SSH tunnel before starting TF, but here’s an example where it’d be cumbersome: let’s say I want to use this EKS TF module:

You see that:

  • module.eks will create the EKS cluster (with a private EKS endpoint, not reachable from the public internet).
  • and inside the module.eks module, the kubernetes provider is also used to manage the aws-auth ConfigMap, which, as you can guess, will fail without the SSH tunnel (see the sketch just below this list).
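
Simplified, and leaving out most of the module inputs, the wiring is roughly:

```hcl
module "eks" {
  source = "terraform-aws-modules/eks/aws"

  # ...cluster inputs, with a private-only cluster endpoint...
}

# The aws-auth ConfigMap handling relies on this provider, but the private
# endpoint is only reachable through the SSH tunnel.
provider "kubernetes" {
  host = module.eks.cluster_endpoint
  # ...authentication settings...
}
```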

I know that what would be easier is to run TF from inside the AWS VPC, but we can’t do that.

I know that I could also split the aws-auth ConfigMap part into a separate TF stack, so I could create the SSH tunnel between the two TF calls, but I’m greedy: I’d like to absorb the technical complexity so that on the user side, all they need is just a plain old terraform apply.

Also, running the SSH command outside means that I need to run aws eks describe-cluster, parse the returned JSON, check errors, etc., which once again surfaces the complexity instead of using module.eks.cluster_endpoint out of the box.

I’ll dig into the subprocess option, keeping in mind that the data source can be invoked several times (I was not aware of that, so thanks for the heads-up! :pray: ).

I’ll update here if I manage to have a POC.

Thanks again :bowing_man:

Call me crazy, but you could fork the kubernetes provider and wrap SSH functionality around the resources that make use of provider.kubernetes.exec. When extending kubernetes, you can remap the kubernetes provider to your custom kubernetes provider by using provider aliases on your eks module call. The only problem I see is that you would need to dynamically set the required SSH variables in provider "custom_kubernetes" {}, and I am not sure anymore whether dynamic, i.e. unknown, values in provider blocks are allowed at plan time. I think yes, but you should test it before following this approach.
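
A rough sketch of that remapping from the root module; the fork’s source address and the SSH-related settings are made up, and the (copied) eks module’s required_providers would have to declare the fork under the local name kubernetes:

```hcl
terraform {
  required_providers {
    customkubernetes = {
      # Hypothetical fork of hashicorp/kubernetes with SSH tunnelling built in.
      source = "yourorg/customkubernetes"
    }
  }
}

provider "customkubernetes" {
  host = module.eks.cluster_endpoint
  # ...plus whatever SSH settings the fork would expose (jump host, key, local port).
}

module "eks" {
  # A copy of the EKS module whose required_providers declares the fork
  # under the local name "kubernetes".
  source = "./modules/eks"

  providers = {
    kubernetes = customkubernetes
  }

  # ...usual EKS module inputs...
}
```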

Another problem I see with the approach I just described is that you only know the information from the eks module that is needed for an SSH connection after the module has been created, but this module already makes use of kubernetes… this does not seem right. So you would also need to copy the EKS module and change it so it uses your custom kubernetes provider with the now-known SSH information. By copying the eks module you can also change the kubernetes resources to pass along the SSH information you need. I do not really know AWS EKS… you know it better, but I hope I could give you some inspiration.