Custom provider background execution until the end of "terraform apply"?

I’m trying to develop a custom provider to create an SSH tunnel as a Data source.

My current issue is that once the Data source has been “fetched”, Terraform Core kills the custom provider process (and the SSH tunnel as a consequence).

$ terraform apply
data.jumphost_ssh.jumphost: Reading...
data.jumphost_ssh.jumphost: Read complete after 2s
data.http.example: Reading...
╷
│ Error: Error making request: Get "http://localhost:62759/": dial tcp [::1]:62759: connect: connection refused
│
│   with data.http.example,
│   on main.tf line 25, in data "http" "example":
│   25: data "http" "example" {

The SSH tunnel is created while the data source data.jumphost_ssh.jumphost is being read:

data.jumphost_ssh.jumphost: Reading...
data.jumphost_ssh.jumphost: Read complete after 2s

but the custom provider process has been terminated, so when the next data source data.http.example tries to use the tunnel, it is no longer there :frowning:
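
For context, the configuration looks roughly like this (the jumphost_ssh attributes below are simplified placeholders rather than the provider’s real schema):

```hcl
# Simplified sketch of the setup; attribute names are placeholders.
data "jumphost_ssh" "jumphost" {
  # Opens a local SSH tunnel through the jump host to a private endpoint.
  jumphost    = "jumphost.example.com"
  remote_host = "private-service.internal"
  remote_port = 80
}

data "http" "example" {
  # Talks to the private endpoint through the tunnel's local port
  # (62759 in the output above), which only works while the tunnel
  # process is still alive.
  url = "http://localhost:${data.jumphost_ssh.jumphost.local_port}/"
}
```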

Is there a way to get Terraform Core to keep the custom provider alive until the end of the terraform apply?

This is difficult to accomplish, but not impossible. The problem is that Terraform is not intended for this kind of workflow. What I mean is that the process of the plugin (provider) is managed by Terraform, and you have no control over the lifecycle of the process that represents your plugin/data source/resource. You also have to know that Terraform starts the plugin multiple times (for example, the plugin that implements the external data source). So when you create a resource for the first time, the plugin gets started 4 times:

# Create

## Get

[01:39.03] GetProviderSchema

## Validate

[01:39.03] GetProviderSchema
[01:39.03] ValidateResourceConfig

## Plan

[01:39.03] GetProviderSchema
[01:39.03] ValidateProviderConfig
[01:39.03] ConfigureProvider
[01:39.03] ValidateResourceConfig
[01:39.03] PlanResourceChange

## Apply

[01:39.05] GetProviderSchema
[01:39.05] ValidateProviderConfig
[01:39.05] ConfigureProvider
[01:39.05] ValidateResourceConfig
[01:39.05] PlanResourceChange
[01:39.05] ApplyResourceChange

So these calls (Plan, Apply, Read, …) each represent just a “moment” for one resource and therefore do not cover the lifecycles of multiple resources or data sources. The same applies to data sources (except that Read is called instead of Plan and Apply).

But with a lot of creativity you could make it work… Conceptually, what you can try is the following: you wrap the resources that depend on an active SSH session between a beginning data source and an ending data source. The beginning data source opens an SSH connection and the ending data source closes it. The resources that need SSH depend on the beginning data source, and the ending data source must depend on everything that depends on the beginning data source.
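
As a sketch, with hypothetical ssh_tunnel_open / ssh_tunnel_close data sources (placeholder names, not an existing provider), the dependency chain would look like this:

```hcl
# Hypothetical data sources; names and attributes are placeholders.
data "ssh_tunnel_open" "begin" {
  jumphost    = "jumphost.example.com"
  remote_host = "private-service.internal"
  remote_port = 80
}

# Everything that needs the tunnel depends on the "begin" data source...
data "http" "through_tunnel" {
  url = "http://localhost:${data.ssh_tunnel_open.begin.local_port}/"
}

# ...and the "end" data source depends on everything that used the tunnel.
data "ssh_tunnel_close" "end" {
  pid        = data.ssh_tunnel_open.begin.pid
  depends_on = [data.http.through_tunnel]
}
```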

But this has many, many drawbacks and things to consider. To start with:

  • You need to create an SSH session without any required user input.
  • You need to spawn a completely independent SSH process that is not a child process of your external data source, and you need to keep the connection open, maybe with a loop as the command when opening the SSH connection.
  • You need to track the process ID so that the ending data source can kill it.

And those are not even all the drawbacks; I cannot list them all at once:

  • Your data source is read multiple times, at least once at plan time and at least once at apply time. Keep that in mind.
  • Your plan or apply can fail, leaving you with an open connection, so you need to track existing connections on re-plan/re-apply. Not so easy.
  • You need at least one fully successful plan or apply for the connection that was opened to also be closed again.

You could maybe shift this SSH logic to resources instead: with a lot of effort you can make it work so that the SSH connection is only opened and closed during the apply phase, using local-exec provisioners inside a beginning and an ending resource, but that leaves you with similar drawbacks as stated before.
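
A minimal sketch of that resource-based variant, assuming passwordless SSH from the machine running Terraform and using made-up host names:

```hcl
# Opens the tunnel at apply time; "ssh -f -N" backgrounds itself, so the
# provisioner command returns while the tunnel process stays alive.
resource "null_resource" "tunnel_open" {
  provisioner "local-exec" {
    command = "ssh -f -N -L 8443:private-endpoint.internal:443 user@jumphost.example.com"
  }
}

# ...resources that need the tunnel go here, each with
# depends_on = [null_resource.tunnel_open]...

# Closes the tunnel once everything that used it has been applied.
resource "null_resource" "tunnel_close" {
  # In a real configuration this list would contain the resources that
  # actually use the tunnel, not just the opener.
  depends_on = [null_resource.tunnel_open]

  provisioner "local-exec" {
    command = "pkill -f 'ssh -f -N -L 8443:'"
  }
}
```

Keep in mind that local-exec provisioners only run when these null_resources are created, and a failed apply in between still leaves the tunnel open, so the drawbacks above apply here as well.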

So I would rather not recommend this approach. :smiley: Try to set up the SSH tunnel before running terraform itself. Or maybe a remote-exec provisioner is an option? Or another idea: bring your Terraform project to your remote machine and run terraform there?


Thanks @teneko for taking the time to reply in such a detailed manner! :pray:

I’ve indeed discovered that controlling the lifecycle of the plugins (providers) is not easy, and I perfectly understand that providers are not designed to create persistent background resources, so what I’m trying to achieve is a hack.

I understand that it’d be possible to create the SSH tunnel before starting TF, but here’s an example where it’d be cumbersome: let’s say I want to use this EKS TF module:

You see that:

  • module.eks will create the EKS cluster (with a private EKS endpoint, not reachable from the public internet).
  • and inside the module.eks module, the kubernetes provider is also used to manage the aws-auth ConfigMap, which, as you can guess, will fail without the SSH tunnel (see the sketch just below this list).
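
Simplified, and leaving out most of the module inputs, the wiring is roughly:

```hcl
module "eks" {
  source = "terraform-aws-modules/eks/aws"

  # ...cluster inputs, with a private-only cluster endpoint...
}

# The aws-auth ConfigMap handling relies on this provider, but the private
# endpoint is only reachable through the SSH tunnel.
provider "kubernetes" {
  host = module.eks.cluster_endpoint
  # ...authentication settings...
}
```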

I know that what would be easier is to run TF from inside the AWS VPC, but we can’t do that.

I know that I could also split the aws-auth ConfigMap part into a separate TF stack, so I could create the SSH tunnel between the two TF calls, but I’m greedy: I’d like to absorb the technical complexity so that on the user side, all they need is just a plain old terraform apply.

Also, running the SSH command outside means that I need to run aws eks describe-cluster, parse the returned JSON, check errors, etc., which once again surfaces the complexity instead of using module.eks.cluster_endpoint out of the box.

I’ll dig into the subprocess option, keeping in mind that the data source can be invoked several times (I was not aware of that, so thanks for the heads-up! :pray: ).

I’ll update here if I manage to have a POC.

Thanks again :bowing_man:

Call me crazy, but you could fork the kubernetes provider and wrap SSH functionality around the resources that make use of provider.kubernetes.exec. When extending kubernetes, you can remap the kubernetes provider to your custom kubernetes provider by using provider aliases on your eks module call. The only problem I see is that you would need to dynamically set the required SSH variables in provider "custom_kubernetes" {}, and I am not sure anymore whether dynamic, i.e. unknown, values in provider blocks are allowed at plan time. I think yes, but you should test it before following this approach.
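
A rough sketch of that remapping from the root module; the fork’s source address and the SSH-related settings are made up, and the (copied) eks module’s required_providers would have to declare the fork under the local name kubernetes:

```hcl
terraform {
  required_providers {
    customkubernetes = {
      # Hypothetical fork of hashicorp/kubernetes with SSH tunnelling built in.
      source = "yourorg/customkubernetes"
    }
  }
}

provider "customkubernetes" {
  host = module.eks.cluster_endpoint
  # ...plus whatever SSH settings the fork would expose (jump host, key, local port).
}

module "eks" {
  # A copy of the EKS module whose required_providers declares the fork
  # under the local name "kubernetes".
  source = "./modules/eks"

  providers = {
    kubernetes = customkubernetes
  }

  # ...usual EKS module inputs...
}
```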

Another problem I see with the approach I just described is that you only know the information from the eks module that is needed for an SSH connection after the module has been created, but this module already makes use of kubernetes… this does not seem right. So you would also need to copy the EKS module and change it so it uses your custom kubernetes provider with the now-known SSH information. By copying the eks module you can also change the kubernetes resources to pass along the SSH information you need. I do not really know AWS EKS… you know it better, but I hope I could give you some inspiration.