How to integrate Terraform Cloud in your CI pipeline

Hi, I’m looking for best practices and experiences with integrating Terraform Cloud in your CI pipeline.
In a nutshell: we use Terraform Cloud in VCS mode, but we would like to start a TF Cloud run when the CI pipeline is done, not immediately when the commit/merge happens.

So some context: for my project I have some CI (GitHub Actions) that builds and publishes a Docker image to AWS. This takes about 12 minutes (unfortunately).
Our infrastructure is managed by Terraform: we have some TF code that will deploy this Docker image on AWS Fargate. The version of the Docker image (the task definition) is managed by Terraform.

But when we use Terraform Cloud in VCS mode we end up with a race condition: when we commit/merge code, both CI and TF Cloud will start. Since the CI takes 10+ minutes, TF Cloud will be ready to update the infrastructure way before the image it wants to deploy actually exists…
We can work around this by waiting until the CI is done to manually confirm and apply. But this makes it impossible for us to switch to the auto apply method.

Does anyone else have a similar setup / issue?

Cheers!

Hi @kvrhdn!

The VCS-triggering mode is designed as an “easy mode” for situations where Terraform is being used alone, rather than as part of a pipeline as in your case.

If another system (like a CI engine) is acting as the orchestrator, it’d be best to turn off VCS triggering in Terraform Cloud and instead arrange for your CI system to run the Terraform CLI following the patterns described in Running Terraform in Automation.

You can enable remote operations for your workspace and configure the remote backend so that terraform plan and terraform apply will trigger remote runs in the Terraform Cloud execution environment. This’ll require your CI environment to have access to some Terraform Cloud API credentials but not to any of the other credentials the Terraform configuration depends on.

In that model, the Terraform CLI is really just acting as a client to the Terraform Cloud API while mimicking the usual local Terraform CLI workflow. So there is an alternative approach of using the API directly, which might be easier to integrate with the other steps you are taking, depending on how you’re building your CI setup. The documentation shows usage from a shell using curl, but you can mimic those same steps using an HTTP client library in any other programming language.

Thanks for the elaborate response and the great analysis! I didn’t think of these systems as ‘orchestrators’, but that is spot on.

I think what complicates this setup for us is that GitHub Actions does not provide a manual confirmation step. This is obviously a shortcoming of GH Actions not Terraform Cloud, but I hoped TF Cloud could fulfill that role for us since I really like the confirmation flow in TF Cloud.
What I’m missing now is the ability to schedule a terraform apply from the CLI that still has to be confirmed manually. If TF Cloud is configured to use remote execution without VCS, terraform apply will not ask for approval anymore.

It would also be nice to be able to change the default message (“Queued manually using Terraform”) to something build specific.

Btw, if I have ideas for new features / improvements for the Terraform Cloud team, is there some place I can leave these?

Cheers!

Hi @kvrhdn!

From what you’ve described I think my recommendation would be to set up the GitHub Action to queue a run using the Terraform Cloud API directly, rather than using Terraform CLI.

With that approach, you can directly ask Terraform Cloud to start a normal (non-speculative) plan in the background and then let the Terraform Cloud UI take over the workflow from that point. If you follow the steps on the API-driven Runs page then step 5 on that page is the one that will start the run in Terraform Cloud.
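As a rough illustration of that upload step, here is a minimal Python sketch using only the standard library. It assumes a configuration version has already been created and that its upload URL was read from the response. The helper names (pack_configuration, build_upload_request, upload_configuration) and the choice to skip .terraform and .git are assumptions made for this example, not something the documentation prescribes.

```python
import io
import tarfile
import urllib.request
from pathlib import Path


def pack_configuration(config_dir: str) -> bytes:
    """Pack a Terraform configuration directory into an in-memory tar.gz."""
    root = Path(config_dir)
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for path in sorted(root.rglob("*")):
            relative = path.relative_to(root)
            # Assumption: local Terraform cache and VCS metadata are not wanted.
            if ".terraform" in relative.parts or ".git" in relative.parts:
                continue
            if path.is_file():
                archive.add(str(path), arcname=relative.as_posix())
    return buffer.getvalue()


def build_upload_request(upload_url: str, archive: bytes) -> urllib.request.Request:
    """Build the PUT request that sends the archive to the upload URL.

    upload_url is assumed to come from the configuration version that was
    created in the previous step.
    """
    return urllib.request.Request(
        upload_url,
        data=archive,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )


def upload_configuration(upload_url: str, config_dir: str) -> int:
    """Upload the configuration and return the HTTP status code.

    Unless automatic queuing was disabled on the configuration version,
    this upload is what starts the run.
    """
    request = build_upload_request(upload_url, pack_configuration(config_dir))
    with urllib.request.urlopen(request) as response:
        return response.status
```

In a GitHub Action this would run as the last step of the job, after the image has been published, so the run is only queued once the image exists.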

One limitation of this approach is that the GitHub Action will succeed as long as queueing the plan is successful, regardless of whether the plan succeeds. In theory you could improve on that by having the action poll the get run details API for a minute or so after the run is queued and watch for its status to change to see whether to pass or fail the action. After that, the action can exit and let the rest of the process happen in the Terraform Cloud UI.
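A minimal sketch of that polling idea, in Python with only the standard library, could look like the following. The function names are hypothetical, the grouping of statuses into "ok" and "failed" is this example's own interpretation and covers only a subset of the run statuses, and the one-minute timeout simply mirrors the suggestion above.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumption: these statuses mean the plan phase is over. The split into
# "ok" and "failed" is this sketch's reading, not an official classification.
PLAN_OK = {"planned", "planned_and_finished", "cost_estimated",
           "policy_checked", "applied"}
PLAN_FAILED = {"errored", "canceled", "force_canceled", "discarded"}


def fetch_run_status(run_id: str, token: str,
                     host: str = "https://app.terraform.io") -> str:
    """Read data.attributes.status from the run details endpoint."""
    request = urllib.request.Request(
        f"{host}/api/v2/runs/{run_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["data"]["attributes"]["status"]


def wait_for_plan(get_status: Callable[[], str],
                  timeout: float = 60.0,
                  interval: float = 5.0,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll until the plan has clearly succeeded or failed.

    Returns True on success, False on failure, and raises TimeoutError if
    neither happened within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in PLAN_OK:
            return True
        if status in PLAN_FAILED:
            return False
        if clock() >= deadline:
            raise TimeoutError(f"run still in status {status!r}")
        sleep(interval)
```

The action would call something like wait_for_plan(lambda: fetch_run_status(run_id, token)) and exit non-zero on False. Whether a timeout should fail the action or just log a warning is a judgment call; raising leaves that decision to the caller.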

Regarding the message associated with the run: the example on the API-driven Runs page (and the CLI remote backend behavior) both create a run as a side-effect of some other action and so don’t have a way to directly customize the message. However, if you upload the configuration content with the extra argument data.attributes.auto-queue-runs set to false then that’ll disable the automatic queuing side-effect and then you can create a run directly, where you can set the data.attributes.message to an appropriate custom string.
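To make those two requests concrete, here is a hedged Python sketch of the request bodies, using only the standard library. The helper names and the example IDs are hypothetical, and the workspace and configuration version IDs are assumed to come from earlier API responses.

```python
import json
import urllib.request

API = "https://app.terraform.io/api/v2"


def configuration_version_payload(auto_queue_runs: bool = False) -> dict:
    """Body for creating a configuration version that does not auto-queue a run."""
    return {
        "data": {
            "type": "configuration-versions",
            "attributes": {"auto-queue-runs": auto_queue_runs},
        }
    }


def run_payload(workspace_id: str, configuration_version_id: str,
                message: str) -> dict:
    """Body for creating a run explicitly, with a custom message."""
    return {
        "data": {
            "type": "runs",
            "attributes": {"message": message},
            "relationships": {
                "workspace": {
                    "data": {"type": "workspaces", "id": workspace_id}
                },
                "configuration-version": {
                    "data": {"type": "configuration-versions",
                             "id": configuration_version_id}
                },
            },
        }
    }


def api_post(path: str, payload: dict, token: str) -> urllib.request.Request:
    """Build an authenticated JSON:API POST request (not sent here)."""
    return urllib.request.Request(
        API + path,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/vnd.api+json",
        },
    )
```

The order would be: post configuration_version_payload() to /workspaces/<workspace id>/configuration-versions, upload the archive to the returned upload URL, then post run_payload(...) to /runs with a build-specific message such as the commit SHA. Each request built by api_post can be sent with urllib.request.urlopen.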

Regarding general feedback: the Terraform product managers do pay attention to this forum and may have seen this thread already, but you could also consider sending feedback via the support contacts you can find from inside the Terraform Cloud application.

Hi @apparentlymart, thanks for the great response again. I’ve created an action for GitHub Actions which uses HashiCorp’s Go SDK to create a new run on Terraform Cloud.
This has been going really well to be honest!

It provides us all the flexibility we need:

  • immediately run a speculative plan, but only run a non-speculative apply after the build finishes
  • we can manually set the run message
  • we can still follow the manual confirmation flow using the Terraform Cloud UI

The only drawback of this setup is that now all users of the workspace can create a new run. In VCS mode, users aren’t allowed to do anything that wouldn’t be in sync with the code repository. But I’d guess this is where the different teams would come in?

My action is publicly available on GitHub: kvrhdn/tfe-run.