Please suggest how to profile terraform execution to get some useful metrics.
The origin of the question is that we have ~1K modules wrapped into a tree structure with Terragrunt and executed with Python.
I can time the execution itself and parse the resources section of the state file (rough sketch at the end of this post).
But I’m missing internal metrics that would let me come up with a module-size metric, a count of API calls, etc.
The two things I’m trying to achieve:
optimize modules to avoid unnecessary calls
deduce the complexity of modules from some defensible numbers
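Roughly what I mean by the timing I can already do, as a minimal sketch (the module path is a made-up placeholder; in reality the runs go through Terragrunt):

```python
import json
import subprocess
import time

# Time one terraform run and count resources from the state.
# Assumes `terraform show -json` is available (Terraform >= 0.12);
# the working directory below is a placeholder.
workdir = "modules/example"

start = time.monotonic()
subprocess.run(["terraform", "plan", "-out=tfplan"], cwd=workdir, check=True)
elapsed = time.monotonic() - start

# `terraform show -json` renders the latest state as a JSON document.
state = json.loads(
    subprocess.run(
        ["terraform", "show", "-json"],
        cwd=workdir, check=True, capture_output=True, text=True,
    ).stdout
)

resources = state.get("values", {}).get("root_module", {}).get("resources", [])
print(f"plan took {elapsed:.1f}s, {len(resources)} resources in root module")
```

That gives me wall-clock time and a resource count, but nothing about what Terraform actually spent the time on.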
For most (but not necessarily all) Terraform configurations, the most significant delays at runtime come from waiting for remote network APIs to respond, or to eventually become consistent.
You can potentially measure those by running terraform plan -json -out=tfplan and terraform apply -json tfplan, where the -json argument will ask Terraform to produce machine-readable output in the form of a stream of JSON objects written to stdout.
If you write a wrapper program to consume that output and record the arrival times of certain interesting events then you should be able to determine which operations are taking the longest.
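For example, a minimal wrapper in Python might look like the sketch below. The apply_start/apply_complete message types and the hook fields are taken from the machine-readable UI format; treat the exact field names as something to verify against your Terraform version:

```python
import json
import subprocess
import time

# Sketch of a wrapper that times each resource operation by recording
# the arrival times of apply_start/apply_complete events emitted by
# `terraform apply -json tfplan` (Terraform's machine-readable UI).
proc = subprocess.Popen(
    ["terraform", "apply", "-json", "tfplan"],
    stdout=subprocess.PIPE, text=True,
)

started = {}
durations = {}
for line in proc.stdout:
    try:
        msg = json.loads(line)
    except json.JSONDecodeError:
        continue  # skip any non-JSON noise on stdout
    addr = msg.get("hook", {}).get("resource", {}).get("addr")
    if msg.get("type") == "apply_start" and addr:
        started[addr] = time.monotonic()
    elif msg.get("type") == "apply_complete" and addr in started:
        durations[addr] = time.monotonic() - started[addr]

proc.wait()
# Slowest operations first.
for addr, seconds in sorted(durations.items(), key=lambda kv: -kv[1]):
    print(f"{seconds:8.1f}s  {addr}")
```

I believe recent versions also include an elapsed_seconds field on the apply_complete messages themselves, which you could read directly instead of timing arrivals yourself.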
I have some workspaces that reliably take multiple hours to plan, even with -refresh=false (and confirming with TF_LOG=trace that they’re making no API calls). It’s not even a provider issue, as far as I can tell: the provider completes its work in a millisecond, and instead it’s Terraform’s core that eats up all my cores doing nothing interesting.
It’d be really nice to have a way to dig into profiles to try and understand what’s happening.
If it’s an inefficiency in Terraform Core itself then I expect we’d need to use Go profiling tools to get into that, since the Terraform language runtime only has hooks around the external events it’s orchestrating, not around its own CPU-bound work.
A full execution profile isn’t usually needed, and won’t help much until one is working on some specific optimization. The first thing I would do is look for gaps in the trace log timestamps. That will usually narrow down the problem sufficiently to indicate what the slow operation is.
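A rough sketch of that gap-finding, assuming the hclog-style timestamp prefix that trace log lines normally start with (the exact format can vary between versions):

```python
import re
import sys
from datetime import datetime

# Scan a TF_LOG=trace log for the largest gaps between consecutive
# timestamps. Assumes line prefixes like
# "2024-01-15T10:23:45.123-0500 [TRACE] ..."; adjust the pattern and
# strptime format if your log lines differ.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{4})")

gaps = []
prev_time, prev_line = None, None
with open(sys.argv[1]) as log:
    for line in log:
        m = TS.match(line)
        if not m:
            continue
        t = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f%z")
        if prev_time is not None:
            gaps.append(((t - prev_time).total_seconds(), prev_line))
        prev_time, prev_line = t, line.rstrip()

# The line just before each big gap points at the slow operation.
for seconds, line in sorted(gaps, reverse=True)[:10]:
    print(f"{seconds:8.2f}s after: {line[:120]}")
```

Capture the log with TF_LOG=trace TF_LOG_PATH=trace.log terraform plan, then run the script against trace.log.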
Given the known limitations of Terraform, there are two common sources of slowness during the plan:
an excessively large and highly connected configuration graph, which can be slow to process
many references to resources with very large numbers of instances
Unfortunately, the usual recommendation of “break this workspace apart into smaller workspaces” isn’t viable here. We have a monolith, and we end up with lots of security rules that all apply to the same context. I could split them up alphabetically, but that seems like a really silly workaround.
I get that performance isn’t the most important concern with Terraform, but having plan time scale exponentially with the number of resource instances is really troublesome for large-scale deployments. Sure, we can sometimes break things apart (and we do, as much as possible, despite the operational overhead we incur by doing that), but sometimes there really isn’t a better way to model the resources.