My 25-resource Terraform Project is Taking 3 Minutes to Plan

Hi @lancejpollard,

I wasn’t able to review your configuration in sufficient detail to explain what exactly is going on here; perhaps you can see in the output some specific things that are taking a particularly long time, in which case I could try to explain why that might be.

With that said, it’s not typical to manage an entire multi-region infrastructure deployment in a single Terraform configuration. Usually a larger deployment is split into smaller units, so that the potential impact of any particular change is reduced and so that Terraform does not have to resynchronize the entire space of objects on every change. One configuration per region is a common first level of decomposition, possibly followed by further decomposition within each region by functional area or by frequency/risk of change.

(Performance and “blast radius” considerations aside, there is also the concern of limiting the total dependency space of a particular configuration so that you don’t limit your ability to respond with configuration changes during partial outages. If your single Terraform configuration covers all of your regions and one region has an outage, you’d need to run Terraform in an unusual way to keep that outage from blocking changes to the other regions. This is a reason for the first levels of decomposition to align with your system’s failure domains.)

I would typically use something like your region module as the first level of decomposition, either by writing a separate root module for each region, each of which calls into that shared module, or by using the region module itself as the root and using workspaces in its backend to keep each region’s state separate. For AWS in particular, the former is usually preferable from a failure domain standpoint, because with workspaces all of your state storage would be colocated in a single S3 bucket in a single AWS region. Different tradeoffs can apply to other platforms.
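As a rough sketch of the first option, a per-region root module might look something like the following. The directory layout, bucket name, module path, and the `region_name` input and `load_balancer_arn` output are all hypothetical placeholders, since I don’t know what your region module actually accepts or exports:

```hcl
# regions/us-east-1/main.tf (hypothetical layout: one directory per region)

terraform {
  # Keep this region's state in a bucket that lives in the same region,
  # so an outage elsewhere does not block changes here.
  backend "s3" {
    bucket = "example-tfstate-us-east-1" # hypothetical bucket name
    key    = "region/terraform.tfstate"
    region = "us-east-1"
  }
}

provider "aws" {
  region = "us-east-1"
}

# The shared region module, called once from each per-region root module.
module "region" {
  source = "../../modules/region" # hypothetical path to the shared module

  region_name = "us-east-1" # hypothetical input variable
}

# Re-export whatever a "global" configuration might need to look up later.
output "load_balancer_arn" {
  value = module.region.load_balancer_arn # hypothetical module output
}
```

Each other region would get a near-identical directory that differs only in the region name and backend settings, and you’d run `terraform plan` and `terraform apply` in each directory separately.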

There are also often some extra objects that don’t belong to just one region. It looks like your “GlobalAccellerator” objects are examples of that. For these, it’s common to have a separate “global” configuration that can be applied once all of the regions are active, in order to produce whatever global objects are needed to make the parts appear as a single system. The global configuration can use data sources to retrieve information about the region-specific objects as necessary to complete the global object configurations. The global configuration necessarily spans multiple failure domains, so it’s best to keep it as small as possible within other constraints.
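To illustrate, a small “global” configuration along those lines could look roughly like this. It assumes each region exposes an application load balancer with a known name; the names `example-app` and `example`, the two regions, and the choice of a TCP listener on port 443 are all assumptions for the sake of the example rather than anything taken from your configuration:

```hcl
# global/main.tf (hypothetical)

# Global Accelerator is managed through the us-west-2 API endpoint,
# regardless of where the endpoints themselves live.
provider "aws" {
  region = "us-west-2"
}

# One aliased provider per region whose objects we need to look up.
provider "aws" {
  alias  = "use1"
  region = "us-east-1"
}

provider "aws" {
  alias  = "euw1"
  region = "eu-west-1"
}

# Data sources find the load balancers that the per-region
# configurations created. The name is a hypothetical placeholder.
data "aws_lb" "use1" {
  provider = aws.use1
  name     = "example-app"
}

data "aws_lb" "euw1" {
  provider = aws.euw1
  name     = "example-app"
}

resource "aws_globalaccelerator_accelerator" "main" {
  name            = "example"
  ip_address_type = "IPV4"
  enabled         = true
}

resource "aws_globalaccelerator_listener" "main" {
  accelerator_arn = aws_globalaccelerator_accelerator.main.id
  protocol        = "TCP"

  port_range {
    from_port = 443
    to_port   = 443
  }
}

# One endpoint group per region, each pointing at that region's load balancer.
resource "aws_globalaccelerator_endpoint_group" "use1" {
  listener_arn          = aws_globalaccelerator_listener.main.id
  endpoint_group_region = "us-east-1"

  endpoint_configuration {
    endpoint_id = data.aws_lb.use1.arn
    weight      = 100
  }
}

resource "aws_globalaccelerator_endpoint_group" "euw1" {
  listener_arn          = aws_globalaccelerator_listener.main.id
  endpoint_group_region = "eu-west-1"

  endpoint_configuration {
    endpoint_id = data.aws_lb.euw1.arn
    weight      = 100
  }
}
```

Looking the load balancers up through data sources means the global configuration depends only on the objects that exist in each region, not on the regional configurations or their state.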

I hope that helps!
