My team and I are managing a very large Terraform repository, currently without any module structure: all Terraform files (~500 or more) live in one directory. Many teams are involved in our whole setup. The basic workflow is like this:
- team needs some configuration
- team creates pull request in our terraform repository
- pull request gets reviewed and merged
- CI pipeline starts, executing plan and apply
Nothing special here, I guess. However, since our repository is very large, Terraform executes a lot of requests against our infrastructure. We know that this is the usual way Terraform works, but we would like to reduce the blast radius of a single pull request. For example, a pull request where a team fixes a typo in a description currently executes a lot of requests against our infrastructure, and the queried components are very crucial components in our platform. We are trying to reduce the blast radius of single changes and wanted to ask for help or suggestions. Does anyone have a good idea how to structure a Terraform project so that changes to the infrastructure do not necessarily result in so many requests against the involved components/services?
Maybe a module structure with different directories, where Terraform only plans/applies the files in that specific directory?
Thanks for any suggestions.
The standard thing to do would be to split this into lots of smaller pieces. It must take a long time to do a plan or apply on such a large repo!
We have things split based on product/team/tier/cadence/etc. Basically split it into whatever makes sense for you.
Thanks for your reply and your suggestions. We also thought about something like this.
Let's say the modules have a lot of cross-references. Would it still make sense to go this way?
I would also be very interested in more detailed information about your setup. What exactly happens when a file is changed in one of your team directories, for example? How does Terraform know to only create a plan for the changes inside that directory?
Besides that, we are thinking about using the -target command line option of Terraform, e.g. terraform plan -target=my_resource_type.resource_name. This seems to do exactly what we want, but the docs clearly state that using it should be an exception, not the usual workflow of a setup. Any opinions/experiences on this?
Thanks for any help.
We have split things into totally separate root modules: in our case, multiple Git repositories, each with their own remote state file and triggered via our CI system to apply changes when they are merged.
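As a rough illustration of that layout (the bucket, key and region here are purely hypothetical placeholders), each root module carries its own backend configuration pointing at its own state file:

```hcl
# networking/backend.tf: one root module, one remote state file
# (bucket, key and region are hypothetical placeholders)
terraform {
  backend "s3" {
    bucket = "example-terraform-state"
    key    = "networking/terraform.tfstate"
    region = "eu-central-1"
  }
}
```

With this split, a plan in one repository only refreshes the resources tracked in that repository's state file, which is what keeps the blast radius small.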
We make extensive use of modules (both third party and locally created) to encapsulate common functionality.
For relationships between resources that are managed in different root modules we use a variety of methods, ranging from the remote state data source, to naming conventions, to other data sources as needed.
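A minimal sketch of the remote state approach might look like this (the module, bucket and output names are hypothetical, and it assumes the "networking" root module declares a `subnet_id` output):

```hcl
# app/main.tf: read an output published by another root module's state
data "terraform_remote_state" "networking" {
  backend = "s3"
  config = {
    bucket = "example-terraform-state"
    key    = "networking/terraform.tfstate"
    region = "eu-central-1"
  }
}

resource "aws_instance" "app" {
  ami           = "ami-12345678" # hypothetical AMI ID
  instance_type = "t3.micro"
  # Cross-root-module reference via the other module's outputs
  subnet_id     = data.terraform_remote_state.networking.outputs.subnet_id
}
```

The outputs of the producing module effectively become the explicit interface between the two configurations.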
In terms of actually running Terraform to apply changes, 99% of the time this is fully automated using our CI system, without the use of -target. Manual applies (and the very occasional use of -target) are only done if absolutely necessary, for example if things have got into an awkward state, or where a calculated value cannot be referenced before another resource has been created/updated.
We also use our CI system to run terraform plan (and other checks) on branches, so we can see what a change would do, reducing the likelihood of breakage or of changes impacting more than expected.
Okay, I really appreciate the info, thanks. I think we will try an approach that includes extensive use of modules.
Do you have an idea why exactly the -target option should be avoided and only used rarely? Let's say we can guarantee that the underlying infrastructure cannot be changed in any way other than through our Terraform repository. In that case, using -target should be safe, shouldn't it?
It might help to consider the analogy of performing a Git merge while totally ignoring some subset of the files in the repository. The merge operation itself may succeed, but the result it created is unlikely to match either what would’ve existed if nothing was merged at all or what would’ve existed if you’d wholly applied the merge. Instead, there is now a rather arbitrary set of source code snapshots that cannot be described by just the lineage of changes up to that point. Also, the resulting source tree may no longer actually work because different files in a repository often depend on aspects of one another.
If you happen to know that there weren't any changes pending for any of the items you excluded anyway, then it is indeed in theory safe to use -target. But the only true way to be sure that your change doesn't have downstream effects is to create a full plan and let Terraform compare the configuration (desired state) with the prior state.
If you've identified a suitable resource or set of resources to use with -target to get the effect you want, then that set of resources is a good candidate for where to draw an architectural boundary between two configurations with an explicit interface between them. You can then carefully consider the implications of updating the first one without also updating the second one (since you'll be doing that every time now) and fix that as part of your Terraform configurations rather than only as a one-time-use command line argument.
To more directly address your original question: if you'd like Terraform to assume that nothing has changed outside of Terraform and just plan against the previous run's state, then you can achieve that using the -refresh=false planning option, which disables the step of asking the provider to retrieve updated data about the existing objects.
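Concretely, a routine run would then look something like this (just a sketch; the plan file name is arbitrary):

```shell
# Routine change: plan against the prior state, skipping the refresh step
terraform plan -refresh=false -out=tfplan
terraform apply tfplan

# Occasionally (and after Terraform CLI or provider upgrades): a full
# plan including refresh, to detect drift from outside Terraform
terraform plan -out=tfplan
```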
If you choose to do this though, please keep in mind that there are certain changes which count as “changes outside of Terraform” but that have causes separate from manually changing something in the vendor’s admin console:
- Upgrading to a new version of Terraform CLI.
- Upgrading the provider plugins you use.
- Changes made by the vendor of the remote systems you depend on outside of your control.
If you choose to use -refresh=false for your routine work, then I would suggest that you plan to have a process for occasionally running without that option, so that you can safely traverse local software upgrades and remote system changes.
In particular, you should make sure that you keep your dependency lock file in your version control, and make sure to run without -refresh=false whenever you are applying a change which affects the locked dependencies.
I would still suggest working towards a more decomposed system where there are well-defined architectural boundaries between your subsystems, but hopefully -refresh=false can make your current design easier to work with in the meantime while you work towards that.