Flow for executing small changes via -target by detecting which files changed

I started this topic on Stack Overflow, where @apparentlymart gave a fantastic, detailed response. I am trying to bring this a step forward, and do it in the right location.

I have a long blog post about to be published on this.

Terraform is great at linking together resources into groupings (modules), and modules into higher-level modules, and so on. It can represent a true environment and make it reusable. But its “scope of execution” is that entire higher-level config, and refresh, as Martin pointed out, gets very expensive.

People work around this via “terraservices”, wrapping Terraform in things like Terragrunt… and losing all of the definition (i.e. functions) of higher-level services.

Since people do this anyway, I am looking at an approach that would:

  1. Have terraform plan detect just which files in a config, and therefore by extension which resources, have changed.
  2. Apply refresh via plan just on those resources
  3. Use apply -target (or just use the plan output from the previous step) to limit it to just those.
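As a concrete sketch of those three steps, here is a hedged, runnable mock: it fabricates a “changed file” instead of calling `git diff --name-only`, derives target addresses with a naive grep, and only prints the terraform commands rather than executing them. All file, resource, and tag names are hypothetical, and mapping files to resources this way misses anything that changes indirectly through variables, locals, or module inputs.

```shell
set -eu
work=$(mktemp -d)

# Stand-in for a changed file that would normally come from
# `git diff --name-only <last-applied-rev> -- '*.tf'`.
cat > "$work/main.tf" <<'EOF'
resource "aws_instance" "web" {
  ami           = var.ami_id
  instance_type = "t3.micro"
}
EOF

# Step 1: map changed files to candidate resource addresses (naive grep;
# this is exactly the part that the rest of the thread argues is unsound).
targets=$(grep -oE 'resource "[a-z0-9_]+" "[a-z0-9_]+"' "$work/main.tf" \
          | sed -E 's/resource "([^"]+)" "([^"]+)"/-target=\1.\2/')

# Steps 2 and 3: plan without refresh, restricted to those targets,
# then apply the saved plan (printed here, not executed).
echo "terraform plan -refresh=false $targets -out=tfplan"
echo "terraform apply tfplan"
```

Running it prints `terraform plan -refresh=false -target=aws_instance.web -out=tfplan` followed by `terraform apply tfplan`.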

It recreates the “let’s change just a few things” from terraservices, while keeping intact the Terraform goodness of grouping together modules into higher-level abstractions.

I can do this externally; I want to do it internally.

Does this work? Is there interest in terraform itself providing this capability?

Hi @deitch,

Assuming that you are tracking configuration changes via version control (Terraform does not track configuration changes directly), there are still a few problems inherent in this approach.

The individual files in a configuration are unrelated to its overall structure; they only serve as a convenient way for humans to organize the configuration data. There is no other relationship between the configuration files and the logical structure of the configuration, so starting from changes to individual files is not going to produce meaningful results. The simplest example is that it’s common to store variables and locals in files separate from the resources they control.
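A minimal illustration of that point, with hypothetical file and resource names: a commit that touches only `variables.tf` changes the planned value of a resource declared in `main.tf`, so a file-level diff points at the wrong file entirely.

```hcl
# variables.tf — the only file that changes in the commit...
variable "instance_type" {
  type    = string
  default = "t3.micro" # changed from "t3.small"
}

# main.tf — ...yet this is the resource the plan will modify.
resource "aws_instance" "web" {
  ami           = "ami-0123456789abcdef0" # placeholder
  instance_type = var.instance_type
}
```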

You also cannot determine, from the diff of the configuration files alone, which resources (if any) will be affected. You could have huge changes across all files that affect no active resource instances, or a single change in a single file that requires changes to every single resource instance. Any time you have remote modules in the configuration (and these account for a large percentage of all managed resources in Terraform), there will never be a change directly to the resource configuration unless you are upgrading the module; all changes happen indirectly through input variables.

Normal usage of Terraform is to keep the infrastructure in a state which matches what is declared in the configuration. Under these conditions, terraform plan always shows the minimal changes necessary to reach that desired state. In other words, the only way to determine which resources are affected by changes to the configuration is to run terraform plan.

Trying to continually manage infrastructure which has diverged from what is declared in the configuration is fighting against how Terraform is designed to operate; it often makes it difficult to assess just how far things have diverged, and is more apt to get into situations where even more manual effort is required to restore things when something goes awry. Many users try to build out huge single configurations and manage them in a more granular fashion via -target, but inevitably that fails when the configuration becomes too large to manage or reason about, or when they cannot recreate parts of that configuration in isolation when things need to be rebuilt.

Hi @jbardin , and thanks for the detailed response.

Yeah, I think you understood well. I am trying to do what I call “square the circle”. As described above, I want to take advantage of Terraform’s excellent higher-level definition (rather than losing it by splitting things up and handing them off to Terragrunt or similar) while also getting fast execution (refresh) times.

Based on what you said, the files are a human convenience, but you don’t know what really changed unless you let tf build the whole graph (does everyone else also have alias tf=terraform in their aliases file? :slight_smile: ).

Given that, would the 3-step approach I highlighted above work? It does rely on tf plan, so it builds the graph, but disables the refresh, so that we can tease out what actually changed in the graph based solely on user input, and then run it with refresh but hitting only those targets.

Or are targets also just a human convenience?

I would suggest not using -target at all. The target option is really designed only for exceptions, and causes plenty of issues of its own if overused. If you don’t want to refresh you can use terraform plan -refresh=false; however, you need to be careful about not refreshing, as there might be drift between what the state file believes is real and actual reality, which could result in unexpected changes or failed applies (if using plan files).

Going back a step, however: when you say “fast execution times”, what are you looking for? For us, even our more complex root modules generally take only a couple of minutes to refresh, so we don’t find very long plan times to be a significant problem; it is often the apply which can take quite a while (e.g. for creating things like Kubernetes clusters or applying Helm charts). What sort of duration are you hoping for, and how long are things currently taking?

Ish… this used to work back in some earlier Terraform versions, but the concept of not refreshing data sources was removed from Terraform around 0.13 IIRC, meaning it now only really works as desired if your configuration doesn’t have any data blocks.

I dislike this decision and wish it could be revisited.

but the concept of not refreshing data sources was removed from Terraform around 0.13 IIRC, meaning it now only really works as desired if your configuration doesn’t have any data blocks

Oh, interesting. I didn’t know that.

you need to be careful about not refreshing, as there might be drift between what the state file believes is real and actual reality, which could result in unexpected changes or failed applies (if using plan files).

This is where it gets interesting. The very philosophy of tools like Terragrunt is splitting up state files, which by definition means only refreshing certain parts at a time. Users clearly are adopting this, and in doing so are giving up the power of Terraform modules to compose lower-level units. As you can probably tell, this bothers me. It isn’t that I dislike other tooling; it is that I dislike using tooling that forces you to give up so much of the underlying tool’s power and capability.

I treat the apply as a wash. Whether I limit my refresh to “just these 10 resources” and apply them, or refresh the whole thing and come up with those same 10 resources, the cost of the apply will be the same. That time cost is the minimum necessary, and equal in both scenarios.

I have seen refresh times (well, plan times, but I think we can fairly assume that graph resolution in memory is very quick, so the main remaining cost is refresh, now that Martin has explained how it works) of hours. I don’t have access to my last case (I am a consultant, so I work with clients and then lose access after a job is done), but in my current one, the AWS Config snapshot reported ~25,000 resources. It takes a very long time to refresh all of those.

The current structure is using Terragrunt, but we are reevaluating it extensively. The pain from a large part of that (lack of definition of stacks and environments and higher-level constructs, which would be really simple in Terraform modules) is why I have been exploring using native Terraform.

Are you saying even something of that size should refresh in a few minutes?

25k resources is a lot! I’d expect to have split that root module into multiple smaller pieces quite some time ago in that case. Generally we see our root modules end up in the few-hundred to few-thousand resource range.

The general recommendation is to have each root module (i.e. state file) be split based on tier/blast radius/product/team/update frequency. For example, we have infrastructure over multiple AWS accounts, with different tiers - a “bootstrap” layer which just enables Terraform & our CI system to operate, an “account” layer which sets up basic access to the account, a “base” layer which sets up basic VPC networking and a Kubernetes cluster and then multiple “app” layers which install things within Kubernetes (split based on different teams). So we have many different code repositories targeting smaller pieces of the overall infrastructure, which works well as different teams are responsible for different areas, and some areas hardly ever change and need higher levels of scrutiny.

We still make extensive use of modules to abstract away complex details and allow code reuse. We then use a combination of the remote state data source and other data sources to allow looser or tighter coupling between root modules where needed (some chunks are fairly standalone).

That isn’t necessarily true. For some of our more complex root modules the time it takes to process the graph can be just as long as the refresh time - it really depends, with complex data manipulations slowing down the processing and slow API calls slowing down the refresh.

25k resources is a lot!

Is it? I have seen and managed environments much larger than that. Think of the base production stack for a large financial institution, or even part of it. Or a large healthcare or streaming-media company.

The general recommendation is to have each root module (i.e. state file) be split based on tier/blast radius/product/team/update frequency

In general, I agree with you. FWIW, my approach is generally to manage Kubernetes apps outside of Terraform. There are lots of pipeline options, and the underlying philosophy is fundamentally different, which means I don’t have the question of state files and time-to-evaluate and all of that; Kubernetes has its own mechanisms for this. But that is another interesting discussion to have elsewhere.

But then you run into the “how do I manage this whole thing together?” problem. The same way that you define “3x EC2 instances and 6x EBS volumes and 2x S3 buckets and 1x Kubernetes cluster” as “my foo service” and create a “foo module”, you have a definition of higher-level things in the environment, all the way up to the environment itself. Try checking the state of the whole environment, or deploying a whole stack of services inside an environment to another region, and there is no easy way to do it. Terraform provides this lovely modular (pun intended) way to bind these together and create a definition, and this approach does not use it.

We still make extensive use of modules to abstract away complex details and allow code reuse.

Isn’t that only at the lower levels? To continue the example above: only the “foo” service. But if there are 2x foo and 3x bar and 1x baz making up the stack in each region where we are deployed in production, let alone 4x stacks (one per region) plus 1x global services making up an environment, nothing ties them together definitionally for reuse (unless you build that yourself). That is precisely what I was trying to get at by using Terraform’s native capabilities.

be split based on tier/blast radius/product/team/update frequency

I tend to find that this works well with a combination of: making the highest level an environment; using lots of modules, perhaps in different repos, so each team can work at their own pace and the only thing that touches the upper level is PRs to update versions; and proper review of tf plan.

Huh, that is surprising. I just worked through (as in, someone really smart helped me with) some graph analysis and got processing of an extremely large graph, with tens of thousands of nodes and edges, down from 6 minutes to 2 seconds. As long as it all stays in memory, good algorithms should manage it. Once you have to fork/exec things or make network API calls (i.e. refresh), it’s a different ballgame entirely.

Data resources are not “refreshed”, because they are transient objects that live only for one run. Their entire purpose is to represent a dependency on data that changes outside of Terraform. It doesn’t make sense to use stale results from old runs, and in some cases it won’t even work to do so because e.g. the provider schema has changed since the last run and so the data in the state is invalid.

Data resource results only live in the state after a run is complete as a convenience to those using development aids like terraform console. It’s more likely that they would be removed from state entirely than that Terraform would support using stale results from older runs. If you want values to be fixed rather than taking the latest from the remote object, then indeed data blocks are not the appropriate tool for that goal; better to write the values you want directly into the configuration in that case, so that it’s clear the values are not going to automatically track changes made in a remote system.
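To illustrate the trade-off being described (hypothetical AWS example; the AMI IDs are placeholders): a data block is re-read each run and tracks the remote system, whereas a value written directly into the configuration stays fixed until someone deliberately edits it.

```hcl
# Tracked: this lookup runs on every plan and follows the newest matching AMI.
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical's AWS account

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

# Pinned: no API read on plan, and no automatic tracking of remote changes.
locals {
  ami_id = "ami-0123456789abcdef0" # placeholder copied from a one-off lookup
}
```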

-refresh=false means, essentially, “trust that Terraform’s cache of the previous managed resource data is accurate”, which is possible because providers are designed to consume their results from previous runs in that case. There is no equivalent mechanism in providers for data resources because a data resource read works only with the current configuration, and cannot “see” the result from the previous run. (It would be odd if the results of a data resource could vary based on what was returned last time, rather than only based on the configuration and the remote system’s API response).

I think what you’ve said here gets at the essence of the situation.

Terraform is designed under the assumption that its runtime will be dominated by waiting for network API calls to complete, which is, for example, why it runs multiple requests concurrently to try to achieve some pipelining advantage. How successful that is tends to depend on how “dense” your dependency graph is; in the worst case, if everything in your configuration depends on one API call that takes a long time, then there isn’t really anything Terraform can do to avoid waiting that long. A less complex graph can often achieve better concurrency.

The algorithms used for manipulating the graph itself, in Terraform’s RAM, operate at such a different timescale from those network requests that the network requests tend to become a problem before the graph algorithms do. However, there definitely have been (and probably still are, to some extent) rough edges where unusually complex graphs cause noticeable delay in the graph algorithms. Most of our optimization effort goes into the network-request concurrency aspect because that has the most significant impact for most people, but we have made various investments in improving the graph algorithms in recent years, and so I don’t know of any significant low-hanging fruit left to pick there currently.

One specific area of improvement would be to allow providers to tell Terraform that they are capable of consolidating certain kinds of refresh request together into a single API call to update many objects at once, and we’ve researched that a number of times now. The short story is that it’s a little more subtle a problem than it first appears, both because different remote APIs have different constraints on what kinds of request can be coalesced, and because Terraform Core needs to make sure it doesn’t coalesce “too much” so that the graph traversal is no longer correct. I expect that we will find a good compromise here eventually, but finding a viable separation of concerns for this problem has been challenging, particularly with the need to update all providers in some way before it would become broadly useful.

On data sources and -refresh=false

I believe there are cases where it makes sense to use data source results from previous runs.

If a user has decided to use -refresh=false, it is likely they want to minimise the volume of network API requests (maybe for speed, maybe for cost) and have accepted a trade-off that changes outside of Terraform may not be accounted for.

As an example, consider one popular use of data sources: resolving the name of a remote API object to an ID, in order to refer to it in other resources. Such mappings are often very stable. More specifically, suppose a configuration aims to configure GitHub or Vault, and has lots of group names that it needs to look up to opaque IDs. There may be hundreds or thousands of group names referenced in the configuration. It is vanishingly unlikely in most environments that a group would be deleted and replaced with a new one of the same name. The Terraform practitioner may want to optimize away a lengthy series of API reads, on every single plan, for data they know to be unchanged; this is perfectly aligned with a major motivation for wanting to turn off managed resource refresh.
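A sketch of that pattern, using the GitHub provider (the team slugs are hypothetical, and attribute names should be checked against the provider docs): every plan pays one API read per team purely to resolve a name to an ID that almost never changes.

```hcl
variable "team_slugs" {
  type    = set(string)
  default = ["platform", "security", "billing"] # imagine hundreds of these
}

# One API read per team, repeated on every plan, solely for name -> ID.
data "github_team" "by_slug" {
  for_each = var.team_slugs
  slug     = each.value
}

locals {
  team_ids = { for slug, t in data.github_team.by_slug : slug => t.id }
}
```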

I recognise, understand and agree with your point about the current implementation of data sources not having any way to deal with provider schema changes, but that is a reason why persistent data source results would be hard, not a reason they would be undesirable.

On graph processing efficiency

It has been a few Terraform releases since I last tested this, but I have worked with a configuration that aimed to manage thousands of Vault groups. This configuration performed poorly, and it was all due to Terraform being inefficient locally, before ever reaching the network; I validated this by measuring the time taken to plan, starting from an empty state. This scaled as O(n³) with the number of groups, which is pretty bad.

Unfortunately, that is just part of working in this problem space. Transitive closure on an n-node DAG is equivalent in complexity to n×n matrix multiplication, where the theoretical best case is O(n^2.3729). I experimented with an adjacency-matrix-based DAG to get closer to that at one point, but the trade-off didn’t look worth it given our usage patterns. (Basically, O(n³) is about as good as you will get, and we work on reducing the per-node constants.)
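For reference, the equivalence alluded to can be sketched as follows (a standard result, stated informally): the reachability matrix can be computed by repeated boolean squaring of the adjacency matrix, so any fast matrix-multiplication bound carries over up to a log factor, while Warshall’s algorithm achieves O(n³) directly.

```latex
% Transitive closure via repeated boolean squaring of (I \lor A):
A^{*} = (I \lor A)^{n}, \quad
\text{computable with } \lceil \log_2 n \rceil \text{ boolean matrix products}
\;\Rightarrow\;
T_{\mathrm{closure}}(n) = O\!\left(n^{\omega} \log n\right),
\qquad \omega \le 2.3729 \;(\text{naive: } \omega = 3).
```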

There are also a lot of inherently quadratic operations which we need to apply to the graph, and we can sometimes try to optimize around or combine those operations when they become a severe impediment, but optimization often has a tradeoff in maintainability too.

A lot of this comes down to what you have described well, Martin. I am trying to get Terraform to do what it was not architected to do. There are information-architecture, freshness, and other assumptions that would need to be revisited.

Clearly, based on the growth of “terraservices” (like Terragrunt) and Stuart’s comments above, people do split up their “big config representing the whole environment” (or some subset one step below it) into independent parts, and then rely on something to weave them together. Just by running one of those smaller parts, they are already saying, “other parts might have changed; I am ignoring those.” In doing so, they not only bring in another tool but also sacrifice the definition of those higher-level constructs. That leads to a lot of pain.

I was trying to bring those together natively in Terraform, but in doing so, am I pushing against the core of how it is built?

The reality where I’m coming from is that there isn’t a need or desire for a single set of code that describes everything in one place. For us, lots of resources are totally independent of each other (for example, standalone AWS accounts that have no connectivity other than to the Internet) or are managed by separate teams/workflows/cadences (some bits of code shouldn’t be widely shared or visible; we want code to live “next to” other types of code instead of in a giant Terraform-only repo; we want different workflows for different parts of the code).

There are also technical reasons for splitting code into different repos, such as generally being unable to create a thing and deploy things within that thing in the same Terraform root module (e.g. creating a Kubernetes cluster and deploying lots of Helm charts within that cluster).

We have lots of usage of modules (both internally developed and open-source external modules) to make things simpler and easier to manage (e.g. centralising certain business logic, allowing for the definition and reuse of common application patterns, etc.).

So we have generally found that, because of the other reasons for splitting code up into multiple repos/state files/root modules, we don’t end up with massive numbers of resources in a single root module (generally no more than a few hundred to low thousands, tops), and therefore we don’t commonly see excessive refresh times.

Thanks for the interesting insights, @jbardin!

One thing I wonder about, though… in the example I was referring to, my configuration contained only three resource blocks; the volume was all due to for_each, and I thought that the instances of a for_each didn’t participate in the dependency graph as first-class members, since they all have the same connectivity to other blocks. Perhaps there’s an opportunity there to optimise away some processing?


During plan the resource block is mostly handled as a single node, but during apply each of those individual instances is a separate node in the graph, because they need to be processed individually. Keeping subgraphs like expanded instances more “self-contained” during apply and expanding them on demand, as during plan, is something I’ve toyed with, but nothing jumped out as a good optimization that didn’t require other major changes.

The biggest problem with large numbers of expanded instances is often the data handling when referencing those instances, rather than the graph structure. Because the Terraform language treats a resource as a single value which contains all of its instances, references to individual instances always require passing around the entire value of all instances to index into: Performance issues when referencing high cardinality resources · Issue #26355 · hashicorp/terraform · GitHub
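A sketch of the shape of that problem (all names here are hypothetical): even though only one instance is referenced, the expression depends on the single resource value holding every expanded instance.

```hcl
resource "aws_s3_bucket" "per_team" {
  for_each = toset(var.team_names) # potentially thousands of keys
  bucket   = "logs-${each.value}"
}

# Indexing one key still means evaluating the whole map of all instances
# of aws_s3_bucket.per_team before picking out "platform".
output "platform_bucket_arn" {
  value = aws_s3_bucket.per_team["platform"].arn
}
```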

Ooh … That is interesting! I think I finally have the true answer to why my configuration back then was performing poorly :slight_smile: