Bulk moving parts of your state from one state file to another

I’m undertaking a refactor where I’m splitting up a large, monolithic root module into multiple smaller root modules. To achieve this without destroying and recreating all of our infrastructure, which would incur downtime we’d rather avoid, it seems like this is going to require a LOT of terraform state rm-ing from the source module and terraform import-ing into the destination module(s). I haven’t counted, but I suspect I have hundreds, possibly 1000+, resources to move.

It seems like I should be able to write a script to do this, except that terraform import seems really hard to script, because the key you need to import a given resource differs from resource to resource and can’t be determined programmatically. For AWS, some things want an ARN, some want an ID, and others want some combination of attributes joined with underscores. I filed Feature request: All resources should export their import key as a computed attribute · Issue #29666 · hashicorp/terraform · GitHub, thinking that it would help others trying to do something like this in the future.

I’m starting to consider working directly on the state file itself. If I can use jq or something to grab the relevant pieces of the state directly from the source state file to build the dest state file, then upload it to my remote state backend with terraform state push, that might be easier than doing all of these fiddly imports.
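To make that idea concrete, here is a rough sketch of the extraction step in Python rather than jq (the resource names and the `wanted` set are made up for illustration; this assumes a version-4 state file pulled with terraform state pull):

```python
import json
import uuid

def resource_address(res):
    """Render a resource's address the way Terraform does:
    [module.path.]type.name (in v4 state files the "module" key
    already carries the "module." prefix)."""
    base = f'{res["type"]}.{res["name"]}'
    module = res.get("module")
    return f"{module}.{base}" if module else base

def extract_state(src, wanted):
    """Build a new state containing only the resources whose addresses
    appear in `wanted`, with a fresh lineage so the new file starts its
    own snapshot series."""
    return {
        "version": src["version"],
        "terraform_version": src["terraform_version"],
        "serial": 1,
        "lineage": str(uuid.uuid4()),
        "outputs": {},
        "resources": [r for r in src["resources"]
                      if resource_address(r) in wanted],
    }

# Toy example standing in for a real `terraform state pull` result:
src = {
    "version": 4,
    "terraform_version": "1.0.0",
    "serial": 42,
    "lineage": "00000000-0000-0000-0000-000000000000",
    "outputs": {},
    "resources": [
        {"mode": "managed", "type": "aws_s3_bucket", "name": "logs",
         "instances": []},
        {"module": "module.vpc", "mode": "managed", "type": "aws_subnet",
         "name": "private", "instances": []},
    ],
}
dest = extract_state(src, {"module.vpc.aws_subnet.private"})
print(json.dumps([resource_address(r) for r in dest["resources"]]))
# → ["module.vpc.aws_subnet.private"]
```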

Is there a different way that I could tackle this that I’m not seeing?

For large moves we modify the state file directly (state pull/push), but you have to be very careful. With state rm and import there are lots of checks to ensure you don’t mess things up, but with direct editing of the JSON it is a lot easier to cause issues, so it isn’t advised unless you really know what you are doing.

I do share the general sentiment that editing state snapshots directly must be done with care: Terraform relies on this artifact heavily for its primary purpose, so we must be careful not to drop or corrupt information that Terraform relies on.

The following are some notes I’d suggest to keep in mind if you do want to attempt it:

  • Terraform expects a particular snapshot in a series (identified by the lineage and serial properties together) to be immutable after it exists. Different backends have different abilities to preserve that invariant, so you might be able to “get away with it” for some of them, but for robustness I’d suggest making sure that any software you write to manipulate snapshots always increases the serial as part of its work, so Terraform can clearly see that it’s intended as a subsequent snapshot.

  • For backends that support locking, Terraform normally holds a lock on the state while working with it, which is sufficient to avoid race conditions as long as all clients are cooperating with the locking scheme. terraform state pull followed by a later terraform state push typically doesn’t respect that locking scheme, so you’ll need to find some other way to avoid concurrent processes making decisions based on the previous snapshot while you’re working. (In many cases this can just be human process: tell your coworkers to leave the repository alone while you are working. But the details will depend on how your team typically works, of course.)

  • You can be generally okay if you only move things around at the level of whole resources in the state and keep the object describing each resource unmodified. However, there is one important exception: Terraform tracks each resource instance object’s dependencies as they were at the most recent apply, just in case you subsequently remove that object from configuration, in which case Terraform can no longer rely on the configuration to infer that dependency information.

    Since dependencies are context-specific, you may need to do some vaguely-defined work here to keep the dependencies rational. Alternatively, you can generally get good enough results by stripping out the dependencies altogether and then using terraform apply -refresh-only after you’ve pushed up the new state, as long as your new configuration alone is enough to represent all of the necessary dependency relationships. If any destroy actions appear in terraform plan after you upload the new state, you should resolve those first.

    (The above is unfortunately not 100% guaranteed to work, because the refresh phase itself also technically wants to take into account dependencies from the state, but it’s often okay because reads are usually side-effect-free and thus it doesn’t typically matter what order we do them in when the previous state snapshot already represents a fully-converged state.)

  • Terraform’s design assumes that each remote object is only bound to one Terraform resource instance in one Terraform configuration at a time, but Terraform itself can’t verify that assumption.

    This means that you will generally need to make sure that when you are done there aren’t any existing objects which you’ve copied into more than one of your new state snapshots. If you don’t ensure this, it’ll probably look like it’s working just fine at first, but the first time you change one of the multiple configurations you’ll see the other ones fight to restore the object to its original settings. (In principle you could try to change them all in lockstep, but that’s a big mess and not something I’d generally recommend.) Even so, you’ll see the “Note: Objects changed outside of Terraform” message as Terraform’s best attempt to alert you that this might be happening.

  • Whatever else you do, I suggest making sure you have a backup of the last-known-good state snapshot that Terraform itself created, so that you can back out and return to a working state if you find yourself in a corner you don’t know how to iterate out of. Then you can regroup and have another attempt at a later time.
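Two of the notes above (always bump the serial; optionally strip recorded dependencies so a later terraform apply -refresh-only can rebuild them) can be sketched together in a few lines. This is a hypothetical helper over a pulled v4 state document, not an official tool, and the toy state below is made up:

```python
import copy

def next_snapshot(state):
    """Return an edited copy of a pulled state that Terraform will treat
    as the next snapshot in the same lineage (serial + 1), with each
    instance's recorded dependencies stripped so they can be repopulated
    by a later `terraform apply -refresh-only` against the new config."""
    new = copy.deepcopy(state)
    new["serial"] = state["serial"] + 1  # never reuse an existing serial
    for resource in new.get("resources", []):
        for instance in resource.get("instances", []):
            instance.pop("dependencies", None)
    return new

# Toy example:
state = {
    "version": 4,
    "serial": 7,
    "lineage": "1111",
    "resources": [
        {"type": "aws_instance", "name": "web",
         "instances": [{"attributes": {"id": "i-123"},
                        "dependencies": ["aws_subnet.private"]}]},
    ],
}
edited = next_snapshot(state)
print(edited["serial"], "dependencies" in edited["resources"][0]["instances"][0])
# → 8 False
```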

The main thing to note here is that while state is important, there’s also nothing magical about it. As long as you preserve Terraform’s expectations about the structure and semantics (and I tried to cover the main cases in the list above) and let Terraform have an opportunity to possibly make some corrections before you do anything drastic like destroying objects, this can be a viable strategy for a one-off refactoring job like this, where you can hopefully control for any particular quirks about your specific target environment. (A general, repeatable process would be harder, of course, which is why Terraform itself doesn’t have any built-in features for this yet.)

Thanks very much for the detailed reply!

I think I actually see an easier way to do specifically what I’m doing right now. I can terraform state pull the state file for the source module and terraform state push it as an exact copy to every new dest module. Then I can use terraform state rm to trim out the things that shouldn’t exist in the state file(s), instead of trying to copy over the things that should exist. The existing suite of terraform state commands should be sufficient for that and this approach in general seems a lot safer.
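The trimming step itself is scriptable: given the copied state and the set of addresses a new root module should keep, something like the following (a rough sketch; the resource names are invented) can emit the terraform state rm commands for everything else:

```python
def rm_commands(state, keep):
    """Yield `terraform state rm` commands for every resource in the
    copied state whose address is not in the `keep` set."""
    for r in state.get("resources", []):
        base = f'{r["type"]}.{r["name"]}'
        addr = f'{r["module"]}.{base}' if r.get("module") else base
        if addr not in keep:
            yield f"terraform state rm '{addr}'"

# Toy example:
state = {
    "resources": [
        {"type": "aws_s3_bucket", "name": "logs"},
        {"module": "module.vpc", "type": "aws_subnet", "name": "private"},
    ],
}
for cmd in rm_commands(state, keep={"aws_s3_bucket.logs"}):
    print(cmd)
# → terraform state rm 'module.vpc.aws_subnet.private'
```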

Indeed, if none of your resource instances will have changed addresses (different module paths, different resource names, different instance keys) in the new split structure then that does seem like a fine compromise, as long as you take care to ensure that all of the new states end up disjoint from one another, having no resource instances in common.
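One way to verify that disjointness requirement before applying anything is to pull each of the new states and compare resource addresses across them. A minimal sketch (the toy states below are made up):

```python
from itertools import combinations

def addresses(state):
    """Collect every resource address in a (version 4) state document."""
    out = set()
    for r in state.get("resources", []):
        base = f'{r["type"]}.{r["name"]}'
        module = r.get("module")
        out.add(f"{module}.{base}" if module else base)
    return out

def shared_addresses(states):
    """Return addresses appearing in more than one state — each would
    leave one remote object bound to multiple configurations."""
    overlap = set()
    for a, b in combinations(states, 2):
        overlap |= addresses(a) & addresses(b)
    return overlap

# Toy example: two trimmed states that both kept the same bucket.
state_a = {"resources": [{"type": "aws_s3_bucket", "name": "logs"}]}
state_b = {"resources": [{"type": "aws_s3_bucket", "name": "logs"},
                         {"type": "aws_instance", "name": "web"}]}
print(sorted(shared_addresses([state_a, state_b])))
# → ['aws_s3_bucket.logs']
```

An empty result means the split states are disjoint and safe in this respect.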