Terraform module rewrites

Back when our project was using Terraform 0.11.x, our module design was very linear.

For instance, every time we assigned a new IAM role, we would declare a new module to do so, and we had a resource defined that would provision 1 IAM role.

The end result is that we now have N modules for N role assignments, e.g.:

module "iam_user1" {
   // stuff to do first role assignment
}

module "iam_user2" {
   // stuff to do second role assignment
}

module "iam_user3" {
   // stuff to do third role assignment
}

// and so on and so forth...

Now that we have switched over to Terraform 0.12, we want to delete the old module declarations and replace them with a single module whose resource uses the for_each meta-argument to create however many IAM roles we need.

module "iam_all_users" {
  // pass in a map(map(string)) and have for_each do its thing
}

I’ve basically implemented the second variant (using for_each), but a new problem has arisen:

The problem:

When running terraform plans, it’ll show the new module that employs for_each is going to create however many resources, and that’s great. I’ve simply just moved the IAM declaration from a bunch of modules to one.

The problem is that once I delete the old modules from the Terraform source code, the plan also wants to destroy everything those modules created, which, if I’m not mistaken, amounts to a destroy followed by a re-create.

Basically, my concern is the destroy / re-create behavior when rewriting modules in this way. I’d like to do this for many types of resources (not just IAM but also things like instances, disks, and so on), and I can see some complications if everything ends up getting destroyed and then re-created.

Is there some way to rewrite modules in a way similar to how I’ve described above but not have to re-create everything, or is the nature of this kind of re-write intended to produce this kind of consequence?

Hi @mikek,

It sounds like you’ve effectively moved your aws_iam_user resource instances from lots of separate iam_userN modules into a single iam_all_users module, and so Terraform needs some help to understand how to map them. You can tell it using terraform state mv, which will directly edit the state so that an existing object is now tracked at a new address.

For example:

terraform state mv 'module.iam_user1.aws_iam_user.this' 'module.iam_all_users.aws_iam_user.all["iam_user1"]'

The ' characters are important to ensure that the quotes in the second argument get passed to Terraform literally, rather than being interpreted by your shell. That quoting style assumes a Unix shell; if you’re on a Windows system, omit the ' characters and instead place a \ before each " to escape it.

It sounds like you have several of these to do, so you may want to write a small one-off script to systematically migrate all of your existing user instances. Note that terraform state mv edits the state directly, so in a collaborative environment it’s best to ask anyone else who might also try to use Terraform against this workspace to pause their Terraform usage for the duration of your work, just to ensure that another operation doesn’t interrupt you and cause an inconsistent result.
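A minimal sketch of such a one-off script, assuming the old modules are named iam_user1, iam_user2, and so on, that each contains aws_iam_user.this, and that the new resource is aws_iam_user.all keyed by the old module name. Those addresses come from the example command above and may not match your configuration. The DRY_RUN switch is a hypothetical convenience: by default the script only prints the commands so you can review them first.

```shell
#!/bin/sh
# One-off migration sketch: move each old per-user module's resource into the
# new for_each-based module. All addresses are assumptions based on the
# example command above; adjust them to match your configuration.

# Build the old state address for one user key.
old_address() {
  printf 'module.%s.aws_iam_user.this' "$1"
}

# Build the new state address for one user key.
new_address() {
  printf 'module.iam_all_users.aws_iam_user.all["%s"]' "$1"
}

# Move one user. With DRY_RUN=1 (the default) only print the command.
migrate_user() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf "terraform state mv '%s' '%s'\n" \
      "$(old_address "$1")" "$(new_address "$1")"
  else
    terraform state mv "$(old_address "$1")" "$(new_address "$1")"
  fi
}

# Hypothetical list of old module names; replace with your own.
for user in iam_user1 iam_user2 iam_user3; do
  migrate_user "$user"
done
```

Once the printed commands look right, running the script again with DRY_RUN=0 performs the moves. Passing the addresses as quoted shell variables keeps the inner " characters intact without extra escaping.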

Thanks @apparentlymart!

A quick follow-up question about terraform state mv: would this command work even if my one-for-all module hasn’t been applied/created yet, or would I have to use terraform import in conjunction with terraform state mv?

The reason I ask is because I haven’t applied anything yet, and I’m guessing if I was required to apply my new module (and not delete the old modules until the process was finished) I’d end up with a 4xx error from our cloud provider stating that those resources already exist.

Conversely: If I don’t have to apply anything, and can just use terraform state mv command, then if I am understanding correctly - all I have to do is make the appropriate changes in my terraform declaration, and then run the terraform state mv command, and after that’s done I should have clean plans throughout, right?

The last thing that I’d like to just clarify (for myself) is in a scenario where a mistake was to be made using state mv. The state file I’d be modifying is remote, so could I use state pull in order to preserve the state as it was before my modifications?

Edit: There’s one more thing, and I’ll try to provide more context on this one (but it might be more a question for the provider): would a state mv also work between different kinds of resources? The reason I ask is we used to have google_project_iam_member but re-wrote IAM using google_project_iam_binding (https://www.terraform.io/docs/providers/google/r/google_project_iam.html), and I’m worried it wouldn’t work since one takes a member (string) argument, and the other one takes a members (list) argument.

In principle it should let you move between modules, but I think there may be a known bug in current releases that causes that to be treated as an error if nothing is currently in the destination module at all. If you run into problems, you might need to initially establish the new module by running terraform apply without the user resource inside (e.g. comment it out briefly), just to create the empty module skeleton in the state, and then move the existing resource instances into it. Terraform ought to allow moving into a module that isn’t in the state if it also exists in the configuration, but I don’t recall off the top of my head if this bug was addressed already.

You can indeed use terraform state pull to obtain the latest snapshot before you start, and then terraform state push --force to write that back into place afterwards if you want to back out of the change. You can also terraform state mv individual resource instances back to where they started, for a less extreme rollback.
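As a rough sketch, the backup and rollback could be wrapped in two small shell functions. The function names and the snapshot file name are hypothetical; only the terraform state pull and terraform state push commands come from the description above.

```shell
#!/bin/sh
# Snapshot the remote state before editing it, so it can be restored later.
# Function names and the file name in the usage notes are hypothetical.

backup_state() {
  # "terraform state pull" writes the current state snapshot to stdout.
  terraform state pull > "$1"
}

restore_state() {
  # Overwrites the remote state with the saved snapshot. -force disables the
  # lineage and serial checks, so use it only to back out your own change.
  terraform state push -force "$1"
}

# Usage (run from the configuration's working directory):
#   backup_state pre-migration.tfstate
#   ... run the terraform state mv commands ...
#   restore_state pre-migration.tfstate   # only if something went wrong
```

Keep the snapshot file somewhere safe and out of version control, since state can contain sensitive values.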

terraform state mv can only move between resources of the same type, because the serialization of the resource instance data in the state depends on the resource type schema and there’s no mechanism built into Terraform to generically convert between two resource type schemas. I’m not familiar with the Google provider resource types you mentioned specifically, but if they are related in such a way that migrating between them makes sense then you may be able to do that with terraform import and terraform state rm, being careful to ensure that you do not leave any situation where two resource instances in the state believe they are managing the same object.
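A hedged sketch of that import-and-remove approach follows. The resource addresses, project ID, and role in the usage note are hypothetical placeholders, and the "<project-id> <role>" import ID format for google_project_iam_binding is an assumption you should verify against the provider documentation. The old instance is removed from the state first so that two instances never track the same grant at once.

```shell
#!/bin/sh
# Sketch: stop tracking an old google_project_iam_member and adopt the same
# role under a google_project_iam_binding. The function name is hypothetical.

switch_member_to_binding() {
  old_addr="$1"   # e.g. google_project_iam_member.viewer (hypothetical)
  new_addr="$2"   # e.g. google_project_iam_binding.viewer (hypothetical)
  import_id="$3"  # assumed "<project-id> <role>" form; check the provider docs

  # Forget the old instance without destroying the real IAM grant.
  terraform state rm "$old_addr" || return 1

  # Bind the existing grant to the new resource address.
  terraform import "$new_addr" "$import_id"
}

# Usage:
#   switch_member_to_binding google_project_iam_member.viewer \
#     google_project_iam_binding.viewer "my-project roles/viewer"
```

Run terraform plan afterwards and check that it proposes no changes to the binding before removing anything else.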

Thank you for the clarification! I haven’t really used state rm previously, so would I also want to rm instances of null_resource tied into the actual resource, or are those safe to destroy with normal applies?

Since a null_resource object exists only in the Terraform state, and isn’t associated with any remote object, it might be fine to let Terraform recreate it.

However, often null_resource is used as a container for some side-effects from a provisioner. If that’s the case in your situation then the decision will depend on whether it’s okay for those side-effects to be re-run as part of your refactoring here. If you don’t want to re-run it then you’d need to use terraform state mv to move the placeholder object that represents the null_resource having already been created over to the new resource address, so Terraform will see it as being already created.