I would say that learning when / how to DRY stuff up and / or use modules is something you develop instincts for over time. So, it may be smart to import stuff “resource-like” (i.e., just defining all the resources as they are) and refactor into modules later. If you have a lot of similar things, you can also do iteration directly in your Terraform code (with for_each), without fully getting into creating modules.
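As a rough sketch of what plain for_each iteration can look like without a module (the bucket names and the "example-" prefix here are made up for illustration, not from any real setup):

```hcl
# Hypothetical names: swap in whatever your real resources are called.
locals {
  log_buckets = toset(["app-logs", "audit-logs", "access-logs"])
}

resource "aws_s3_bucket" "logs" {
  # One bucket per element of the set.
  for_each = local.log_buckets

  # For a set, each.key is the element itself, e.g. "app-logs".
  bucket = "example-${each.key}"

  tags = {
    Purpose = each.key
  }
}
```

Each instance gets an address like aws_s3_bucket.logs["app-logs"], which is also the address you would import an existing bucket into.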
Importing existing resources (that weren’t created with Terraform originally) has its own set of challenges, so this is another argument in favor of having your first pass be just getting the stuff into Terraform at all.
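For that first pass, one option (assuming Terraform 1.5 or later) is a declarative import block next to a plain resource definition. The resource name and bucket name below are placeholders:

```hcl
# Placeholder bucket name: use the ID of the resource that already exists.
import {
  to = aws_s3_bucket.assets
  id = "example-existing-assets-bucket"
}

# Written "resource-like": just describe the thing as it is today.
resource "aws_s3_bucket" "assets" {
  bucket = "example-existing-assets-bucket"
}
```

If you leave the resource block out, running terraform plan -generate-config-out=generated.tf can draft it for you, though the generated config usually needs some cleanup before you would want to keep it.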
But, as you start importing stuff, you may see things where there’s a lot of repetition, and this may be a good place to start looking at using modules in one of a few ways:
- Module that can be instantiated once per actual resource
- Module that packages up a bunch of related (different) resources
- Module that uses a data structure (object, map, or set/list) passed in to create a bunch of similar resources.
(you can also look at third-party modules, which may be useful for some things, though they may also have built-in complexity and configurability that your use case won’t demand)
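To make the third pattern in that list a bit more concrete, here is a minimal sketch of a module that takes a map and creates one bucket per entry. The module path, variable name, and bucket names are all hypothetical, and the file boundaries are marked with comments:

```hcl
# --- modules/buckets/variables.tf (hypothetical module path) ---
variable "buckets" {
  description = "Map of bucket name suffix to settings."
  type = map(object({
    versioning = bool
  }))
}

# --- modules/buckets/main.tf ---
resource "aws_s3_bucket" "this" {
  for_each = var.buckets
  bucket   = "example-${each.key}"
}

resource "aws_s3_bucket_versioning" "this" {
  # Only create a versioning resource for entries that ask for it.
  for_each = { for name, cfg in var.buckets : name => cfg if cfg.versioning }
  bucket   = aws_s3_bucket.this[each.key].id

  versioning_configuration {
    status = "Enabled"
  }
}

# --- root configuration: one module call, many buckets ---
module "buckets" {
  source = "./modules/buckets"

  buckets = {
    uploads = { versioning = true }
    scratch = { versioning = false }
  }
}
```

The other two patterns look similar from the caller's side; the difference is mostly whether the loop lives inside the module (as here) or on the module block itself via for_each.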
I put some thoughts on modules broadly in this message:
This is probably one of the hardest things about using Terraform properly, and there isn’t a one-size-fits-all approach. Generally, I would say start somewhat flatter, and work on breaking things up more as it becomes risky or slow to apply changes. Even within that flat state, you can still use file names to help keep resources grouped together (for example, by region, application, or resource type, depending on what makes the most sense for you).
As states become more partitioned, you have a smaller blast radius, and can plan / apply things faster, but you introduce a couple of new problems:
- “Drift” becomes more likely if individual states are planned less often.
- You now have to use outputs / remote state references (or data sources) to pass things defined in one state to another, or use other workarounds (like hard-coding) if you’re referencing things within one state from another.
- You are also more likely to run into circular issues in terms of the order things need to be applied in, or needing to apply one state before being able to reference it from another.
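A minimal sketch of the outputs / remote state approach, assuming an S3 backend. The state bucket name, region, and resource names are placeholders, and aws_vpc.main is assumed to be defined elsewhere in the producing state:

```hcl
# --- In the producing state (e.g. a network layer) ---
# Assumes a resource "aws_vpc" "main" exists in this state.
output "vpc_id" {
  value = aws_vpc.main.id
}

# --- In the consuming state ---
# Reads the outputs of the other state; bucket/key/region are placeholders.
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "example-terraform-states"
    key    = "aws/account_id/01_network/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_security_group" "db" {
  name   = "example-db"
  vpc_id = data.terraform_remote_state.network.outputs.vpc_id
}
```

This is also where the ordering problem shows up: the consuming state can’t plan until the producing state has been applied at least once and the output exists.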
The new “stacks” feature is probably worth a look. And, especially if you’ve got dev / prod / etc. environments in separate accounts or VPCs, and are stamping out similar resources within each one, looking at wrapper tools like Terragrunt is probably also worth it, even with the slight risks of adding another layer of abstraction.
If you have a really clear boundary in terms of “all these things belong to x team or y application”, you could look at breaking up state vertically that way.
In my experience, the more common case is that the boundaries are a little fuzzier, in which case, you might want to do nested states like account_id/region/vpc, or grouping related resources together in an “onion” model, with more foundational layers applied first. For example, 01 would be applied before 03, and you’d try to avoid having something in 01 reference something created in a higher numbered layer, but might frequently reference something (e.g., a VPC ID) created in a lower-numbered layer from a higher-numbered one.
aws/account_id/01_network - this is the most foundational
aws/account_id/03_storage - might contain s3 buckets
aws/account_id/05_database - might contain RDS instances
and so on. (Where and how to define IAM permissions then becomes another complicated situation.)
If there are a lot of things that are shared by the various accounts (e.g., a DNS zone with lots of records) or with things that have a lot of relationships (for example, defining a bunch of VPCs / networks, and creating peering connections between them, or defining permissions that cut across accounts), I sometimes do a meta/ or shared/ directory and state structure.
I would suggest avoiding versioning modules to start with (for this size of environment), keeping all your configs / code in a single repo (but across multiple states), and matching the prefixes for the state to your repo’s filesystem layout.
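For example, if you’re using an S3 backend, matching the state prefix to the repo layout might look something like this (the bucket name and region are placeholders, and account_id stands in for the real account ID):

```hcl
# File lives at: aws/account_id/01_network/backend.tf
terraform {
  backend "s3" {
    bucket = "example-terraform-states"
    # The key mirrors the directory this config lives in.
    key    = "aws/account_id/01_network/terraform.tfstate"
    region = "us-east-1"
  }
}
```

That way, anyone looking at a state key can find the code for it (and vice versa) without a lookup table.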
One thing to note: while you could have a period of transition where some things are managed by Terraform and others aren’t, you really want to avoid mixing clickops with IaC. So, maybe you focus on defining and importing some of the foundational items (like VPCs) first; this will build your experience and comfort with the tools, maybe give you some more ideas about how you want to structure things, and reduce the effort involved with those refactors.
Similarly, some things may make sense to not manage with Terraform (for example, a cloud function that gets deployed by a CD system)… this can make sense sometimes, but try to avoid managing the same thing in two places, i.e., if you’re managing the resource via another tool, in most cases, avoid managing it with Terraform.
Don’t be too afraid to do state surgery or local applies when you need to, but, especially if you have more than 2-3 people, I do strongly recommend finding something (whether it’s a tool like Atlantis, a TACO provider like Spacelift, or a simple homebrew CD pipeline) to make sure that most of your changes are applied in a standard way, and from some sort of pipeline vs. just locally.
Set up good validation checks on your code and formatting (including using tools like tflint) in CI, as well as using pre-commit hooks, to catch mistakes / bugs earlier rather than later, and to help with overall readability and code quality.
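As a starting point for the tflint side, a small .tflint.hcl at the repo root might look like the sketch below. Which rules you turn on is a matter of taste; the ones here are just examples from the bundled Terraform ruleset:

```hcl
# .tflint.hcl: a minimal starting point, not a recommended final config.
plugin "terraform" {
  enabled = true
  preset  = "recommended"
}

# Optional extras on top of the preset.
rule "terraform_naming_convention" {
  enabled = true
}

rule "terraform_documented_variables" {
  enabled = true
}
```

In CI, this would typically run alongside terraform fmt -check and terraform validate, with the same commands wired into pre-commit hooks so problems show up before a push.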