Concept/approach for separation of environments and/or workloads

So, we are currently building a lot of Terraform code to deploy the resources we would like to have, in our case to Azure, but that doesn’t really matter.

From a high level, we will have at least three environments:

  • prod
  • dev
  • test

As we don’t expect the environments to consist of the same set of resources, we have created a dedicated root configuration for each environment and decided not to use Terraform workspaces for environment separation.

To reduce code/logic duplication, we have furthermore created two types of modules:

  • resource-modules = our own implementation of a dedicated resource, with just a single resource inside, to meet our requirements
  • deploy-modules = compose the resource-modules together to handle a complete use case like “deploy network”, “deploy app1”, “deploy vm”…

Now we are able to call the (deploy-)modules from the environment root configuration.
This is working as expected; so far, so good.
We know that we can now add as many deploy-modules as required to the root configurations to get things up and running in each environment.
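
To make the structure a bit more concrete, a root configuration in this layout looks roughly like the following sketch (the module paths, names and variables here are just placeholders, not our real code):

# environments/dev/main.tf (simplified, hypothetical example)

module "network" {
  # deploy-module that composes our network resource-modules
  source = "../../modules/deploy/network"

  environment = "dev"
  location    = "westeurope"
}

module "app1" {
  # deploy-module for a single workload, wired to the network deploy-module
  source = "../../modules/deploy/app1"

  environment = "dev"
  subnet_id   = module.network.app_subnet_id   # assumed output of the network deploy-module
}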

However, we would, for example, never be able to remove a single workload from an environment with terraform destroy, as that would kill the whole environment instead of just the one workload.

So we are wondering what would be the most efficient approach to have:

  • as much flexibility as possible
  • as little code duplication as possible
  • code that stays as clean/understandable as possible

Any suggestions from the real world?
Thanks

Hi @joerg.lang,

The terraform destroy command is intended for entirely destroying a transient environment such as a short-lived development environment created only to test a single change. I don’t think that command is relevant to your situation because you are describing environments that will stay up indefinitely and will be modified over time rather than being entirely destroyed and replaced.

The normal way to remove something in a long-lived environment is to remove it from your configuration and then run terraform apply. Terraform will then notice that there is an object tracked in the state which isn’t mentioned in the configuration and so will propose to destroy it.

When I say “remove from configuration” in practice that could mean a few different things, including:

  • Literally removing a resource block.
  • Removing a module block, which implicitly removes everything declared inside it.
  • Designing your configuration to use count or for_each arguments to dynamically declare multiple instances and then changing the values used so that there are fewer instances of an existing resource even though the block is still present in the configuration. (The most extreme case is setting count to zero or for_each to an empty map, in which case you will effectively be declaring no instances of that resource at all.)
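
To illustrate the last option, here is a minimal sketch (the resource type and names are just placeholders) where removing an element from the value given in for_each causes Terraform to plan the destruction of only that one instance, while the block itself stays in the configuration:

variable "workload_resource_groups" {
  type    = set(string)
  default = ["rg-workload1", "rg-workload2"]
}

resource "azurerm_resource_group" "workload" {
  # Removing "rg-workload2" from var.workload_resource_groups makes the next
  # terraform apply destroy only that one resource group; an empty set
  # declares no instances at all.
  for_each = var.workload_resource_groups

  name     = each.value
  location = "westeurope"
}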

Your description of “resource modules” sounds like one of the situations that the documentation recommends being cautious about in When to write a module, so I’d suggest reviewing that section if you didn’t already to decide whether that advice applies to you. There is no hard rule against single-resource modules but we’ve seen folks overuse them in the past and make their configurations harder to maintain over time.


Many thanks for your thoughts, @apparentlymart.

I don’t think that command is relevant to your situation because…

Maybe, maybe not.
Especially for the “non-productive” environments, there could be a use case to drop some of the “workloads” after office hours and rebuild them before office hours to save some money. But as mentioned, possibly only some of the workloads and not all of them, in which case terraform destroy wouldn’t help; it would only help if the whole environment should be removed.

  • Literally removing a resource block.
  • Removing a module block, which implicitly removes everything declared inside it.

That is currently how we are doing it, during manual testing.

Your description of “resource modules” sounds like one of the situations that the documentation recommends being cautious about in When to write a module, so I’d suggest reviewing that section if you didn’t already to decide whether that advice applies to you.

I already know the paragraph you are referencing, and it was quite a discussion with myself whether we would go for that or not. :rofl:

Within the resource-modules we are encapsulating things like (a rough sketch follows the list):

  • naming convention
  • “hard coded” default settings
  • default tagging
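
For illustration, such a resource-module looks roughly like this sketch (the naming scheme, defaults and variables here are hypothetical, not our actual convention):

# modules/resource/storage-account/main.tf (hypothetical resource-module)

variable "environment" {
  type = string
}

variable "workload" {
  type = string
}

variable "resource_group_name" {
  type = string
}

variable "location" {
  type    = string
  default = "westeurope"   # example of a "hard coded" default setting
}

variable "extra_tags" {
  type    = map(string)
  default = {}
}

locals {
  # naming convention and default tagging live in one place
  name = "st${var.workload}${var.environment}"   # e.g. "stapp1dev"
  default_tags = {
    environment = var.environment
    managed_by  = "terraform"
  }
}

resource "azurerm_storage_account" "this" {
  name                     = local.name
  resource_group_name      = var.resource_group_name
  location                 = var.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
  tags                     = merge(local.default_tags, var.extra_tags)
}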

I agree that with the “deployment-modules” we can have the same encapsulation, but we were wondering whether a deployment-module would be too “over-engineered” when only a single instance of a resource is required.
But we will rethink it.

I think one notable difference in your approach compared to a typical approach is that you seem to be describing a number of separate components (or “workloads”, to use your terminology) together in a single Terraform configuration, even though they have an independent maintenance lifecycle. That is, they tend to change separately rather than together.

My usual suggestion would be to make a separate configuration per distinct component and then connect them together as necessary using data sources. In that structure each configuration is relatively small and so the scope of changes is limited and so changes are less risky. In your particular case, it also means that you could run terraform destroy just for one component without affecting any others, as long as you respect the dependencies between your components so you don’t disable something that another component is using.

Since you are also using separate configurations per environment in your case that means a separate configuration per environment, per component.
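
For example, assuming the azurerm provider, a per-environment, per-component configuration for one workload could look up the network objects managed by the separate core configuration using provider data sources; all of the names below are hypothetical placeholders:

# dev/workload1/main.tf (hypothetical)

data "azurerm_virtual_network" "core" {
  # managed by the separate "core" configuration for this environment
  name                = "vnet-core-dev"
  resource_group_name = "rg-core-dev"
}

data "azurerm_subnet" "workload1" {
  name                 = "snet-workload1"
  virtual_network_name = data.azurerm_virtual_network.core.name
  resource_group_name  = "rg-core-dev"
}

module "workload1" {
  source = "../../modules/deploy/workload1"

  subnet_id = data.azurerm_subnet.workload1.id
}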


It is also technically possible to maintain everything together in a single configuration, although with that structure everything must always be managed together and so it would not be appropriate to use terraform destroy.

However, you can write a configuration which allows each component to be activated separately by input variables and then implement your scheduled downtime by automating a change to the input variables followed by terraform apply.

For example:

variable "active_components" {
  type = object({
    workload1 = optional(bool, true)
    workload2 = optional(bool, true)
    workload3 = optional(bool, true)
  })
}

module "workload1" {
  count  = var.active_components.workload1 ? 1 : 0
  source = "..."

  # ...
}

module "workload2" {
  count  = var.active_components.workload2 ? 1 : 0
  source = "..."

  # ...
}

module "workload3" {
  count  = var.active_components.workload3 ? 1 : 0
  source = "..."

  # ...
}

In your scheduled job which disables some components outside office hours, you could generate a components.tfvars.json file like this to disable whichever components need not be running, and pass that as terraform apply -var-file=components.tfvars.json:

{
  "active_components": {
    "workload2": false,
    "workload3": false
  }
}

…and then in the matching scheduled task in the morning you’d make it run terraform apply without this extra file, so that the component attributes will all default to true again and all of the components will be reactivated.

If the risk of constantly updating the entire environment is okay for you then this approach might be a good compromise. You mentioned doing this only for the non-production environments and so it does seem like it might not be a big problem if there were a mistake or malfunction that caused a problematic change to one component while updating another. I would be more nervous about using this approach in production.

Sorry @apparentlymart for my late response and, again, many thanks for your thoughts.

Yeah, I think that describes my thinking more precisely about whether it’s possible/useful to split things into smaller parts.

So, to make the example clearer:

  • core: Network and other core services
  • shared-service: shared resources for any/some workloads
  • shared-service2: shared resources for any/some workloads
  • workload1: 2 VMs, 1 storage account, 1 SQL DB, 1 application gateway
  • workload2: 1 AKS cluster, 3 storage accounts, 5 PostgreSQL DBs

So the idea is that:

  • data sources of core and shared can be used by workload-configurations (roughly as in the sketch below)
  • a workload shouldn’t be used by other workload-configurations, to keep the workloads (or components) as isolated as possible.
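
For example, the way I picture it, a workload configuration would consume the outputs published by the core (or shared-service) configuration through the terraform_remote_state data source; the backend settings and output names here are just placeholders:

# dev/workload1/data.tf (hypothetical)

data "terraform_remote_state" "core" {
  backend = "azurerm"
  config = {
    resource_group_name  = "rg-tfstate"
    storage_account_name = "sttfstate"
    container_name       = "tfstate"
    key                  = "dev/core.tfstate"
  }
}

# ...and then reference data.terraform_remote_state.core.outputs.<name>
# (e.g. an assumed "subnet_id" output) wherever the workload needs it.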

Really nice approach, I will keep that in mind.

We have versioning in place for the code, so I would think that “updating” would just be a “drop/recreate” of the same version, but I understood your point.

Not sure if this would help, but “terraform destroy -target=resource.name” (for example “terraform destroy -target=module.workload2”) can help you remove a resource and all its dependent resources without any change to the code; when you want to restore it, you just run “terraform apply” again…