Best practices for Terraform staging testing

I am looking for some advice on Terraform testing before running terraform apply in our prod environment. What we have now are code-review and terraform plan checks, but these are sometimes not enough, as terraform apply can still fail in prod for many reasons (permissions, cross-account config, etc.).

Is there any good practice for testing the code in a staging environment? Given that our networking infra is huge and complex, duplicating the entire prod environment to staging just for testing purposes is probably not an option for us yet.

Hi @xike41,

This is a broad question so it’s hard to give specific advice without being able to see all of the details of what you have in your production environment.

However, the high-level proposal I’d make here is to decompose your infrastructure into smaller parts, using separate Terraform modules with well-defined concerns and interfaces between them. If you do that, then in principle you can apply an individual module alone in order to test it, possibly including some temporary minimal foundational infrastructure for it to use as dependencies.
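As a rough sketch of what such an interface might look like (the variable and output names here are illustrative, not taken from any particular module), each module would declare its dependencies as input variables and expose its results as outputs, so that a test configuration or another module can stand in as the caller:

```hcl
# variables.tf: the module's inputs, supplied either by real
# infrastructure or by a minimal test configuration.
variable "vpc_id" {
  type        = string
  description = "ID of the VPC to deploy into"
}

variable "subnet_ids" {
  type        = list(string)
  description = "Subnets to spread the cluster instances across"
}

# outputs.tf: the module's results, consumed by downstream modules.
# (This assumes the module defines a load balancer named "cluster";
# the resource name is hypothetical.)
output "cluster_address" {
  description = "Address other modules can use to reach the cluster"
  value       = aws_lb.cluster.dns_name
}
```

With a well-defined boundary like this, the module doesn't care whether its VPC and subnets come from your real network infrastructure or from a few throwaway test resources.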

A pattern I like to follow when building shared Terraform modules is to create a subdirectory called test under the main module directory, and place there a Terraform configuration that declares a minimal set of the objects the module needs and then calls the module with those dependencies.

If we use the “consul_cluster” module from the second example on the Module Composition docs page as an example, I might put this test directory at modules/aws-consul-cluster/test and put a file test.tf in there with content like this:

provider "aws" {
  region = "us-west-1"
}

resource "aws_vpc" "test" {
  cidr_block = "10.1.0.0/16"
}

resource "aws_subnet" "test" {
  vpc_id            = aws_vpc.test.id
  availability_zone = "us-west-1a"
  cidr_block        = "10.1.1.0/24"
}

module "consul_cluster" {
  source = "../" # test the module in the parent directory

  vpc_id     = aws_vpc.test.id
  subnet_ids = [aws_subnet.test.id]
}

Now when we want to work on the implementation of the consul_cluster module, we can follow these steps:

  • cd modules/aws-consul-cluster/test
  • terraform init
  • terraform apply before making any changes, to give you a baseline copy of what the current version of the module creates.
  • now make the changes you need to make to the code in the modules/aws-consul-cluster directory.
  • terraform apply to see what effect those changes have on the existing temporary infrastructure.
  • Keep iterating on the previous two steps until you get the result you want.
  • Before you submit your changes for review, run terraform destroy and terraform apply to make sure that the updated configuration can still create a new copy of the infrastructure from nothing.
  • Run terraform destroy to destroy the temporary test infrastructure.
  • Submit your changes for code review.

If you’ve decomposed your infrastructure into small enough pieces then it should hopefully be possible to write a small test configuration like this for each module so that you can test that module alone (using a minimal set of dependencies) rather than testing everything together. Of course, the details of how exactly to do this and how much supporting infrastructure you’d need for a realistic enough test will depend on the details of the module in question.

In teams I’ve seen using a practice like the above, they tended to do it in addition to having a staging replica of the production environment to rehearse changes in, so that the small test configurations can be used as a sort of “unit testing” while the staging environment is for “end-to-end” testing. However, if the staging environment is not practical for you then hopefully you can still use the small test configurations to give you some increased assurance that your individual modules are correct before attempting to apply them in production.