Terraform Module and state - general question

Hello,

Need some help or guidance

Sequence of events:

Step 1)

Provisioned an Aurora RDS cluster with the TF automation below, using a single workspace.
The first module creates an Aurora PostgreSQL instance.
The second module creates database accounts, taking the RDS endpoint from the first module as input.

main.tf

module "aurora-postgresql" {
  source  = "aurora-postgresql/aws"
  version = "6.0.3"

  /* Some input parameters */
}

module "postgre-setup-db-accts" {
  source  = "postgresql-setup-db-accts/aws"
  version = "1.1.5"

  cluster_endpoint = module.aurora-postgresql.rg_write_cluster_endpoint_alias
  /* Few other input parameters */
}
/* using the cyrilgdn/postgresql provider */

In the end I will have a PostgreSQL cluster with the required DB accounts for application deployment.

Step 2)

Due to an issue, I must rebuild the RDS cluster, either from a backup snapshot or via point-in-time recovery (PITR).

Step 3)

When I build a new cluster from the snapshot or PITR (any build from a snapshot/PITR creates a new cluster), the database accounts already exist in the restored DB, so my build is going to fail: the second module (postgre-setup-db-accts) tries to create the accounts again as part of the deployment, based on the current TF state file.

I tried a refresh by running the command below, assuming the state file would get updated to match the current state of the infrastructure, but that is not working either.

terraform apply -refresh-only -var-file=input_vars.tfvars

How do we address these types of situations?

Regards
rx

Generally a situation like this needs to be addressed in one of two ways:

1) Allow for a complete opt-out of creating those resources

For example, you can use a Terraform input variable in a count expression on the module, setting it to 0 or 1 depending on whether the resources are needed.

This is easy to set up - but means you need to manage any further changes to the accounts outside of Terraform.
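
A minimal sketch of that pattern, assuming a hypothetical create_db_accounts variable (count on a module block requires Terraform 0.13 or later):

variable "create_db_accounts" {
  type        = bool
  default     = true
  description = "Set to false when restoring from a snapshot/PITR, where the accounts already exist."
}

module "postgre-setup-db-accts" {
  source  = "postgresql-setup-db-accts/aws"
  version = "1.1.5"
  count   = var.create_db_accounts ? 1 : 0

  cluster_endpoint = module.aurora-postgresql.rg_write_cluster_endpoint_alias
  /* Few other input parameters */
}

Note that once count is set, any references to the module's outputs elsewhere in the configuration need an index, e.g. module.postgre-setup-db-accts[0].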

2) Import the resources

Importing (terraform import) is the process of telling Terraform that a particular resource in the configuration corresponds to something that already exists.

It’s useful… but kind of cumbersome.

First, it’s up to the provider implementation to decide whether to support importing at all. Many do though.

Then, the format of the string you need to use to identify the existing resource is again entirely up to the provider. It looks like the postgresql_role resource just uses the name of the role though, so that’s simple enough (see its documentation on the Terraform Registry).

But now, you (or code you write) have to gather up all the pairs of resource addresses and (in this case) role names, and run an individual execution of terraform import for each one. Which is slow, if you have many.
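
For instance, something along these lines - where the role names and the resource addresses inside the module are purely hypothetical, since they depend on how the module declares its postgresql_role resources:

# One terraform import per role; addresses and role names are placeholders
for role in app_owner app_reader app_writer; do
  terraform import \
    "module.postgre-setup-db-accts.postgresql_role.this[\"$role\"]" \
    "$role"
done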

And if you’re using remote state, the entire state is downloaded and uploaded for each one.

And if you’re running Terraform from some kind of CI, or Terraform Cloud/Enterprise, you need to locally replicate all the variables and credentials to run your imports.

And, whilst I’ve never worked with cyrilgdn/postgresql, it’s usual for the provider to want to contact the remote infrastructure (in this case, database) at import time, to read the details of what exists - meaning you’d have to get your restored database online, have the initial account provisioning fail, and then sort out the imports.

And, once you’ve imported resources, do make sure to read the next Terraform plan carefully: if there is a mismatch between the defined-in-config state of the imported resources and their actual state, Terraform will plan to “fix” them. Sometimes this is what you want - sometimes it really isn’t.

In summary

Terraform shines when it is being asked to stand up complex API-driven infrastructure from scratch… but struggles when asked to manage more complex operational tasks that step away from a CRUD lifecycle.

I’d probably be looking to migrate my database account management outside of Terraform, given this set of requirements.

Hi @rcg,

I haven’t been in exactly the situation you described, but I have faced a similar problem with Amazon RDS database snapshots. With those, restoring a snapshot really means creating a new instance using the snapshot, which seems similar to what you are trying to do by creating a new database from a backup.

When I did that I relied on the fact that, because my database was restored from backup, it was configured and populated similarly enough to the original that Terraform could consider it to be the same object, without needing to recreate it or anything inside it. Given that, I followed these steps:

  • Start a new instance from the snapshot manually using either the AWS CLI or web console.
  • Tell Terraform to forget about the old instance using terraform state rm. This removes the binding between the resource instance in Terraform and the real object in the remote API, so Terraform no longer knows the old instance exists, but it hasn’t been destroyed yet.
  • Use terraform import to bind the new RDS instance to the same resource instance address the old one had, so now Terraform will think it created the new instance and will manage it instead of the old one moving forward.
  • Run terraform apply to propagate forward any minor differences between the old and new instances, such as the hostname of the database instance. If you are exporting that information through an output value, for example, then this will update the output value to match the new instance.
  • Manually destroy the old instance that is no longer in use.
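
In commands, the middle steps look roughly like this - the resource address and instance identifier are placeholders for whatever your configuration actually uses:

# Drop the old instance from state without destroying it
terraform state rm 'aws_db_instance.example'

# Bind the restored instance to the same resource address
terraform import 'aws_db_instance.example' restored-instance-identifier

# Reconcile anything downstream of the instance (outputs, hostnames, etc.)
terraform apply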

As long as the snapshot really does contain all of the objects previously created, Terraform’s “refresh” step should find them during terraform apply and conclude that they don’t need to be recreated.

Of course this has some manual steps and so isn’t really suitable for routine workflow, but for me this was an exceptional disaster recovery situation and so it was acceptable to just describe it in a runbook rather than automating it fully. If you will be frequently restoring this database from backup then Terraform might not be the best tool to manage it, because Terraform lacks the lifecycle concepts needed to describe this sort of change in its “desired state” model.

Thank you @maxb @apparentlymart for the great inputs. I am probably going to end up taking our account management outside of TF.

However, I tried the approach below and it seems to be working for my use case above. But I’m just wondering if that’s right.

I place my initial Day 1 build (DB and accounts) in 2 TFE workspaces:

  1. Workspace 1 – hosts the state file with the Aurora PostgreSQL module resources to build the DB.
  2. Workspace 2 – hosts the state file with the DB accounts module resources.

As part of the initial setup, I will run both modules to get my database and accounts together.

When I need to restore the database from a snapshot/PITR, I will rebuild the instance as part of workspace 1. Once the restore from the snapshot completes, I will have a database with the required application DB accounts.

When I then run my accounts module as part of workspace 2, I was able to avoid the duplicate accounts error, as the accounts module detected that the accounts already exist in the DB and the TF state reflected the same.

So far, with the split of workspaces, everything seems to be working as expected.
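
As a sketch, the split looks something like this (the output and variable names are just placeholders for however the endpoint gets handed over):

Workspace 1:

module "aurora-postgresql" {
  source  = "aurora-postgresql/aws"
  version = "6.0.3"
  /* Some input parameters */
}

output "cluster_endpoint" {
  value = module.aurora-postgresql.rg_write_cluster_endpoint_alias
}

Workspace 2:

variable "cluster_endpoint" {
  type = string /* needs to come from workspace 1 somehow */
}

module "postgre-setup-db-accts" {
  source  = "postgresql-setup-db-accts/aws"
  version = "1.1.5"

  cluster_endpoint = var.cluster_endpoint
  /* Few other input parameters */
}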

However, I am not sure how to pass values between the workspaces - meaning, how do I get outputs from module 1 in workspace 1 into the accounts module in workspace 2?

Do we have any best practices/guidance on when to separate modules into workspaces?

Regards
Rx