Upgrading AWS RDS instance size, should I be using Terraform?

I need to move my RDS instance to a larger instance size.

My Terraform setup currently deletes all state and runs terraform plan and apply from scratch each and every time (that’s how our deploys work currently).

I want to upgrade the instance size, any tips? Should I just use the AWS console for the upgrade and then update my Terraform configuration to match?

I’m guessing I should use Terraform. Any tips on making sure my instance doesn’t get deleted, so that I don’t lose my database?

(I will be testing this in a test environment first, of course.)

Hi @ooh,

If you aren’t retaining state between runs then unfortunately I think you are already using Terraform “wrong” in some sense of the term. That’s not to say you can’t do it if it’s working for you, but it’s not how Terraform is typically used, so best practices you learn from other people probably won’t apply to your unusual workflow.

If you aren’t retaining state between runs then you can only use Terraform for its “create” actions, because all other possible actions require Terraform to have a prior state to compare the configuration to.

If you are able to switch to a model where you preserve state between runs then the key part of your question is how to avoid applying a plan that would destroy the database. There are several techniques you can use for this, which I’ll list in decreasing order of preference, explaining the characteristics of each so you can understand why I ordered them this way:

  1. Run Terraform using IAM credentials that don’t have access to destroy the database. I think that means at least removing access to call rds:DeleteDBInstance, but there are probably some other destructive actions other than totally destroying the database that you’d want to block too.

    This is the best option because it means that the only way to accidentally break the database would be to intentionally run Terraform with elevated credentials. The restriction is enforced in the remote API rather than in your Terraform orchestration, so this is the strongest assurance possible.

  2. Run Terraform in an automation environment and generate a saved plan using terraform plan -out=tfplan before applying it using terraform apply tfplan. By separating these two steps you can insert arbitrary logic of your own in between, such as using terraform show -json tfplan to obtain the JSON description of the plan and then running a custom script of your own to enforce whatever rules you want.

    In your case you could, for example, enforce a rule that it’s forbidden for any resource of type aws_db_instance to have the "delete" action. If you find such an action then you’d abort the operation and block anyone from applying the plan.

    This is the second-best answer because it still gives fine-grained control over what is and isn’t allowed, so you can customize the details to suit your workflow. However, it does mean that you’ll need to run Terraform in an enforced automation environment, because otherwise someone could just forget to run the policy check and apply a harmful plan anyway. (I would typically recommend that anyone using Terraform “in production” should be doing it in automation, but I’m not sure what stage you are at yet and whether this would be prohibitive for you.)

  3. The final and least preferred option is to use the prevent_destroy argument inside the resource "aws_db_instance" block’s lifecycle block. This option has a poorly chosen name and should really be called something like “prevent replace”, because what it actually does is cause Terraform to fail planning with an error if the required change is to replace any instance of the resource in question.

    This option is handy in that it doesn’t require anything other than Terraform configuration, but that’s also its downfall: it’s easy to accidentally remove the setting and end up applying a harmful change anyway, and it’s also very coarse and doesn’t let you implement any detailed rules about what sorts of changes might be allowed or disallowed.
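
To make option 2 a little more concrete, here is a minimal sketch of the kind of check script I have in mind. It assumes the plan JSON has a top-level resource_changes list whose entries carry address, type and change.actions; the function names, the PROTECTED_TYPES set and the check_plan.py file name used below are just placeholders for illustration, not anything Terraform itself provides.

```python
import json
import sys

# Resource types that must never be destroyed or replaced.
# Assumption for this sketch: only RDS instances are protected.
PROTECTED_TYPES = {"aws_db_instance"}


def find_forbidden_changes(plan, protected_types=PROTECTED_TYPES):
    """Return addresses of protected resources whose planned actions include "delete".

    A replacement is reported as a delete paired with a create, so looking
    for "delete" catches both plain destroys and replacements.
    """
    forbidden = []
    for resource_change in plan.get("resource_changes", []):
        actions = resource_change.get("change", {}).get("actions", [])
        if resource_change.get("type") in protected_types and "delete" in actions:
            forbidden.append(resource_change["address"])
    return forbidden


def main():
    """Read plan JSON from stdin; return 1 if any protected resource would be deleted."""
    plan = json.load(sys.stdin)
    forbidden = find_forbidden_changes(plan)
    for address in forbidden:
        print(f"refusing to apply: {address} would be destroyed", file=sys.stderr)
    return 1 if forbidden else 0
```

If you saved this as check_plan.py and ended the file with sys.exit(main()), the automation would run terraform plan -out=tfplan, then terraform show -json tfplan | python check_plan.py, and only go on to terraform apply tfplan if the script exits with status 0. I’d expect a change that the provider can make in place to show up as an "update" action and pass this check, but please confirm that against a real plan in your test environment.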

It seems that even if I do #1, which prevents destructive actions for specific resources, I would still need to do #3 to actually update the RDS instance type, correct?

Both #1 and #2 are there to prevent accidents, but #3 is the way to actually upgrade the RDS instance type… is this correct?