Vault HA member replacement - DynamoDB storage

Hi there, I’ve inherited a Vault cluster (free/open-source version, not Enterprise) that’s set up as:

  • DynamoDB storage backend
  • HA enabled storage and API
  • Two-node EC2 HA pair behind a load balancer and a target group (the API address is set to the LB’s CNAME)

Our EC2 instances for this are both too large, unencrypted at rest, and running an almost-EOL version of Ubuntu. Ideally, I’d like to replace both nodes with new ones that are a better fit. But I’m unsure about two things and would love some clarity:

  1. We have daily DynamoDB backups and can fire one off manually as needed. In this setup, is that all that needs to be backed up and restored to replace the cluster, provided there’s a Vault process pointing to it? For example, if disaster struck and both nodes were terminated, and I created new nodes with the same vault.hcl configuration file pointing to the same DynamoDB table, would it work (provided I had the original unseal keys)?

  2. This is currently a 2-node HA pair. Could I simply create a new instance with the same Vault version and vault.hcl file, pointing to the current DynamoDB backend, and would it then become a second standby server? Could I then add another and start turning off the original 2 instances, fully replacing them with new ones while keeping the current cluster’s secret data?

I’ve been digging through the docs all morning, but there are so many different configurations for storage and free/Enterprise that I’m not finding this exact scenario. Maybe it’s just that easy and that’s why it’s not glaringly documented…

  1. Yes.

  2. Yes.
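
For point 1, an on-demand backup taken right before touching the nodes could look roughly like the sketch below. The function name and the table name in the usage line are placeholders, not anything from your setup; the only real call is the AWS CLI's `aws dynamodb create-backup`.

```shell
#!/usr/bin/env bash
# Sketch: take an on-demand DynamoDB backup of the Vault storage table.
# backup_vault_table is a hypothetical helper name; pass your own table name.
backup_vault_table() {
  local table="$1"
  local stamp
  # UTC timestamp keeps backup names unique and sortable.
  stamp=$(date -u +%Y%m%dT%H%M%SZ)
  aws dynamodb create-backup \
    --table-name "$table" \
    --backup-name "${table}-${stamp}"
}
```

Usage would be something like `backup_vault_table my-vault-table`, with the table name taken from the storage stanza of your vault.hcl.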

I agree the documentation isn’t great on the detail.

I’m only answering confidently because I’ve learnt the Vault architecture through experience and reading the source code over the last few years.


Note: the vault operator step-down CLI command / sys/step-down API allows you to signal the current active node to give up its active status, allowing another node to take over.

This may be useful for testing failover without needing to restart nodes.
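
As a rough sketch of a failover test, assuming VAULT_ADDR points directly at the active node (not the load balancer) and your token is allowed to call sys/step-down. The helper names and the SETTLE_SECONDS variable are made up for illustration; the Vault commands themselves are `vault status` and `vault operator step-down`.

```shell
#!/usr/bin/env bash
# Print this node's HA role ("active" or "standby") from the `vault status` table.
# Prints nothing if the node is sealed, since no "HA Mode" row is shown then.
ha_mode() {
  vault status | awk '$1 == "HA" && $2 == "Mode" { print $3 }'
}

# Ask the node at VAULT_ADDR to give up active status, wait briefly,
# then report the role it ended up with.
step_down_and_report() {
  vault operator step-down || return 1
  sleep "${SETTLE_SECONDS:-5}"
  echo "HA mode now: $(ha_mode)"
}
```

Running `ha_mode` against each node before and after the step-down shows which one picked up the active role.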


Thanks @maxb - the nodes do handle failover correctly, but good reminder of the command to tell the current active to no longer be active.

So then, in theory, I can just build some new nodes from scratch, keeping the same Vault binary version and the same configuration file. Start Vault and unseal them, and they’ll join the cluster as standby members. Ensure they’re in the target group for the load balancer. Then stop the Vault service on the original standby, leaving just the old active and two new standbys. Step down the current active; a new standby should become active, and I can test. If all is good, stop Vault on the remaining old instance and remove both old instances from the target group. Done.
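
The plan above could be sketched with a few helpers like these. Treat it as an outline under assumptions: TARGET_GROUP_ARN, the instance IDs, and the node addresses are placeholders, and the helper names are invented for illustration. The status codes are the defaults for Vault's sys/health endpoint, and the AWS calls are `aws elbv2 register-targets` / `deregister-targets`.

```shell
#!/usr/bin/env bash
# Translate the HTTP status of /v1/sys/health (default codes) into a node state.
state_from_health_code() {
  case "$1" in
    200) echo "active" ;;
    429) echo "standby" ;;
    501) echo "uninitialized" ;;
    503) echo "sealed" ;;
    *)   echo "unknown" ;;
  esac
}

# Query one node directly by its own address, e.g. node_state https://10.0.0.5:8200
node_state() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' "$1/v1/sys/health")
  state_from_health_code "$code"
}

# Add or remove an EC2 instance from the load balancer's target group.
# TARGET_GROUP_ARN is a placeholder you would export beforehand.
register_node() {
  aws elbv2 register-targets --target-group-arn "$TARGET_GROUP_ARN" --targets "Id=$1"
}
deregister_node() {
  aws elbv2 deregister-targets --target-group-arn "$TARGET_GROUP_ARN" --targets "Id=$1"
}

# Order of operations (run by hand, checking node_state between steps):
#   1. On each new node: start Vault, run `vault operator unseal` until unsealed;
#      node_state should print "standby".
#   2. register_node <new-instance-id> for each new node.
#   3. Stop Vault on the old standby, then deregister_node <old-standby-id>.
#   4. Run `vault operator step-down` against the old active;
#      one of the new nodes should now report "active".
#   5. Stop Vault on the old active, then deregister_node <old-active-id>.
```

Checking each node's own address rather than the load balancer matters here, since the LB only tells you that some node answered, not which one is active.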

Appreciate the confirmation!