Recover vault cluster with raft storage

jr200 · May 1, 2020, 11:38am

I may have misunderstood how recovery works with raft storage.
Here is my setup:

vault version: 1.4.0 (also tried with 1.4.1)
3 node HA vault cluster, all nodes unsealed
storage is raft integrated storage (recently migrated from etcd)
several snapshots have been taken

Issue: if I reboot all nodes simultaneously, I can’t get my cluster back into a working state. The only options available to each node are to create a new raft cluster or to join an existing one.

Initialising a new raft cluster creates new keys, so it seems I can’t use my snapshots. There is also no existing raft cluster to join, since I intentionally took it down.

How does someone recover from this kind of failure?

jr200 · May 5, 2020, 1:39pm

An update: the issue I had wasn’t raft-related, but rather Docker Swarm-related. I had been using local volume mounts in my docker-compose file, which are not persisted when the stack is taken down. My fix was to use a bind mount from the host machine to the container, with the correct uid/gid owner on that folder.
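For anyone hitting the same thing, here is a minimal sketch of preparing the host folder for such a bind mount. The path and the uid/gid values are assumptions, not taken from my setup above; check which user your image actually runs Vault as. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: prepare a host directory to bind-mount into the Vault container.
# HOST_DIR, VAULT_UID and VAULT_GID are hypothetical defaults; adjust them
# to the user your Vault image runs as.
HOST_DIR="${HOST_DIR:-/srv/vault/data}"
VAULT_UID="${VAULT_UID:-100}"
VAULT_GID="${VAULT_GID:-1000}"
DRY_RUN="${DRY_RUN:-1}"   # set DRY_RUN=0 to actually execute (needs root)

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

prepare_bind_mount() {
  run mkdir -p "$HOST_DIR"
  run chown "$VAULT_UID:$VAULT_GID" "$HOST_DIR"
}

prepare_bind_mount
```

The folder is then referenced as a host path in the compose file's volume entry instead of a named local volume.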

After I got this working, I tested a number of DR scenarios. It seems the easiest way to recover from a full-scale cluster failure is to seed a new cluster using the output of a vault operator migrate run.

jr200 · May 8, 2020, 12:20pm

A bit more information, since it took me a while to figure this out from various GitHub issues and documents. (The main GitHub thread is here: https://github.com/hashicorp/vault/issues/5683)

As of Vault 1.4.1, the only way to get a consistent backup of vault (using raft integrated storage) is via the migrate command. My steps are:

1. Take down a vault node in the cluster.
2. Run the vault operator migrate command.
   - note: if using source=raft, destination=s3, the s3 backup is uncompressed.
3. Bring up the vault node and unseal it.

Using s3 has the nice bonus that you can point a new vault server directly to the s3 bucket you used for your backup. You could also migrate from the s3 bucket to another storage destination and seed a new cluster.
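A rough sketch of the migrate step, run on the node that has been taken down. The raft data path, bucket name, region and cluster address are all hypothetical placeholders, and S3 credentials are assumed to come from the environment. The docs list cluster_addr as required when raft is the destination; I include it here only as a precaution. By default the script writes the config file and prints the migrate command rather than running it.

```shell
#!/bin/sh
# Sketch: write a migration config (raft -> s3) and run the migration.
# All values below are hypothetical placeholders.
CONFIG="${CONFIG:-${TMPDIR:-/tmp}/migrate.hcl}"
RAFT_PATH="${RAFT_PATH:-/vault/data}"
S3_BUCKET="${S3_BUCKET:-my-vault-backup}"
S3_REGION="${S3_REGION:-eu-west-1}"
CLUSTER_ADDR="${CLUSTER_ADDR:-https://127.0.0.1:8201}"
DRY_RUN="${DRY_RUN:-1}"   # set DRY_RUN=0 to actually run the migration

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Write the source/destination stanzas to the file given as $1.
write_migrate_config() {
  cat > "$1" <<EOF
storage_source "raft" {
  path = "$RAFT_PATH"
}

storage_destination "s3" {
  bucket = "$S3_BUCKET"
  region = "$S3_REGION"
}

cluster_addr = "$CLUSTER_ADDR"
EOF
}

backup_via_migrate() {
  write_migrate_config "$CONFIG"
  run vault operator migrate -config="$CONFIG"
}

backup_via_migrate
```

To seed a new cluster from the bucket, the same idea applies with the two stanzas swapped (s3 as the source, the new storage as the destination).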

You can’t seed a new cluster using the snapshots from the vault operator raft snapshot save command. However, I think you could use snapshots in combination with migrate, i.e., an occasional migrate (e.g., whenever you rekey your cluster), followed by regular snapshots, which don’t require taking a node down.

(Please correct me if anything I’ve said is wrong or not sensible).

mikegreen · June 23, 2020, 8:00pm

You’ll force the snapshot to restore into the new cluster, then you can use your existing unseal keys. This might help someone who comes across this in the future: Backup - Restore - #2 by mikegreen

jr200 · January 29, 2021, 9:07am

I got round to testing this today (using a snapshot from Vault 1.4.1 and a new cluster using Vault 1.6.1). My steps were:

1. install a fresh copy of vault on a new machine (I used HA raft storage).
2. run vault operator init (with the same number of key-shares/key-threshold as the snapshot).
3. run vault operator unseal, to fully unseal vault using newly generated keys.
4. run vault login <root-token>, using newly generated token.
5. run vault operator raft snapshot restore --force mysnapshot.tar.gz
6. run vault operator unseal, to fully unseal vault using the snapshot’s keys.
7. run vault login, using any previous method that worked on the old vault.
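The steps above can be sketched as a script. The share/threshold values and the snapshot filename are placeholders: set them to match the cluster the snapshot came from. By default it only prints the commands, since unseal and login prompt interactively for keys and tokens.

```shell
#!/bin/sh
# Sketch of the restore sequence on a freshly installed Vault node.
# KEY_SHARES, KEY_THRESHOLD and SNAPSHOT are hypothetical defaults; they
# must match the key structure of the cluster that produced the snapshot.
KEY_SHARES="${KEY_SHARES:-5}"
KEY_THRESHOLD="${KEY_THRESHOLD:-3}"
SNAPSHOT="${SNAPSHOT:-mysnapshot.tar.gz}"
DRY_RUN="${DRY_RUN:-1}"   # set DRY_RUN=0 to actually run the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Call `vault operator unseal` once per key share needed to reach the threshold.
unseal() {
  i=0
  while [ "$i" -lt "$KEY_THRESHOLD" ]; do
    run vault operator unseal
    i=$((i + 1))
  done
}

restore_snapshot() {
  # Initialise with the same key structure as the snapshot's cluster.
  run vault operator init -key-shares="$KEY_SHARES" -key-threshold="$KEY_THRESHOLD"
  unseal            # enter the newly generated keys
  run vault login   # enter the newly generated root token
  # -force skips the check that the snapshot belongs to this cluster
  # (written as --force in the steps above).
  run vault operator raft snapshot restore -force "$SNAPSHOT"
  unseal            # enter the snapshot's (old) unseal keys
  run vault login   # any auth method that worked on the old cluster
}

restore_snapshot
```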

This worked fine. The key for me was to ensure I ran vault operator init with a key structure (key-shares/key-threshold) matching the snapshot’s.
