We are currently evaluating HashiCorp Vault in a Raft HA cluster and ran into a self-made disaster-recovery scenario. Maybe you can help me out with this.
First of all I followed this Setup Tutorial:
After that I tried to replicate this setup on 4 separate VMs: one transit node plus vault_2 through vault_4.
The Raft cluster and auto-unseal generally work, but after I applied OS patches I noticed that the transit node came up uninitialized.
This is because the transit node was (as in the tutorial) using inmem storage, meaning the transit key is gone.
I tried to re-initialize the transit node and generated a new transit key, but when I restarted vault_2 it wouldn’t accept that new transit key.
So currently I have an uninitialized transit node, a vault_2 that won’t accept the new transit key, and vault_3/vault_4 still running in the cluster (probably only until I reboot them and they have to auto-unseal).
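One way to see at a glance which state each node is in is to poll Vault's /v1/sys/health endpoint and interpret the HTTP status code. The following is only a rough sketch: it assumes the endpoint's default status codes, the node addresses are made up, and describe_health and check_node are illustrative names, not part of any Vault client.

```python
import urllib.error
import urllib.request

# Default status codes of GET /v1/sys/health. They can be overridden with
# query parameters, which this sketch does not use.
HEALTH_STATES = {
    200: "initialized, unsealed, active",
    429: "unsealed, standby",
    472: "disaster recovery secondary",
    473: "performance standby",
    501: "not initialized",
    503: "sealed",
}


def describe_health(status_code):
    """Translate a /v1/sys/health status code into a readable state."""
    return HEALTH_STATES.get(status_code, f"unexpected status {status_code}")


def check_node(addr, timeout=5):
    """Query one node; addr is a base URL such as http://host:8200."""
    url = f"{addr}/v1/sys/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return describe_health(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises for non-2xx codes, which is how 429/501/503 arrive.
        return describe_health(err.code)
    except OSError as err:
        # URLError, timeouts and refused connections all end up here.
        return f"unreachable ({err})"


if __name__ == "__main__":
    # Hypothetical addresses; replace with the real VMs.
    nodes = [
        "http://vault1.example.internal:8200",
        "http://vault2.example.internal:8200",
    ]
    for node in nodes:
        print(node, "->", check_node(node))
```

In the situation described above, the transit node would be expected to report "not initialized" and vault_2 "sealed".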
Do you have any tips for me on how to rebuild the Transit Node and Key after the inmem Data is gone?
Thanks in advance!
Edit: How would I even prevent data loss on the transit node? I just tried to start it with “raft” storage instead of “inmem”. Now it needs a configured API address and cluster address, and enabling the transit secrets engine fails with “local node not active but active cluster node not found”.
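For reference, a transit node on integrated (Raft) storage needs a storage path, a node ID, and both addresses in its configuration. The Python sketch below only renders such a config file. The hostname, ports, data directory and node ID are placeholders, render_transit_config is a made-up helper, and TLS is disabled purely to keep the example short.

```python
def render_transit_config(node_id, host, data_dir,
                          api_port=8200, cluster_port=8201):
    """Return a Vault server config (HCL) for a single Raft-backed node.

    All argument values are placeholders chosen by the caller.
    """
    return f'''# Persistent storage instead of "inmem": data survives a reboot.
storage "raft" {{
  path    = "{data_dir}"
  node_id = "{node_id}"
}}

listener "tcp" {{
  address         = "0.0.0.0:{api_port}"
  cluster_address = "0.0.0.0:{cluster_port}"
  tls_disable     = true  # assumption: lab setup only
}}

# Both addresses are required with raft storage.
api_addr     = "http://{host}:{api_port}"
cluster_addr = "http://{host}:{cluster_port}"
'''


if __name__ == "__main__":
    print(render_transit_config(
        node_id="vault_1",
        host="vault1.example.internal",
        data_dir="/opt/vault/data",
    ))
```

A node started with a config like this still has to be initialized and unsealed (vault operator init, vault operator unseal) before any secrets engine can be enabled on it, and its own unseal keys then need to be kept somewhere safe.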
The key is only available during initialization, and when you set up auto-unseal, you rotate that key into the transit engine and store it there. If you lose that key, your Vault cannot be unsealed again. There is no way to recover from this.
Auto-unseal is a good option, but it’s also very dangerous: if you ever lose access to the cloud KMS or the transit key, you’re locked out for good.
A) You can use multi-regional KMS cloud keys so that if you lose access to one region you can still get your key from a different region.
B) If you’re planning on using a transit engine, don’t run it on in-memory storage (-dev mode, HENCE THE NAME); that is for development and not for actual use.
C) You can BYOK, where you provide the key … this is the least secure option, as any number of copies can exist and be shared around, but it’s also the safest option against being locked out.
I’m more or less glad to have run into this issue during evaluation, so no harm done.
Right now I’m thinking of rebuilding the cluster and getting rid of the transit node if it is a single point of failure.
That would leave me with the old traditional Unseal Keys for every Node.
Okay, so just for my understanding (because I thought this is what the Transit Node is for):
Let’s say I get rid of the transit node because I deem it too unstable, as a single point of failure, for use in production.
I rebuild a Raft HA cluster and initialize vault_2 with 5 unseal keys and a threshold of 3. I put those keys somewhere in a fireproof safe.
I then add vault_3/4 to the Cluster without initializing them? (Just like with the Transit Node)
So I would use those unseal_keys from vault_2 whenever one of the three nodes is being restarted?
Because if that’s the case, I don’t see the need for the transit node in our setup.
You’re confusing way too many different topics and issues. Autounseal has nothing to do with nodes or the number of nodes.
Auto-unseal keeps a copy of the encryption key that Vault needs to access its own data. If you don’t have that, then Vault needs someone or something to provide the minimum number of key shards to be able to piece together the key. It’s dangerous to keep the key with the lock, so HashiCorp has given you multiple choices on where to keep the key … another Vault instance running the transit engine, or a cloud provider that has a KMS service.
The number of keys is meaningless; the only value that matters to Vault is the minimum number of key shards required to reassemble the encryption key.
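To make the threshold point concrete: Vault splits its key with Shamir's Secret Sharing, and the toy Python sketch below shows the idea with 5 shares and a threshold of 3. It is an illustration only, not Vault's actual implementation; the prime modulus, the function names split_secret and combine_shares, and the integer secret are all assumptions made for the example.

```python
import secrets

# Field modulus for the toy example (a Mersenne prime). Illustration only.
PRIME = 2**127 - 1


def split_secret(secret, shares, threshold):
    """Split an integer secret into `shares` points; any `threshold` recover it."""
    if not 1 < threshold <= shares:
        raise ValueError("need 1 < threshold <= shares")
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def combine_shares(points):
    """Lagrange-interpolate the polynomial at x = 0 from the given points."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    key = secrets.randbelow(PRIME)
    shards = split_secret(key, shares=5, threshold=3)
    print("any 3 shards recover the key:", combine_shares(shards[1:4]) == key)
    print("2 shards recover the key:   ", combine_shares(shards[:2]) == key)
```

Any three of the five shards reconstruct the same key, while two shards reveal nothing useful. Issuing more shards only changes how many people can hold one; the threshold decides how many must come together.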