Hi everyone, I have a small dilemma about implementing the auto-unseal mechanism with the Transit secrets engine.
I am doing a POC in our organization with 2 Vault clusters (each cluster runs on 4 RHEL VM nodes), both configured with the Transit secrets engine so that they auto-unseal each other.
All good, but we have maintenance on weekends, and servers can be rebooted after patching.
So we can potentially end up in a scenario where all the servers hosting the 2 Vault clusters reboot almost simultaneously, which leaves Vault sealed in both clusters with no means to unseal either of them. From what I understood from the documentation, the Recovery Keys won’t be enough to unseal one of the Vault clusters, or even 1 node from a cluster? So how do we solve this kind of scenario?
I know there are other ways to auto-unseal, like AWS KMS, but our organization restricts us from using it, so that’s not an option for us.
Any ideas? Sorry if this is a dumb question, I am just learning Vault and I can’t seem to find out what the best practice is.
Thanks!
Hi @michael4,
Not a dumb question at all. Given the constraint you mentioned of not being able to use a KMS (AWS, Azure, GCP), I would consider one of a couple of options. The first is an “ops” cluster with Shamir unseal, so the clusters do not rely on each other to unseal.
In the POC you explained, I could certainly see a scenario where the clusters are lost for some reason: failure of the underlying hardware/CSP, patching, a general OS crash, etc. With an ops cluster you would still need a well-defined procedure for patching to ensure availability, but so long as you have good processes in place to recover that ops cluster, you’ll be able to bring it back up/restore it to service the other clusters with auto-unseal.
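To make the difference between the two layouts concrete, here is a small toy model in Python. It is only a sketch of the dependency reasoning, not anything Vault ships: the function `recoverable`, its parameters, and the cluster names `A`, `B` and `ops` are made up for illustration. It assumes every cluster reboots at once and that a cluster can only auto-unseal once the cluster providing its Transit key is already unsealed.

```python
def recoverable(unseal_deps, manual):
    """Return the set of clusters that can be unsealed after a full reboot.

    unseal_deps: cluster name -> name of the cluster whose Transit engine unseals it
    manual:      clusters that use Shamir key shares and can be unsealed by operators
    """
    up = set(manual)  # operators can always unseal these by hand
    changed = True
    while changed:
        changed = False
        for cluster, provider in unseal_deps.items():
            # A Transit-sealed cluster comes up only if its provider is already up.
            if cluster not in up and provider in up:
                up.add(cluster)
                changed = True
    return up


# Two clusters that unseal each other: nothing can start the chain.
mutual = recoverable({"A": "B", "B": "A"}, manual=set())

# Both clusters depend on a Shamir-sealed ops cluster: operators unseal it first.
with_ops = recoverable({"A": "ops", "B": "ops"}, manual={"ops"})

print(sorted(mutual))    # []
print(sorted(with_ops))  # ['A', 'B', 'ops']
```

In the mutual layout the result is empty because each cluster waits on the other, which is exactly the weekend-patching deadlock described above. With the ops cluster there is one manual step that breaks the cycle.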
If the ops cluster is not an option, I think you would be better served using Shamir for unsealing instead of auto-unseal, or using HCP Vault so that burden is removed.
If you can move your org to consider a KMS/HSM, you can set up multiple auto-unseal options with Vault Enterprise:
Thanks Jonathan,
So I was thinking of the following setup:
1 Vault cluster with 5 nodes, using Transit auto-unseal.
1 dedicated Vault node that will be initialized separately and won’t contain any secrets from the cluster; it will be used only to auto-unseal the cluster. The single node itself will use the Shamir unseal type, so after patching, a reboot, etc. of the cluster nodes, the single node can be manually unsealed and the cluster will auto-unseal once the single node is available.
I’m not sure, perhaps this is not the best practice, but given the constraints on using AWS KMS it seems to be the only reasonable solution.
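To show how I picture the recovery order after a reboot, here is a rough Python sketch. It is untested against a real cluster, and the function names and the node names are my own placeholders. The only Vault-specific parts are the `/v1/sys/seal-status` HTTP endpoint and the `sealed`, `t` and `progress` fields in its JSON response.

```python
import json
import urllib.request


def fetch_seal_status(addr, timeout=5.0):
    """GET /v1/sys/seal-status from one Vault node and return the parsed JSON."""
    with urllib.request.urlopen(f"{addr}/v1/sys/seal-status", timeout=timeout) as resp:
        return json.load(resp)


def recovery_step(transit, cluster):
    """Say what to look at next after a reboot.

    transit: seal-status dict of the dedicated transit node, or None if unreachable
    cluster: node name -> seal-status dict (or None if that node is unreachable)
    """
    if transit is None:
        return "transit node unreachable: start it before anything else"
    if transit.get("sealed", True):
        # Shamir-sealed node: operators must provide key shares by hand.
        done = transit.get("progress", 0)
        needed = transit.get("t", "?")
        return f"transit node sealed: enter Shamir key shares ({done}/{needed} provided)"
    # Transit node is up; list cluster nodes that have not unsealed yet.
    pending = sorted(
        name for name, status in cluster.items()
        if status is None or status.get("sealed", True)
    )
    if pending:
        return "transit node unsealed; cluster nodes not yet unsealed: " + ", ".join(pending)
    return "all nodes unsealed"
```

The idea would be to call `fetch_seal_status` for the transit node and for each of the 5 cluster nodes (passing `None` for any node that does not answer) and feed the results to `recovery_step`, so the on-call person always sees that the single node has to be unsealed first.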
Seems so. I would just make sure that the transit node/cluster (what I was referring to as the ops cluster) is 100% restorable. Really test your backup and restore procedures, as that node would be a huge single point of failure. That risk would have me bordering on not using auto-unseal at all, which presents an entirely different set of operational challenges.