We are using the community edition of HashiCorp Vault and currently have a high availability cluster of three Vault servers with integrated storage. According to the official documentation, the best practice for upgrading the servers is to first upgrade the standby nodes, have an upgraded standby node take over as active, and finally upgrade the previously active node.
Question: Is it possible to perform the upgrades without needing to unseal each Vault node again? We need to assemble staff with their key shares in order to proceed with the unsealing, which is inconvenient considering that new releases come out roughly every 10-15 days.
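For readers following along, the upgrade order described above (standbys first, active node last) can be sketched in a few lines of Python. This is only an illustration: the `Node` class, the node names and the `upgrade_order` helper are hypothetical and not part of Vault or its tooling.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str        # hypothetical node name
    is_active: bool  # True for the node that is currently active


def upgrade_order(nodes):
    """Return node names with all standbys first and the active node last."""
    standbys = [n.name for n in nodes if not n.is_active]
    active = [n.name for n in nodes if n.is_active]
    if len(active) != 1:
        raise ValueError("expected exactly one active node")
    return standbys + active


cluster = [
    Node("vault-1", True),
    Node("vault-2", False),
    Node("vault-3", False),
]
print(upgrade_order(cluster))  # ['vault-2', 'vault-3', 'vault-1']
```

Each node in that list still has to be restarted on the new binary, which is where the unseal question comes in.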
There are many exceptions. If you’re in Kubernetes or a dynamic-IP environment, then no. If you’re running individual instances or static-IP VMs, then I have done it, but it doesn’t seem to be the norm. Also, you shouldn’t do this if you’re jumping major versions (e.g. 1.7 → 1.11). So in general, yes, it can be done.
An option which is “NOT best-practice” is to gather the keys during work hours, do your work, then rotate your keys and send out new ones, so you don’t need to keep the key holders on hand while you’re upgrading.
A couple of suggestions/best practices:
Triple check and triple back up your data before upgrades – I usually force a normal backup, then back up to the local host, and then do another one and move it to my machine.
Cloud based auto-unseal is a good option for most people. It’s also practically free since both GCP and AWS provide a free tier that this would fall into. All other operations still require
For a raft cluster, you may want to increase your node count to 5 rather than 3. With only three nodes, if one node fails, there is a small chance that your cluster could panic and seal itself.
Unless you’re doing development against Vault itself, you probably don’t want to be on the bleeding edge of releases. Most production instances either follow an n-1 policy (1.10) or wait 90 days between release upgrades.
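On the node-count suggestion above, the usual Raft majority arithmetic can be written out as a small Python sketch. The function names are made up for illustration; the only assumption is the standard rule that a Raft cluster needs a strict majority of voters to stay available.

```python
def quorum(voters):
    """Smallest number of voters that forms a strict majority."""
    return voters // 2 + 1


def failure_tolerance(voters):
    """How many voters can be lost while a majority is still reachable."""
    return voters - quorum(voters)


for n in (3, 5):
    print(f"{n} nodes: quorum={quorum(n)}, tolerates {failure_tolerance(n)} failure(s)")
# 3 nodes: quorum=2, tolerates 1 failure(s)
# 5 nodes: quorum=3, tolerates 2 failure(s)
```

So during a rolling upgrade of three nodes, the node being restarted already uses up the single tolerated failure, whereas five nodes leave room for one more.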
We are running the Vault instances on three separate VPSs with static IP addresses. Would you be able to provide guidance or a link to documentation on how you were able to perform minor upgrades (i.e. 1.10.1 to 1.10.2) without needing to restart and unseal the node again manually?
A follow-up on this was posted in another forum thread, as our question is not completely related to the original post.
No, it’s not possible to upgrade without the need to unseal.
(I’m saying this to you, rather than directly responding to @aram, because he has blocked me on these forums, seemingly because I’ve corrected things he’s told people too often - and here I am needing to do it again…)
I’m sceptical about this claim too - I’d ask for clarification but I can’t because he’s blocked me:
Anyway, back to the topic of unsealing…
Upgrading requires restarting Vault, running a new version of the program code.
The keys are only held in memory, and so are inevitably lost when you restart the process.
Vault doesn’t provide a way to pass the master keys from one node to another within a cluster during a rolling restart, as some cryptosystems do.
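You can see this for yourself after a restart by asking the node for its seal status. Below is a rough Python sketch, assuming the unauthenticated `/v1/sys/seal-status` endpoint and its `sealed`, `t` (threshold) and `progress` fields; the helper names and the sample response are mine, purely for illustration.

```python
import json
import urllib.request


def fetch_seal_status(addr):
    """GET the seal status from a Vault node, e.g. addr='https://vault-1:8200'."""
    with urllib.request.urlopen(f"{addr}/v1/sys/seal-status") as resp:
        return json.load(resp)


def shares_still_needed(status):
    """Number of key shares that must still be submitted before the node unseals."""
    if not status["sealed"]:
        return 0
    return status["t"] - status["progress"]


# Illustrative response from a freshly restarted node (not real output):
sample = {"sealed": True, "t": 3, "n": 5, "progress": 0}
print(shares_still_needed(sample))  # 3
```

A freshly restarted node reports `sealed: true` with `progress` back at zero, because nothing from the previous process survives.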
So, what can you do?
Well, as @aram did point out, auto-unseal is a thing. That’s when your Vault nodes are configured to send an encrypted stored key, over the network, to some remote key management service to have it decrypted, instead of requiring user input.
The remote key management service can either be a cloud provider’s key management system - in which case, your Vault needs credentials to authenticate to your cloud provider’s API - or another instance of Vault - in which case your Vault needs credentials to authenticate to the other Vault (bit of a chicken-and-egg situation here).
There are two big big big caveats you need to bear in mind with auto-unseal:
If your auto-unseal source - be it a key in a cloud provider or your own second Vault - becomes lost, your Vault data is GONE. No recovery. Not even with recovery key shards, which are dangerously misnamed.
If your Vault is set up to auto-unseal, you’ve sacrificed a layer of protection against rogue sysadmins. A rogue sysadmin can now potentially substitute a modified Vault binary that gives them access they should not have, and restart Vault to use it. So, you need extra controls around admin activity on your Vault nodes as an additional countermeasure.
An alternative to auto-unseal is to build your own (very secure, only ever administered under dual control) system which can be invoked to send unseal keys to the Vault API when an operator requests it (subject to transactional MFA and after-the-fact approval of audit logs). Clearly this is no small amount of work.
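To give an idea of the "send unseal keys to the Vault API" part only, here is a minimal Python sketch. It assumes the `/v1/sys/unseal` endpoint, which takes one key share per request as `{"key": "..."}` and returns the updated seal status. The function names, the address and the `send` parameter are hypothetical, and everything that actually makes such a system safe (secure storage of the shares, dual control, MFA, audit) is deliberately left out.

```python
import json
import urllib.request


def build_unseal_request(addr, key_share):
    """Build (but do not send) a request submitting one key share to a node."""
    body = json.dumps({"key": key_share}).encode("utf-8")
    return urllib.request.Request(
        f"{addr}/v1/sys/unseal",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def submit_shares(addr, key_shares, send=urllib.request.urlopen):
    """Submit shares one at a time; return True once the node reports unsealed.

    `send` is injectable so the logic can be exercised without a real Vault.
    """
    for share in key_shares:
        with send(build_unseal_request(addr, share)) as resp:
            status = json.load(resp)
        if not status["sealed"]:
            return True
    return False
```

In a real deployment the shares would never sit in a plain list like this; the sketch only shows the shape of the API interaction.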
Essentially, everything depends on just how secure you need your Vault to be, and just how much you’re willing to trust your senior sysadmins - whether it’s worth it, in your particular scenario, to put in all the work to ensure one person acting alone never has the access to compromise the system - or not.