A previous post of mine led me to ask this question. We have a fairly “old” Vault cluster that used to use Consul as its storage backend but now uses integrated storage. The vault.db file currently weighs in at a tasty 50 GB, which unfortunately can’t be compacted much further (I blame AppRoles and edge devices). Some recent issues mean we need to move the nodes from cloud instances to bare metal (add bare metal nodes to the cluster, then decommission the cloud nodes).
So. Bare metal is delivered; Vault is installed, configured, and pointed at the cluster leader with `vault operator raft join`. So far, so good. Unseal keys are provided, and the node now shows up in the peer list.
That was over 14 hours ago, and it still has not caught up: it’s consistently about 100,000 entries behind on the last index (when checked with `vault operator raft autopilot state`). It’s downloading from the leader in chunks of about 30k index entries, but at our current rate of usage (we’re approaching peak time mid-week) it’s unlikely to catch up. I still have 2 more nodes to add…
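For anyone wanting to compare numbers, this is roughly how I’m judging “not catching up”. It’s only a sketch: the index values are copied by hand from the autopilot state output, and the function names `index_lag` and `lag_trend` are made up for illustration.

```shell
#!/bin/sh
# Sketch only: index values are read manually from the output of
# `vault operator raft autopilot state`; function names are hypothetical.

# Number of entries the follower is behind the leader's last index.
index_lag() {
    leader_index=$1
    follower_index=$2
    echo $(( leader_index - follower_index ))
}

# Compare two lag samples taken some time apart.
lag_trend() {
    earlier_lag=$1
    later_lag=$2
    if [ "$later_lag" -lt "$earlier_lag" ]; then
        echo "catching up"
    else
        echo "not catching up"
    fi
}
```

With made-up indexes, `index_lag 5100000 5000000` gives a lag of 100000, and `lag_trend 100000 100000` reports "not catching up", which is what I keep seeing.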
The question is: if I take a snapshot and then restore it on the node in question (or on a new blank node right after joining and unsealing), would that speed things up a little? I find the snapshot documentation a little vague, in the sense that it doesn’t explain whether a restore applies to the entire cluster or only to the node you restore to.
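To be concrete, this is roughly the sequence I have in mind. It’s a sketch, not something I’ve run: `DRY_RUN`, `run`, and the two wrapper function names are my own illustration, and whether the restore step touches only the target node or the whole cluster is exactly the part I’m unsure about. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only. DRY_RUN, run, take_snapshot and restore_snapshot are
# hypothetical helpers; only the vault subcommands themselves are real.

# Print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Save a snapshot of the raft data to a local file.
take_snapshot() {
    run vault operator raft snapshot save "$1"
}

# Restore a snapshot file (unclear to me: node-local or cluster-wide?).
restore_snapshot() {
    run vault operator raft snapshot restore "$1"
}
```

I’d call `take_snapshot /tmp/vault.snap` with `VAULT_ADDR` pointing at the existing cluster, then `restore_snapshot /tmp/vault.snap` with `VAULT_ADDR` pointing at the new node (the file path is just an example).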
Second question: if snapshots won’t do, what would happen if I copied the vault.db file from the leader? I assume (maybe wrongly) that since the unseal keys are the same, it would “just work”. Of course, I know it probably won’t, but I figure it can’t hurt to ask.
On that note, it would be incredibly nice if we could set some options to tune the speed at which a new node downloads the raft db from the leader. At the moment I frankly don’t care if it affects performance; I want that new node up in a hurry. If this were a proper recovery from a downed node, I’d be up that fabled creek with half a paddle because it’s… taking too long.