Concerning Inconsistency In Raft and Vault Storage Among All Nodes

Mr-Howard-Roark · November 23, 2020, 11:55pm

In production I run a 5-node Vault cluster in Kubernetes using Vault’s Raft integrated storage. Recently I ran into leader fluctuation, and out of nowhere I noticed that one of the 5 nodes has hardly any data in its raft.db and vault.db. I have been trying to understand the implications of that.

In a test cluster, I have recreated this behavior: the original leader has more data in its vault.db and raft.db than the followers. What’s confusing is that when I step that leader down and a follower with less data becomes leader, I still see all of the data that the original leader had in the Vault UI, even though the files on that node are clearly smaller.

Questions:

1. How can followers have less data than the leader in vault.db yet still serve fully accurate data to the Web UI when they become leader? It seems like they would only have a fraction of the data, given that their vault.db is a fraction of the size of the original leader’s.
2. Is the expectation that the leader and all followers have the same amount of data in both vault.db and raft.db?

Thanks for any help you can provide. This is truly perplexing, and I can’t tell whether it’s a critical production issue.

jlj7 · November 24, 2020, 11:27am

Hi! I saw you posted this on Gitter too. Interesting question. Hashi support can probably provide a more thorough answer, but I wonder whether what you’re seeing is just Raft’s efforts to efficiently use storage space:

Obviously, it would be undesirable to allow a replicated log to grow in an unbounded fashion. Raft provides a mechanism by which the current state is snapshotted and the log is compacted.

From: Integrated Storage | Vault | HashiCorp Developer
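
To make the quoted idea concrete, here is a toy sketch of log compaction. It is not Vault's or Raft's actual implementation; the `Node` class, its methods, and the `secret/...` keys are hypothetical names made up for illustration. The point is only that a node which has compacted its log holds far fewer log entries than one that has not, while both reconstruct the same state.

```python
class Node:
    """Toy replica: a snapshot plus the log entries written after it."""

    def __init__(self):
        self.snapshot = {}        # state as of the last compaction
        self.snapshot_index = 0   # index of the last entry folded into the snapshot
        self.log = []             # (index, key, value) entries newer than the snapshot

    def append(self, index, key, value):
        self.log.append((index, key, value))

    def compact(self):
        # Fold every logged entry into the snapshot, then drop the log.
        for index, key, value in self.log:
            self.snapshot[key] = value
            self.snapshot_index = index
        self.log = []

    def state(self):
        # Current state = snapshot with the remaining log replayed on top.
        result = dict(self.snapshot)
        for _, key, value in self.log:
            result[key] = value
        return result


def demo():
    a, b = Node(), Node()
    for i in range(1, 1001):
        entry = (i, f"secret/{i % 5}", i)   # hypothetical keys, overwritten many times
        a.append(*entry)
        b.append(*entry)
    b.compact()   # only b has compacted so far
    return a, b


if __name__ == "__main__":
    a, b = demo()
    print(len(a.log), len(b.log), a.state() == b.state())
```

In this sketch node `a` still carries all 1000 log entries while node `b` carries none, yet both report identical state, so the amount of log data on a node says little about what it can serve.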

I don’t pretend to understand all the workings of Raft, though!

ncabatoff · November 24, 2020, 1:16pm

jlj7 might be right, but at a guess, what you’re seeing is due to the fact that the raft data is stored in BoltDB data files, which are prone to containing a lot of “garbage”. That is, a 100MB bolt file might contain only 10MB of active data and 90MB of unused space that may be overwritten in the future. BoltDB doesn’t aggressively free up that space because doing so would require expensive disk operations: better to waste a bit of disk space than to provide slow performance. So the size of the file on disk isn’t a reliable measure of how much live data a node holds.
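
A rough way to picture this is a toy model of a page-based file that recycles freed pages instead of shrinking. This is a deliberately simplified sketch, not BoltDB's real on-disk format (which is a copy-on-write B+tree); `PagedFile` and its methods are hypothetical names used only for illustration.

```python
class PagedFile:
    """Toy page store: deleted pages go on a free list; the file never shrinks."""

    def __init__(self):
        self.pages = []   # every page ever allocated; None marks a free page
        self.free = []    # indexes of pages available for reuse
        self.index = {}   # key -> page index

    def put(self, key, value):
        if key in self.index:
            self.pages[self.index[key]] = value
            return
        if self.free:
            page = self.free.pop()        # reuse "garbage" space first
            self.pages[page] = value
        else:
            self.pages.append(value)      # only grow when nothing is free
            page = len(self.pages) - 1
        self.index[key] = page

    def delete(self, key):
        page = self.index.pop(key)
        self.pages[page] = None
        self.free.append(page)            # space is kept for later, not returned

    def file_size(self):
        return len(self.pages)

    def live_data(self):
        return {k: self.pages[p] for k, p in self.index.items()}


def demo():
    # "Leader" writes 100 keys over time, then 90 of them are deleted.
    leader = PagedFile()
    for i in range(100):
        leader.put(f"k{i}", i)
    for i in range(10, 100):
        leader.delete(f"k{i}")

    # "Follower" is populated only from the live data, e.g. via a snapshot.
    follower = PagedFile()
    for key, value in leader.live_data().items():
        follower.put(key, value)
    return leader, follower


if __name__ == "__main__":
    leader, follower = demo()
    print(leader.file_size(), follower.file_size())
    print(leader.live_data() == follower.live_data())
```

Under these assumptions the leader's file is ten times the size of the follower's, yet both hold exactly the same live data, which would match a smaller follower still serving everything once it becomes leader.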
