In production I run a 5-node Vault cluster in Kubernetes using Vault’s Raft integrated storage. Recently I ran into leader fluctuation, and out of nowhere I noticed that one of the 5 nodes has hardly any data in its raft.db and vault.db. I have been trying to understand the implications of that.
In a test cluster, I have reproduced this behavior: the original leader has more data in its vault.db and raft.db than the followers. What’s confusing is that when I step that leader down and a follower with less data becomes leader, I still see all of the data that the original leader had in the Vault UI, even though the files on that node are clearly smaller.
Questions:
- How can followers have less data than the leader in vault.db, yet still serve fully accurate data to the Web UI when they become leader? It seems like they would only have a fraction of the data, given that their vault.db is a fraction of the size of the original leader’s.
- Is the expectation that the leader and all followers have the same amount of data in both vault.db and raft.db?
Thanks for any help you can provide. This is truly perplexing, and I can’t tell if it’s a critical production issue.