Hi
We encountered an issue that caused significant production downtime: a network error caused Raft replication to fail, even though we are running 6 nodes. We are looking for an alternative storage backend that would make this less likely to happen again. I came across the PostgreSQL backend and became interested in it.
I’m running a cluster made up of nodes on mixed 1.19.x versions. Here is the output of vault operator members:
Host Name      API Address                  Cluster Address              Active Node    Version    Upgrade Version    Redundancy Zone    Last Echo
---------      -----------                  ---------------              -----------    -------    ---------------    ---------------    ---------
vault-node6    https://172.16.xx.xx:8200    https://172.16.xx.xx:8201    false          1.19.3     1.19.3             n/a                2025-05-15T14:22:09+07:00
vault-node1    https://172.16.xx.xx:8200    https://172.16.xx.xx:8201    true           1.19.0     1.19.0             n/a                n/a
vault-node5    https://172.16.xx.xx:8200    https://172.16.xx.xx:8201    false          1.19.3     1.19.3             n/a                2025-05-15T14:22:08+07:00
vault-node3    https://172.16.xx.xx:8200    https://172.16.xx.xx:8201    false          1.19.3     1.19.3             n/a                2025-05-15T14:22:10+07:00
vault-node4    https://172.31.xx.xx:8200    https://172.31.xx.xx:8201    false          1.19.1     1.19.1             n/a                2025-05-15T14:22:06+07:00
vault-node2    https://172.31.xx.xx:8200    https://172.31.xx.xx:8201    false          1.19.3     1.19.3             n/a                2025-05-15T14:22:06+07:00
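For context, this is how I understand the quorum arithmetic for the cluster above. It is only a rough sketch: I am assuming all 6 nodes are voters and that the 172.16 and 172.31 prefixes are two separate network segments. The function names are my own and are not part of Vault.

```python
def raft_quorum(voters: int) -> int:
    """Smallest number of voters that forms a strict majority."""
    return voters // 2 + 1


def failure_tolerance(voters: int) -> int:
    """How many voters can be lost while a majority still remains."""
    return voters - raft_quorum(voters)


def segments_with_quorum(segments: dict[str, int]) -> list[str]:
    """Given voters per network segment, list the segments that could
    still reach quorum on their own if the segments were cut off from
    each other."""
    needed = raft_quorum(sum(segments.values()))
    return [name for name, count in segments.items() if count >= needed]


if __name__ == "__main__":
    for n in (5, 6, 7):
        print(f"{n} voters: quorum={raft_quorum(n)}, "
              f"tolerates {failure_tolerance(n)} failures")

    # Assumption: node counts per address prefix, taken from the table above.
    print(segments_with_quorum({"172.16": 4, "172.31": 2}))
```

If this is right, 6 voters tolerate the same number of failures as 5 (two), so the sixth node adds no extra fault tolerance on its own.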
I’ve seen other posts saying that PostgreSQL is not recommended because its storage layout is “directories”, and that integrated Raft storage is therefore a much better option.
I’d also like to mention that my Raft database’s size (I suppose the BoltDB database) is around 16 GB.
Questions:
- Is there a migration path from integrated Raft storage to PostgreSQL storage?
- If migrating to PostgreSQL is not recommended and we should stay on integrated Raft storage, what should I do to avoid cluster downtime caused by a network error?
Thank you