Highly Available Vault setup in Kubernetes

Hello all,

Within our company we are investigating whether HashiCorp Vault can help us improve our secret management use cases.

I read the documentation for a highly available Vault setup, which requires Consul.
Is this also the case if you deploy Vault inside Kubernetes?

To me it looked like Consul is used as a load-balancing mechanism to distribute traffic to active Vault servers. As Kubernetes Services have their own load-balancing mechanism, I was wondering whether Consul is still required for a highly available Vault setup.

I’m looking forward to hearing from anyone who can answer my question.

Best regards,
Wouter

That’s probably an old reference. You can use Integrated Storage as your backend (other storage backends are available) and still do DR. v1.8+ is recommended.

However, you’re confusing DR with HA. DR is a “dark” cluster that gets all data from the primary and is ready to be promoted to active. Best practice is to manually do this switch.

HA within a cluster (DR is one part of the overall availability picture) depends on the number of nodes in that cluster: each node can take over as leader during an election, and this failover is automatic. That gives you a highly available cluster. An HA cluster plus a DR cluster gives you redundancy.

HA is included in OSS.
DR requires an Enterprise license (for both clusters).
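
For reference, here’s a minimal sketch of what one Integrated Storage (Raft) node’s configuration can look like. Every path, node ID, service name, and address below is a placeholder modeled on a typical Helm-chart style deployment, not something specific to your setup:

```hcl
# vault-config.hcl – minimal sketch of one Integrated Storage (Raft) node.
# All paths, node IDs, service names, and addresses are placeholders.

storage "raft" {
  path    = "/vault/data"   # persistent volume mounted into the pod
  node_id = "vault-0"       # must be unique per node
}

listener "tcp" {
  address         = "0.0.0.0:8200"
  cluster_address = "0.0.0.0:8201"
  tls_disable     = true    # illustration only; use TLS in practice
}

# How this node advertises itself to clients and to its peers.
api_addr     = "http://vault-0.vault-internal:8200"
cluster_addr = "http://vault-0.vault-internal:8201"
```

Additional nodes join with `vault operator raft join http://vault-0.vault-internal:8200` (or a `retry_join` block in the config). Raft handles leader election itself, so a plain Kubernetes Service in front of the pods is enough to route traffic – no Consul needed.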

Thank you for explaining.
Let me summarize the HA and DR concepts.
It would be nice if you could confirm (or deny) whether I understand them correctly.

With HA there is a single primary node able to perform read/write actions within Vault.
If the primary node fails, users are still able to get their secrets from any of the secondary nodes. No writes can be performed as long as the primary node is unavailable.

With DR there is still a single primary node able to perform read/write actions.
If the primary node fails, there is a mechanism in place to (automatically) elect a new primary node. This way, users are only unable to write new secrets while the primary node is unavailable and a new primary has not yet been elected.

Not quite – what you describe under DR is actually how HA behaves within a single cluster. If the primary fails, there will be an election, one of the standby nodes in the same cluster becomes leader, and write operations can continue. This is almost seamless, with no downtime – it’s quick, which is why they went with the Raft protocol.
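
You can watch this from the CLI: `vault status` shows the current active node, and on a Raft-backed cluster `vault operator raft list-peers` shows who the leader is. The output below is abridged and the node names are placeholders:

```shell
# Ask any node about the cluster's HA state.
$ vault status
...
HA Enabled              true
HA Mode                 standby
Active Node Address     http://vault-0.vault-internal:8200

# List the Raft peers and their current roles.
$ vault operator raft list-peers
Node       Address                        State       Voter
----       -------                        -----       -----
vault-0    vault-0.vault-internal:8201    leader      true
vault-1    vault-1.vault-internal:8201    follower    true
vault-2    vault-2.vault-internal:8201    follower    true
```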

The DR cluster is set up as “dark”. The cluster does not respond to any queries from users; it simply tries to keep up with the primary cluster’s changes – including leases (that’s the important part and what makes DR useful). At some point, if there is a primary cluster failure, the DR cluster can be “promoted” (a manual step) to a primary cluster, becomes active, and starts to respond to queries from users. Externally, you have to manage DNS, load balancers, routing, etc. yourself.
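
For completeness, the promotion itself is a single call against the DR secondary (Vault Enterprise replication API; the token value below is a placeholder you generate beforehand):

```shell
# Enterprise only. On the DR secondary, after a primary failure:
# 1. Generate a DR operation token via a key-holder ceremony (similar to
#    generate-root, under sys/replication/dr/secondary/generate-operation-token).
# 2. Promote the secondary to primary:
$ vault write sys/replication/dr/secondary/promote \
    dr_operation_token="<dr-operation-token>"
```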

I should mention that the concept of leader/perf.standby does exist in the dark DR cluster, but it isn’t worth thinking about until the cluster is promoted to active, so don’t confuse yourself.

Thank you, Aram, for explaining this to me.
The concepts are now clear to me.
I didn’t know that DR involves a completely separate cluster.

In my opinion, the Vault documentation about HA and DR does not describe the concepts as well as you explained them to me. I think it would help if the documentation were updated to reflect your explanation.

Nevertheless, thank you very much. I have all the information I need!
