Load balancing and application availability

Hi Guys. I’m just about to test Vault Enterprise. When the Vault active node receives requests, will it distribute them among the standby nodes? If not, we were thinking of putting a load balancer such as HAProxy in front of Vault, forwarding traffic to nodes whose health check returns status code 200 or 473. But we’d have a concern there of HAProxy becoming a bottleneck. Do you have any recommendations on a setup for high availability which does not impact performance?
Also, for application availability, should the applications be configured to fail over to another Vault cluster if their local Vault cluster is unavailable? Or should this be transparent to the applications and handled via a load balancer? Any recommendation on how this is achieved?

With Vault Enterprise you can use performance standby nodes (Performance Standby Nodes - Vault Enterprise | Vault by HashiCorp), which allow standby nodes to answer read-only requests. If they receive something that would require write access, they will forward it on to the active node.

What is your concern with regard to the load balancer reducing performance? You would want to ensure the resources (memory, CPU, number of instances, etc.) are correctly specified to provide sufficient HA and response times. But assuming that is the case, there shouldn’t be any serious impact on the system. One alternative to a load balancer is to use Consul. With this, the application round-robins between the different working servers without needing a connection to a separate server. You can even handle the difference between read-only and read/write requests at the application level to minimise redirects.
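As a rough illustration of the health-check idea from the original question: Vault's /v1/sys/health endpoint reports a node's role through its HTTP status code. With default query parameters that is 200 for the active node, 429 for a regular standby, 472 for a DR secondary, 473 for a performance standby, 501 if not initialised and 503 if sealed. The Python sketch below is only an assumption about how a load balancer or client might sort nodes into pools from those codes. The node names and the functions `node_role` and `build_pools` are made up for illustration.

```python
# Default status codes returned by Vault's /v1/sys/health endpoint.
HEALTH_ROLES = {
    200: "active",
    429: "standby",
    472: "dr-secondary",
    473: "performance-standby",
    501: "uninitialized",
    503: "sealed",
}


def node_role(status_code):
    """Map a /v1/sys/health status code to a role name."""
    return HEALTH_ROLES.get(status_code, "unknown")


def build_pools(statuses):
    """Split nodes into read and write pools.

    statuses: dict of node address -> health-check status code
    (hypothetical input, e.g. collected by a periodic health check).
    """
    # Only the active node (200) can service writes directly.
    write_pool = sorted(a for a, code in statuses.items() if code == 200)
    # The active node and performance standbys (473) can both answer reads.
    read_pool = sorted(a for a, code in statuses.items() if code in (200, 473))
    return {"read": read_pool, "write": write_pool}


if __name__ == "__main__":
    # Hypothetical three-node cluster.
    observed = {"vault-1": 200, "vault-2": 473, "vault-3": 473}
    print(build_pools(observed))
```

A load balancer health check that accepts 200 and 473 corresponds to the "read" pool here. A check that accepts only 200 corresponds to the "write" pool.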

With regard to total cluster failure, again it depends what your concerns are and what you are trying to guard against. Are you trying to deal with a DR-type failover scenario in case a region fails? There are a few cluster replication options (Replication - Vault Enterprise | Vault by HashiCorp), some of which can be seamless to the application. However, it might just be that you need additional nodes in your cluster to reduce the likelihood of a total cluster failure, or you want performance replication to allow other active clusters to answer requests from different geographic regions.

We will be using performance standby nodes. When I looked at them first, I thought of using keepalived on the cluster, without a load balancer, with a floating IP for the active node. But from further reading I understand that the read requests won’t be distributed if I do that, and the active node would handle all requests? Is that correct? That’s definitely not what I want, so I am looking at putting each cluster behind load balancers.
My concern with the load balancer is that it will then be in the path between client and server, so all traffic has to go through it.
We don’t have Consul currently. I did read about the nodes reporting their status to Consul so it can differentiate between read and write. It sounds like it would be useful.
Yep, I’m trying to deal with cluster and region failures. We will be using the performance replication feature, but I am trying to imagine what that looks like for the applications and how failover takes place, either via load balancer and/or application.

The cluster performance replication isn’t really designed for DR. The idea is that both clusters are fully active and queried by “local” users. So in that case you might have two load balancers (one per region/cluster), with applications configured to use the correct local load balancer.

Remember that with performance replication you are to some extent ending up with one cross-region cluster, so some failure scenarios won’t be handled (for example a resource exhaustion issue would probably affect both clusters and cause everything to fail as both sides keep track of all leases, etc.)

If a client hits the active node with a read request, will the active node pass that request on to another node? I guess not, so I need to distribute the requests via the load balancer or another method?

The active node can answer both read-only and read/write requests, so no redirects are needed. If a read-only request hits a standby, it will answer directly, while if it is a read/write request it would either issue a redirect to the active node or make the request itself and proxy the answer. You’d generally use the second option if you are behind a load balancer.

How you get the requests to the different servers is up to you. You could just randomly pick a server to use, with some redirects/proxying being needed, or your application could choose different endpoints depending on the type of request, pointing at different DNS names/load balancer VIPs.
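The second option above could look something like the following Python sketch, where the application picks an endpoint based on the type of request. Everything here is an assumption for illustration: the DNS names are hypothetical, the `EndpointPicker` class is not part of any Vault client library, and treating GET/LIST/HEAD as "read-only" is only a heuristic, since some reads still need the active node. That is not fatal, because a performance standby forwards anything it cannot answer itself.

```python
import itertools

# Heuristic only: HTTP methods that are usually read-only against Vault.
READ_METHODS = {"GET", "LIST", "HEAD"}


class EndpointPicker:
    """Choose a Vault endpoint per request type (illustrative sketch)."""

    def __init__(self, read_endpoints, write_endpoint):
        if not read_endpoints:
            raise ValueError("at least one read endpoint is required")
        # Round-robin over the endpoints that can answer reads.
        self._reads = itertools.cycle(read_endpoints)
        self._write = write_endpoint

    def pick(self, method):
        """Return the endpoint to use for the given HTTP method."""
        if method.upper() in READ_METHODS:
            return next(self._reads)
        return self._write


if __name__ == "__main__":
    # Hypothetical DNS names / load balancer VIPs.
    picker = EndpointPicker(
        read_endpoints=[
            "https://vault-read-1.example.internal:8200",
            "https://vault-read-2.example.internal:8200",
        ],
        write_endpoint="https://vault-active.example.internal:8200",
    )
    for method in ("GET", "GET", "POST", "LIST"):
        print(method, "->", picker.pick(method))
```

The trade-off is a little extra logic in the application in exchange for fewer forwarded requests. Picking a server at random is simpler and still works, at the cost of some forwarding.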