We are running an HA cluster of HashiCorp Vault with Integrated Storage. The cluster detects when the active node is down and automatically promotes a standby node to be the new active node, but various sources suggest that a load balancer is still required so that clients follow the failover. One thoroughly documented solution uses HAProxy, but that documentation covers the Consul storage backend, not Integrated Storage.
We have configured HAProxy to poll the API endpoint at /v1/sys/health to determine which node is active and to route requests to that node. By default this endpoint returns 200 only on the active node (standby nodes return 429), so a health check that expects status 200 marks only the active node as up. Testing the connection between a Percona server and the Vault cluster through HAProxy shows that failover works well when the active node is stepped down or shut down completely. Here is the HAProxy configuration, for anyone who is curious:
global

defaults
    mode tcp
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend percona
    mode tcp
    bind <ip_address_of_ha_proxy_server:80>
    bind <ip_address_of_ha_proxy_server:443> ssl cert /path/to/cert
    redirect scheme https code 301 if !{ ssl_fc }
    log global
    option tcplog

backend vault
    mode tcp
    timeout check 5000
    timeout server 30000
    timeout connect 5000
    option httpchk GET /v1/sys/health
    http-check expect status 200
    server node1 <ip_address_of_vault_server_1> check ssl check-ssl verify none
    server node2 <ip_address_of_vault_server_2> check ssl check-ssl verify none
    server node3 <ip_address_of_vault_server_3> check ssl check-ssl verify none

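
For anyone who wants to see what the health check in the backend above is keying on, here is a rough Python sketch of the same decision. It assumes Vault's default /v1/sys/health status codes (200 active, 429 standby, 472 DR secondary, 473 performance standby, 501 uninitialized, 503 sealed). The function names classify, probe, and find_active are made up for illustration and are not part of Vault or HAProxy, and the node URLs passed to probe would be placeholders for your own servers.

```python
import ssl
import urllib.error
import urllib.request
from typing import Dict, Optional

# Default status codes returned by Vault's /v1/sys/health endpoint.
HEALTH_ROLES = {
    200: "active",               # initialized, unsealed, active
    429: "standby",              # unsealed, standby
    472: "dr-secondary",         # disaster recovery secondary
    473: "performance-standby",  # performance standby
    501: "uninitialized",
    503: "sealed",
}


def classify(status_code: int) -> str:
    """Map a /v1/sys/health status code to a node role (hypothetical helper)."""
    return HEALTH_ROLES.get(status_code, "unknown")


def probe(base_url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of GET <base_url>/v1/sys/health, or 0 if unreachable."""
    # Certificate verification is disabled to mirror "verify none" in the
    # HAProxy config; this is an assumption for testing, not a recommendation.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(
            f"{base_url}/v1/sys/health", timeout=timeout, context=context
        ) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers such as 429 (standby) are raised as HTTPError.
        return err.code
    except (urllib.error.URLError, OSError):
        return 0


def find_active(statuses: Dict[str, int]) -> Optional[str]:
    """Return the first node whose status code means 'active', else None."""
    for node, code in statuses.items():
        if classify(code) == "active":
            return node
    return None
```

With one status code per node collected via probe, find_active({"node1": 429, "node2": 200, "node3": 429}) returns "node2", which is the only server HAProxy would mark as up with "http-check expect status 200".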
Question: Since there is very little documentation on using HAProxy with Integrated Storage to handle failover, is this approach considered best practice for Vault failover? If there is something simpler or more reliable, could anyone provide a link to the documentation?