Vault load balancing and read/write requests

Hi.

We plan on using Vault Enterprise behind a load balancer, which will likely be haproxy. I was thinking of using haproxy to send any PUT requests to the active node and any GET requests to either the active node or the performance standbys. This would avoid the standbys having to redirect writes to the active node.
I'm thinking we would check /v1/sys/health for a 200/473 status and also check whether the request is a read or a write.
From my research, we'd likely need HTTP mode, and haproxy would need to decrypt the traffic on receipt and re-encrypt it before sending it to the Vault nodes. Does this sound workable?
Thanks
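A rough haproxy sketch of that HTTP-mode idea, with TLS terminated on the frontend, writes matched by HTTP method, and traffic re-encrypted to the Vault nodes via `ssl` on the server lines. The certificate path (/etc/haproxy/certs/vault.pem), the backend names (vault_active, vault_read) and the three-server list are placeholders, not anything from this thread. Vault itself decides what a performance standby can serve locally and what it forwards, so the method ACL is only an approximation of which requests actually write.

```haproxy
defaults
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend vault_https
    mode http
    # Terminate TLS on the load balancer (certificate path is a placeholder)
    bind *:443 ssl crt /etc/haproxy/certs/vault.pem
    # Treat state-changing methods as writes; GET and Vault's LIST fall through as reads
    acl is_write method PUT POST DELETE PATCH
    use_backend vault_active if is_write
    default_backend vault_read

backend vault_active
    mode http
    # Only the active node returns 200 from the default health endpoint
    option httpchk GET /v1/sys/health
    http-check expect status 200
    # "ssl" re-encrypts to Vault; "verify none" skips cert checks (prefer "verify required ca-file ...")
    server server1 server1:8200 check ssl verify none
    server server2 server2:8200 check ssl verify none
    server server3 server3:8200 check ssl verify none

backend vault_read
    mode http
    # 200 = active node, 473 = performance standby
    option httpchk GET /v1/sys/health
    http-check expect rstatus ^(200|473)$
    server server1 server1:8200 check ssl verify none
    server server2 server2:8200 check ssl verify none
    server server3 server3:8200 check ssl verify none
```

The cost of this design is that haproxy holds the TLS certificate and must inspect every request, which is the added complexity questioned in the reply below.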

That is workable.
I'd question whether the effort and added complexity are really needed. What are your performance/load requirements for writes?

We don't have exact requirements on how quick the response should be. I was thinking that if we could hit the active node directly for writes, that would be preferable. Do people normally run haproxy in tcp mode, just check for the 200/473 health response so traffic can go to any node, and let writes be redirected? We're not tied to haproxy as a load balancer, so we can change if another one is better suited.

Nginx, HA Proxy, F5 - anything that can parse HTTP response codes from sys/health is normal to use.
It depends on the read/write workload balance. If 90% of your requests are reads, you're not going to get a lot of performance benefit from splitting them up. It's all somewhat hand-wavy if you don't have an idea of your workload/throughput requirements.

Thanks. More than 90% of requests will likely be reads.

Using an ALB is probably the only way to do Vault architecture correctly. The nice thing is that the ALB's application-layer health check maps directly onto Vault's health endpoint:

In AWS, for a target group you can use /v1/sys/health?perfstandbyok=true against port 8200 to get all the healthy nodes (the active node plus the performance standbys), and /v1/sys/health against 8201 to get the leader node, which gives you two convenient groups. The 8200 group can back the ALB for your users, and the 8201 group can serve as your cluster address for any performance replication (PR) or disaster recovery (DR) connectivity.
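On premises, the same two groups can be approximated with two haproxy backends. A sketch under a few assumptions: the backend names (vault_serving, vault_leader) and server list are placeholders, both checks run against the API port 8200, and the frontends that would point at these backends are left out. With `perfstandbyok=true` a performance standby reports 200 instead of 473, so a plain `status 200` check matches every node that can serve requests, while the default `/v1/sys/health` call returns 200 only on the active node.

```haproxy
# All nodes that can serve client traffic (active + performance standbys).
# Plain standbys (429) and sealed nodes (503 by default) fail this check.
backend vault_serving
    mode tcp
    option httpchk GET /v1/sys/health?perfstandbyok=true
    http-check expect status 200
    server server1 server1:8200 check check-ssl verify none
    server server2 server2:8200 check check-ssl verify none
    server server3 server3:8200 check check-ssl verify none

# Only the active (leader) node returns 200 with default parameters
backend vault_leader
    mode tcp
    option httpchk GET /v1/sys/health
    http-check expect status 200
    server server1 server1:8200 check check-ssl verify none
    server server2 server2:8200 check check-ssl verify none
    server server3 server3:8200 check check-ssl verify none
```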

Thanks. We're on-premises, so we plan on using haproxy to check for a 200 or 473 status code. Having thought about it since, it may be pointless to check for read/write requests, because we'll use performance replication and writes will be forwarded to the primary cluster anyway. So checking for a 200 or 473 status may be sufficient.
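A minimal TLS pass-through sketch of that approach: haproxy stays in tcp mode, never decrypts client traffic, and only uses the health endpoint to decide which nodes are eligible. The frontend/backend names, timeouts and server list are placeholders; `verify none` skips certificate verification on the health checks and is only reasonable for testing.

```haproxy
defaults
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend vault_passthrough
    mode tcp
    bind *:443
    default_backend vault_nodes

backend vault_nodes
    mode tcp
    balance roundrobin
    # The health check is HTTPS even though client traffic is passed through untouched
    option httpchk GET /v1/sys/health
    # 200 = active node, 473 = performance standby; Vault forwards any writes internally
    http-check expect rstatus ^(200|473)$
    server server1 server1:8200 check check-ssl verify none
    server server2 server2:8200 check check-ssl verify none
    server server3 server3:8200 check check-ssl verify none
```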

There is no need to check the type of request. All nodes can handle reads and writes. When a write request comes in, the node that received it forwards it to the leader, which stores the updated value. There is no need for you to check or validate that, and you'll just end up causing yourself more headaches if you do.

So if it's a write, the node that receives it replies to the client and forwards the request to the leader? And since it's asynchronous, there's no delay? With performance replication, does the leader in a secondary cluster forward the write to the leader in the primary cluster? So a write request may be received by a performance standby in a secondary cluster, which replies to the client after forwarding the request to its local leader, which in turn forwards it to the leader in the primary cluster? Sorry for all the questions.

Essentially correct.

Correct.

BTW, even the non-leader nodes within the same Vault cluster are considered performance standbys. You can see this in the health check output of any non-leader node in the primary cluster:

  "performance_standby": true,

Hello, could you share your haproxy config, please?
I'm going crazy because my haproxy config doesn't work for Vault. It returns SSL handshake errors, but connecting directly to a Vault server works fine.
Ignazio

This is what I was testing with. I did see intermittent SSL handshake errors when using haproxy.

    frontend vault_https
      mode tcp
      log global
      timeout client 30000
      bind *:443 
      description Vault over https
      default_backend vault_https
      use_backend vault_https_backup if { nbsrv(vault_https) lt 3 }
      option tcplog
      log         /dev/log local2 debug


    backend vault_https
      mode tcp
      timeout check 5000
      timeout server 30000
      timeout connect 5000
      # HTTPS health check: 200 = active, 473 = performance standby, 429 = unsealed standby
      option httpchk GET /v1/sys/health
      http-check expect rstatus 200|473|429
      log         /dev/log local2 debug
      # send-proxy prepends a PROXY protocol header to every connection. The Vault listener
      # must be configured to accept it (proxy_protocol_behavior); otherwise connections
      # fail with TLS handshake errors. Drop send-proxy if the listener does not expect it.
      server server1 server1:8200 check port 8200 check-ssl verify none inter 2000 send-proxy fastinter 1000 downinter 10000 fall 2 rise 2
      server server2 server2:8200 check port 8200 check-ssl verify none inter 2000 send-proxy fastinter 1000 downinter 10000 fall 2 rise 2
      server server3 server3:8200 check port 8200 check-ssl verify none inter 2000 send-proxy fastinter 1000 downinter 10000 fall 2 rise 2
      server server4 server4:8200 check port 8200 check-ssl verify none inter 2000 send-proxy fastinter 1000 downinter 10000 fall 2 rise 2
      server server5 server5:8200 check port 8200 check-ssl verify none inter 2000 send-proxy fastinter 1000 downinter 10000 fall 2 rise 2

    backend vault_https_backup
      mode tcp
      timeout check 5000
      timeout server 30000
      timeout connect 5000
      option httpchk GET /v1/sys/health
      http-check expect rstatus 200|473|429
      server server6 server6:8200 check port 8200 check-ssl verify none inter 2000 send-proxy fastinter 1000 downinter 10000 fall 2 rise 2
      server server7 server7:8200 check port 8200 check-ssl verify none inter 2000 send-proxy fastinter 1000 downinter 10000 fall 2 rise 2
      server server8 server8:8200 check port 8200 check-ssl verify none inter 2000 send-proxy fastinter 1000 downinter 10000 fall 2 rise 2
      server server9 server9:8200 check port 8200 check-ssl verify none inter 2000 send-proxy fastinter 1000 downinter 10000 fall 2 rise 2
      server server10 server10:8200 check port 8200 check-ssl verify none inter 2000 send-proxy fastinter 1000 downinter 10000 fall 2 rise 2
