VAULT_ADDR failover in HA

I set up an HA Vault cluster with the Raft backend, and there is something I don’t quite understand.

For a client to connect to the cluster, it must use VAULT_ADDR.

According to the docs, it is better to avoid an LB in front of Vault, since each Vault node redirects to the active node. So if VAULT_ADDR does not point to the active node, the node it points to will redirect the connection to the active node. OK, that I understand.

My question is:

What happens when the node that VAULT_ADDR references is down, I mean completely down (the server is broken, Vault no longer runs on it)? How is the client redirected to the rest of the cluster?

Is it possible to put multiple addresses in VAULT_ADDR so that the client switches automatically if the first address is unreachable?

I’m not sure this is recommended. Where did you find that an LB is to be avoided?

Hi,

Only one member of a Vault cluster is “active”. If the active one goes down, another one will be promoted to active. This is an important concept.

When you put an LB in front of it (like nginx), you should put all cluster members in a pool, define health checks, and only forward requests to the active member. You can use the /sys/health endpoint for this: https://www.vaultproject.io/api-docs/system/health

So, when the active node goes down, a new one will be elected, and the health check results will change accordingly. At that point, the LB will send requests to the new active node.
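To make the health-check logic concrete, here is a minimal Python sketch of what an LB (or a client doing its own failover) could do: probe /v1/sys/health on every node and keep only the one that answers 200. With default query parameters, that endpoint returns 200 for an unsealed active node, 429 for an unsealed standby, 472 for a DR secondary, 473 for a performance standby, 501 if not initialized and 503 if sealed. The function names and the node addresses are hypothetical, and TLS/CA configuration is left out, so treat this as an illustration rather than a drop-in tool.

```python
import urllib.error
import urllib.request

# Default status codes of Vault's /v1/sys/health endpoint.
HEALTH_STATES = {
    200: "active",
    429: "standby",
    472: "dr-secondary",
    473: "performance-standby",
    501: "uninitialized",
    503: "sealed",
}


def classify(status_code):
    """Map a /v1/sys/health status code to a readable state."""
    return HEALTH_STATES.get(status_code, "unknown")


def probe(addr, timeout=2.0):
    """Return the health state of one node, or 'unreachable'."""
    url = addr.rstrip("/") + "/v1/sys/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(resp.status)
    except urllib.error.HTTPError as err:
        # Non-2xx answers (429, 503, ...) still tell us the node's state.
        return classify(err.code)
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout: the node is down.
        return "unreachable"


def find_active(addrs, probe_fn=probe):
    """Return the first address whose node reports itself active, else None."""
    for addr in addrs:
        if probe_fn(addr) == "active":
            return addr
    return None


if __name__ == "__main__":
    # Hypothetical node addresses; replace with your own cluster members.
    nodes = [
        "https://vault-1.example.internal:8200",
        "https://vault-2.example.internal:8200",
        "https://vault-3.example.internal:8200",
    ]
    print(find_active(nodes))
```

A node that is completely down simply shows up as "unreachable" and is skipped, which is the same effect an LB health check gives you.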

If you are not using an LB, you should have some DNS-based service discovery tool in place (like Consul :wink:). This will resolve a hostname like vault.service.consul to the active node.

Spot on, @jeroenjacobs79. Excellent explanation.

Thank you. So an LB is needed then :slight_smile:

In the docs at https://www.vaultproject.io/docs/concepts/ha#behind-load-balancers, the last line, “This can cause a redirect loop and as such is not a recommended setup when it can be avoided.”, can be misleading.

@micheelengronne I think so, if it is read out of context, maybe?
That paragraph starts with “if the only access to the Vault servers is via the load balancer”. In that case you need to set api_addr to the LB address, and overall this isn’t recommended.

I don’t think you want to isolate your Vault nodes from each other, though some people have requirements for that. If you end up in that case, with nodes only accessible through the LB’s URL, then you’re stuck with this possible loop (temporarily, until the LB updates its health checks).