Vault highly available setup

Vault with a Consul client can communicate with multiple Consul servers, but how can I make Vault itself highly available to other services?

I am afraid I am still confused, here is why:

The New York DC is currently running:
2 x Vault (Let’s call it A and B) + Consul client mode
3 x Consul server mode in a cluster

A bare-metal server called Peter connects to Vault A;
A bare-metal server called Tom connects to Vault B;

How does Peter connect to Vault B when Vault A fails?

There is no magic here; you need a separate solution to make sure that clients connect to a server that’s up. Usually that means something like a load balancer or service discovery framework that performs health checks.

Since (within a cluster) only one Vault server is active at a time, clients that connect to a standby node are forwarded to the active node, either internally or via a redirect. This has implications for how load balancers are implemented; in fact, this is all described in the documentation above.
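To make the "connect to a server that’s up" idea concrete, here is a minimal client-side sketch in Python. It is only an illustration under assumptions, not a replacement for a load balancer or service discovery: the host names are hypothetical, and the function names (http_probe, pick_vault_addr) are made up for this example. It relies on Vault’s health endpoint, /v1/sys/health, which with the standbyok=true query parameter answers 200 for a standby node as well as for the active one.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional

# Hypothetical addresses for Vault A and Vault B; replace with your own.
VAULT_ADDRS = [
    "https://vault-a.example.internal:8200",
    "https://vault-b.example.internal:8200",
]


def http_probe(addr: str, timeout: float = 2.0) -> bool:
    """Return True if the Vault node at addr answers its health check with 200.

    With standbyok=true, a standby node reports the same status code as the
    active node; sealed or uninitialized nodes still return an error status.
    """
    url = f"{addr}/v1/sys/health?standbyok=true"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections, timeouts, TLS errors.
        return False


def pick_vault_addr(
    addrs: Iterable[str],
    probe: Callable[[str], bool] = http_probe,
) -> Optional[str]:
    """Return the first address whose probe succeeds, or None if all fail."""
    for addr in addrs:
        if probe(addr):
            return addr
    return None
```

A client such as Peter could call pick_vault_addr(VAULT_ADDRS) before each session and use the returned address. The probe is passed in as a parameter so the selection logic can be exercised without a network.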

I see. Other than a load balancer or service discovery, is it advisable to install Vault and a Consul client on each bare-metal server to serve requests from the local machine? Or would that add load to the bare-metal server itself?

I think that pattern directly contradicts the published advice that, in production environments, Vault should be the only tenant on a host.

Understood. I think service discovery is the closest fit if I don’t want to add a load balancer.


First, let’s avoid an architecture that deploys 2 Vault nodes in a cluster. You want 3 (or 5), really.

You would put service discovery, DNS, or a load balancer in front of the Vault cluster nodes. Then the ELB (or your LB of choice) would check each node’s health using the /sys/health endpoint and keep it in, or remove it from, the target pool.
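As a rough sketch of the decision the LB makes, the Python below maps the default status codes returned by the health endpoint (served at /v1/sys/health) to a keep-or-remove choice. The function name keep_in_pool and the allow_standby switch are assumptions made for this illustration; a real LB expresses the same rule in its own health-check configuration.

```python
from typing import Optional

# Default status codes returned by Vault's /v1/sys/health endpoint.
HEALTH_CODES = {
    200: "active",               # initialized, unsealed, active
    429: "standby",              # unsealed, standby
    472: "dr_secondary",         # disaster recovery secondary
    473: "performance_standby",  # Enterprise performance standby
    501: "uninitialized",
    503: "sealed",
}


def keep_in_pool(status_code: Optional[int], allow_standby: bool = True) -> bool:
    """Decide whether a node stays in the LB target pool.

    status_code is the HTTP status from the node's health check, or None
    if the node did not answer at all. Standby nodes forward requests to
    the active node, so they can stay in the pool when allow_standby is
    True; set it to False to send traffic to the active node only.
    """
    state = HEALTH_CODES.get(status_code)
    if state == "active":
        return True
    if state in ("standby", "performance_standby"):
        return allow_standby
    # Sealed, uninitialized, DR secondary, unknown code, or no response.
    return False
```

Many load balancers only treat 2xx responses as healthy. In that case the standby nodes would be dropped unless the health-check URL includes standbyok=true, which makes a standby answer with the active status code.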

The LB can round-robin, because if a Vault node that is not the active one gets a request, it will forward it to the active node. Note: in Enterprise, the standby node will also serve read requests, giving a pretty big lift in read performance.

If a node dies, the LB takes it out of the targets, and Vault will elect a new active node and keep on running. In a 3 node cluster, you want to replace that node ASAP…


all good, thanks for the answer

Isn’t this breaking one of the core security considerations of Vault: that traffic to and from Vault should be protected, and that Vault should terminate its own SSL to avoid any issues arising from insecure or bad handling of SSL?