We’ve set up an HA Vault deployment with two instances (and an HA Consul backend).
While the communication between the Vault instances and their backends seems fine (if one goes offline, the other one becomes active and has all secrets available), I am wondering how best to manage the auth methods set up in the different instances.
AFAICS only the KV store is synced via Consul, and all policy/auth method setup is unique to each Vault instance.
This means that each change to a policy/auth setup needs to be applied manually in all vault instances.
Given that only one instance is active at a time, activating and deactivating instances for a small change is tedious.
Everything is stored in the configured backend, so the failure of one node should be pretty seamless. You might lose things which are not yet committed to Consul, but as you have only 2 Vault servers you presumably have a separate Consul cluster, so a Consul failure shouldn’t be an issue. You can also get a small amount of downtime, depending on which method you use to point traffic at the active node.
We have two Vault instances which are connected via a Consul backend cluster (3 servers, 2 agents).
If one vault instance goes down, the KV store is available on the other instance (which is fine).
However, policies and access methods from instance1 are not available on instance2.
Do I understand you correctly that policies and access methods should also be synced between the instances (and their Consul backends)?
Apologies if the lack of understanding is on my side.
There is no syncing between instances of Vault itself. They never communicate directly, with one exception: a standby can optionally forward a request to the active node if you send a query to the standby instead of the active node.
All configuration, secrets (both KV and dynamic secrets such as database or PKI), leases and tokens are stored in the backend, which in your case is Consul (but could be MySQL, DynamoDB, etc.).
Are you saying that for some reason you aren’t seeing that with your setup?
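A quick way to test this is to write a throwaway policy through one node and read it back through the other. The following is only a rough sketch: the function name, the policy name and the policy path are made up for illustration, and it assumes both nodes are unsealed and that VAULT_TOKEN holds a token allowed to create and delete policies.

```shell
# Sketch only: the policy name and path below are hypothetical placeholders.
# Assumes VAULT_TOKEN holds a token that may create and delete policies.

# check_shared_policy WRITE_ADDR READ_ADDR
# Writes a throwaway policy through one node and reads it back through the other.
check_shared_policy() {
  csp_write_addr="$1"
  csp_read_addr="$2"
  csp_name="ha-sync-check"

  # "-" tells `vault policy write` to read the policy body from stdin.
  printf 'path "secret/data/ha-sync-check" {\n  capabilities = ["read"]\n}\n' |
    vault policy write -address="$csp_write_addr" "$csp_name" - >/dev/null || return 1

  # If both nodes share one storage backend, the policy is visible here too.
  if vault policy read -address="$csp_read_addr" "$csp_name" >/dev/null 2>&1; then
    echo "shared"
  else
    echo "not shared"
  fi

  # Clean up the throwaway policy.
  vault policy delete -address="$csp_write_addr" "$csp_name" >/dev/null
}
```

Called with the addresses of your two nodes, it should print "shared" if both instances really use the same storage. If it prints "not shared", the two instances are most likely not pointing at the same Consul data.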
How do you have things configured? You have 3 separate Consul servers correctly clustered and then each Vault server runs an agent (which the Vault server talks to via localhost)?
How are you unsealing your two Vault instances?
Could you post the configuration for the two Vault instances?
Sounds like something is not configured correctly. Two nodes in the same cluster would have identical information; one would be ‘read-only’ and the other ‘active’. (Note that HashiCorp recommends at least 3 nodes in an HA setup.) The read-only node becomes ‘active’ if the active node stops responding. Please post your Consul and Vault configurations.
Check that the Consul servers are all linked and ready by running this on any node:
consul operator raft list-peers
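If you want to reduce that output to a one-line summary, something like the sketch below could work. The function name is made up, and it assumes the usual table layout (one header row, with the State column in fourth position), which may differ between Consul versions.

```shell
# summarize_raft_peers
# Reads `consul operator raft list-peers` output on stdin and prints "peers=N leaders=M".
# Assumption: the first row is a header and the 4th column is the Raft state.
summarize_raft_peers() {
  awk 'NR > 1 && NF > 0 { peers++; if ($4 == "leader") leaders++ }
       END { printf "peers=%d leaders=%d\n", peers, leaders }'
}
```

Pipe the command into it: consul operator raft list-peers | summarize_raft_peers. With 3 correctly clustered servers you would expect peers=3 leaders=1.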
Check the status of the Vault servers by running this on each node:
vault status | grep -i "HA mode"
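To compare both nodes from a single machine, a small helper along these lines might be handy. It is a sketch: the function name is hypothetical, and it assumes the default table output of vault status, where the line starts with "HA Mode".

```shell
# ha_mode_of ADDR
# Prints the "HA Mode" value (for example active or standby) reported by the node at ADDR.
ha_mode_of() {
  # `vault status` exits non-zero for a sealed node, so only the text output is inspected.
  vault status -address="$1" 2>/dev/null |
    awk '$1 == "HA" && $2 == "Mode" { print $3 }'
}
```

Call it once per node address. In a healthy two-node cluster, one node should report active and the other standby. An empty result suggests the node is sealed, unreachable, or not running in HA mode.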
Sorry for the late response. Just wanted to mention that we’re still trying to get this setup running, and I’ll report back when I next revisit it. There was/is lots of other stuff to do. Thanks for your input, appreciated!