How to change HA backend with 0 downtime?

We currently use ‘zookeeper’ as the HA backend for our Vault clusters. I want to switch the HA backend to ‘consul’.

I can't find any documentation on how to handle changing an HA backend. Has anyone done it successfully?

I have a 3-node test cluster. My plan is to do the following:

  • Stop vault on node 2 & 3
  • Stop zookeeper on node 2 & 3
  • Change configs to consul on nodes 2 & 3
  • Change configs to consul on node 1
  • Restart node 1 <------ This is the step I’m most worried about
  • Vault unseal
  • Start vault on node 2 & 3
  • Vault unseal
  • Vault operator step down (to ensure HA works).
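
To make the ordering easier to review, here is the same plan as a dry-run sketch. It only builds a list of steps and executes nothing. The host names (`node1`..`node3`) and the systemd unit names (`vault`, `zookeeper`) are assumptions for illustration; `vault operator unseal` and `vault operator step-down` are the standard Vault CLI commands.

```python
# Dry-run sketch of the plan above: builds the ordered step list, runs nothing.
# Host names and systemd unit names are assumptions, not taken from a real setup.

ACTIVE = "node1"                # hypothetical: the currently active Vault node
STANDBYS = ["node2", "node3"]   # hypothetical: the standby Vault nodes


def build_plan(active, standbys):
    """Return the migration steps as (host, action) tuples, in order."""
    plan = []
    # Stop the standbys first so none of them can be elected active.
    for host in standbys:
        plan.append((host, "systemctl stop vault"))
    for host in standbys:
        plan.append((host, "systemctl stop zookeeper"))
    # Switch the HA backend in the Vault config on every node.
    for host in standbys + [active]:
        plan.append((host, "edit Vault config: HA backend zookeeper -> consul"))
    # The risky step: the only running node restarts, so Vault is down here.
    plan.append((active, "systemctl restart vault"))
    plan.append((active, "vault operator unseal"))
    # Bring the standbys back on the new HA backend.
    for host in standbys:
        plan.append((host, "systemctl start vault"))
    for host in standbys:
        plan.append((host, "vault operator unseal"))
    # Force a failover to confirm HA works through consul.
    plan.append((active, "vault operator step-down"))
    return plan


if __name__ == "__main__":
    for number, (host, action) in enumerate(build_plan(ACTIVE, STANDBYS), start=1):
        print(f"{number:2d}. [{host}] {action}")
```

Written out this way, the window between restarting node 1 and unsealing it is the part where no node is serving requests.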

Hi, there are some guidelines on how to do upgrades; they should be applicable here.


also

The most important thing to note is to stop the standby servers first; otherwise one of them will be elected and become the active server.

I am not sure you will be able to achieve zero downtime. I would schedule a maintenance window, or create a staging environment and test the procedure there if it's truly mission critical.

Thanks for the links.

I do have 4 clusters, so I’ll be testing on the least important one first. What I’m not sure about is node 1. The instructions say:

Properly shut down the remaining (active) node. Note: it is very important that you shut the node down properly. This causes the HA lock to be released, allowing a standby node to take over with a very short delay.

If some of the nodes have the lock in zookeeper, but others have the lock in consul, won’t that cause a split brain?
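
To make the concern concrete, here is a toy model of it. This is not how Vault implements HA locking; `LockBackend` and `active_nodes` are hypothetical names, and the only assumption is that a node considers itself active once it holds the lock in whichever backend it is configured to use.

```python
# Toy model of the split-brain concern: a lock only excludes nodes that
# use the same backend. Not Vault's implementation; all names are hypothetical.

class LockBackend:
    """Stand-in for an HA lock store such as zookeeper or consul."""

    def __init__(self, name):
        self.name = name
        self.holder = None  # node currently holding the lock, if any

    def try_acquire(self, node):
        """Grant the lock if it is free or already held by this node."""
        if self.holder is None:
            self.holder = node
        return self.holder == node


def active_nodes(assignments):
    """Return the nodes that win the lock in their configured backend.

    assignments maps node name -> the LockBackend that node is configured for.
    """
    return [node for node, backend in assignments.items()
            if backend.try_acquire(node)]


if __name__ == "__main__":
    zk, consul = LockBackend("zookeeper"), LockBackend("consul")
    # Mixed config: node1 still on zookeeper, node2 and node3 already on consul.
    print(active_nodes({"node1": zk, "node2": consul, "node3": consul}))

    shared = LockBackend("consul")
    # Uniform config: every node contends for the same lock.
    print(active_nodes({"node1": shared, "node2": shared, "node3": shared}))
```

In this model the mixed configuration ends up with two nodes that both believe they are active, while the uniform configuration ends up with one.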

The only way I see around this is to fully shut down all 3 nodes, swap the HA config, and then start the cluster back up. I'm looking for a more elegant solution.