Hi, there are some guidelines on how to do upgrades, and they should be applicable here.
Also, the most important thing to note is to stop the standby servers first; otherwise one of them will be elected and become the active server.
I am not sure you will be able to achieve zero downtime. I would schedule a maintenance window, or create a staging environment and test the procedure there if it's truly mission critical.
I do have 4 clusters, so I’ll be testing on the least important one first. What I’m not sure about is node 1. The instructions say:
Properly shut down the remaining (active) node. Note: it is very important that you shut the node down properly. This causes the HA lock to be released, allowing a standby node to take over with a very short delay.
If some of the nodes have the lock in ZooKeeper, but others have the lock in Consul, won’t that cause a split brain?
The only way I see around this is to fully shut down all 3 nodes, swap the HA config, and then start the cluster back up. I’m looking for a more elegant solution.
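To make the full-stop approach concrete, here is a rough sketch of the ordering I have in mind: standbys stopped first, then the active node, then the config swap on every node, then startup. It only computes the order of steps and does not touch any server; the `Node` type, the `full_stop_plan` function, and the step names are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    active: bool  # True for the node currently holding the HA lock


def full_stop_plan(nodes):
    """Return ordered (action, node name) steps for a full-stop HA backend swap."""
    standbys = [n for n in nodes if not n.active]
    actives = [n for n in nodes if n.active]
    if len(actives) != 1:
        raise ValueError("expected exactly one active node")

    # Standbys go down first so none of them can be elected mid-procedure.
    steps = [("stop", n.name) for n in standbys]
    # The active node is stopped last, releasing the lock in the old backend.
    steps.append(("stop", actives[0].name))
    # With everything down, no node can hold a lock in either backend.
    steps += [("swap_ha_config", n.name) for n in nodes]
    steps += [("start", n.name) for n in nodes]
    return steps


if __name__ == "__main__":
    cluster = [Node("node1", True), Node("node2", False), Node("node3", False)]
    for action, name in full_stop_plan(cluster):
        print(action, name)
```

The point of the ordering is that the config swap only happens while no node is running, so there is never a moment where one node holds the lock in the old backend and another holds it in the new one. The cost is a full outage for the duration of the swap.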