I have a Consul setup in AWS spread across 3 availability zones. Last year we had a problem in one zone: it was totally unstable, and the services in it were mostly down but sometimes flapping. This caused even more problems, because some services were either working fine or failing depending on whether they hit the failed zone.
So now we want to set up a way to quickly disable a zone when needed, and only re-enable it once we are sure it is working fine.
What is the best way to do this in Consul? Let’s assume we can’t access the instances in the failed zone.
We have one master in each zone. To keep services in the failed zone from registering and passing their health checks, I’m thinking that changing the security group so that hosts in the failed zone can’t reach the other zones is probably the way to go: the isolated master would not be able to talk to the other masters, and the consul-agents in that zone would not be reachable.
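To make the idea concrete, here is a rough sketch of the block list I have in mind. It is only an assumption of how this could look, not something we run today. Since security groups only hold allow rules, I assume the actual block would mean either revoking the allow rules or adding deny entries to the network ACLs of the healthy zones' subnets. The function name, the rule dictionary layout, the starting rule number, and the example CIDR are all hypothetical, and the ports are the Consul defaults, which may not match our config.

```python
from ipaddress import ip_network

# Default Consul ports: 8300 server RPC, 8301 Serf LAN, 8302 Serf WAN,
# 8500 HTTP API, 8600 DNS. Assumption: we have not overridden them.
CONSUL_TCP_PORTS = [8300, 8301, 8302, 8500, 8600]
CONSUL_UDP_PORTS = [8301, 8302, 8600]


def deny_rules_for_zone(failed_zone_cidrs, first_rule_number=50):
    """Build inbound deny entries for traffic coming from the failed zone.

    The returned dictionaries are a hypothetical plan, loosely shaped like
    network ACL entries; nothing here talks to AWS.
    """
    rules = []
    rule_number = first_rule_number
    for cidr in failed_zone_cidrs:
        network = ip_network(cidr)  # raises ValueError on a malformed CIDR
        for protocol, ports in (("tcp", CONSUL_TCP_PORTS),
                                ("udp", CONSUL_UDP_PORTS)):
            for port in ports:
                rules.append({
                    "rule_number": rule_number,
                    "action": "deny",
                    "direction": "inbound",
                    "protocol": protocol,
                    "source_cidr": str(network),
                    "port": port,
                })
                rule_number += 1
    return rules


if __name__ == "__main__":
    # Hypothetical subnet of the zone we want to cut off.
    for rule in deny_rules_for_zone(["10.0.3.0/24"]):
        print(rule)
```

The plan would be applied to the subnets of the two healthy zones, and removed again once the failed zone is confirmed stable.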
Would this be enough, or is there another way?
Thanks in advance