This is excellent, thank you!
We’ve already scaled our Consul cluster up to 5 servers to reduce the probability of this occurring again.
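For anyone following along, here is a rough sketch of the Raft quorum arithmetic behind that change. It assumes the "5" refers to the number of Consul servers and that the issue is related to losing quorum; the 3-server comparison and the function names are illustrative only, not taken from our setup or from Consul itself.

```python
def quorum(servers: int) -> int:
    """Smallest majority of a Raft cluster with the given number of voting servers."""
    return servers // 2 + 1


def failure_tolerance(servers: int) -> int:
    """How many servers can be lost while a majority is still available."""
    return servers - quorum(servers)


# Compare an illustrative 3-server cluster with a 5-server one.
for n in (3, 5):
    print(f"{n} servers: quorum={quorum(n)}, tolerates {failure_tolerance(n)} failure(s)")
```

So going from 3 to 5 servers would raise the number of simultaneous server failures the cluster can survive from 1 to 2, at the cost of a larger majority for every write.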
We are using the stock Helm chart (hashicorp/consul-k8s) with only a couple of tweaks:
{
  "performance": {
    "raft_multiplier": 1
  },
  "disable_update_check": true,
  "telemetry": {
    "prometheus_retention_time": "20s"
  }
}
I find it quite surprising that Consul + autopilot in its default configuration (as per the Helm chart) would be at risk of this sort of issue.
Does the situation improve in future versions of Consul?