I'm having a recurring (daily!) problem where Consul instances become unavailable for a short period of time (~1-2 min), and I have been unable to find the cause. This downtime causes dependent services to crash or be restarted by Nomad, so it's kind of a big deal for us.
Topology: 12 bare-metal machines (running Consul agents) + 3 cloud instances for the Consul servers.
At least once a day, the connection between Consul nodes is lost for a short period of time (it's not a network issue, I'm sure!). See the logs below.
What can cause this? I’ve attached more details and logs here: