Over the past month or so, the above error has been occurring very frequently. I run a cluster with 5 servers; I've checked disk space (>500 GB available) and the logs. The logs show nothing unusual, just what appears to be an intermittent network issue. However, once that network issue clears, the cluster is left in a state where I can see all 5 servers in the raft list, but none of them is the leader. It also looks like no leader election is taking place.
The only way to restore things to normal is to restart every server, at which point leader election kicks off and things go back to normal.
This does not seem like the intended behavior. I would expect a leader election to take place once the network issues affecting one or more nodes have been resolved. Or am I mistaken here?
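
For reference, here is my understanding of the Raft majority rule as a minimal sketch. The function names `quorum_size` and `can_elect_leader` are my own illustration and not part of any tool; the only assumption is the standard rule that a leader needs votes from a strict majority of voters.

```python
def quorum_size(voters: int) -> int:
    """Return the smallest strict majority of `voters`."""
    if voters < 1:
        raise ValueError("a cluster needs at least one voter")
    return voters // 2 + 1


def can_elect_leader(voters: int, reachable: int) -> bool:
    """True if `reachable` mutually connected voters form a majority.

    Hypothetical helper: it only models the majority rule, not timeouts,
    terms, or log comparison.
    """
    return reachable >= quorum_size(voters)


if __name__ == "__main__":
    # My cluster has 5 servers; show which partition sizes could elect a leader.
    for reachable in range(6):
        print(f"{reachable} reachable -> election possible: "
              f"{can_elect_leader(5, reachable)}")
```

By this rule, a 5-server cluster needs 3 servers that can reach each other, so once all 5 are connected again I would expect an election to succeed without restarting anything.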