Frequent active/standby switching is not in line with expectations

MySQL parameter configuration values:

interactive_timeout: 7200 (2h)

wait_timeout: 86400 (24h)

Phenomenon: Master-slave failover occurs every 20-40 minutes, which is not as expected.

System situation: Cluster deployment, 3 nodes

As expected, shouldn’t it failover occur every 24 hours? Why is it failing so frequently?

Configuration file: max_connection_lifetime = “7200”

It should also failover every 2 hours as expected, which is also not as expected.

What is the situation here, and what is the problem?

Lots to unpack here - do you have metrics on node health? Even basics like CPU utilization, memory, paging to disk, etc could cause nodes in the cluster to think the primary node is not available. Similarly, you mentioned MySQL - is that your Vault backend? If so is it local, in another subnet, region, etc? Would also need to understand metrics there. Finally, any workloads that access Vault possibly causing this at the times you’re outlining?