HA active/standby mode switching frequency

I am setting up HA Vault with a Consul backend in a private Kubernetes cluster.

For the PoC, a cluster with 1 master and 3 workers is used. Consul is installed from the official HashiCorp Helm chart, and for Vault I have created a custom one.

Everything works fine; however, while inspecting the Vault pod logs I see that approximately every 6 minutes the active node loses leadership:

[WARN] core: leadership lost, stopping active operation

Since both Vault and Consul are new to me, could you please advise whether this is expected behaviour?

Thank you

Hi, I run a similar setup:
2 Vault instances, with 3 Consul instances as the backend.

I am using the latest version of both. I think this could be a networking issue.
Maybe some hardware problems with cables, switches …

You can try changing Vault's log_level to Debug or Trace to see why leadership is lost.


Good day,
In my experience there can be several causes:

  • Network access, as Robert already mentioned
  • ACL: insufficient permissions

If you have a setup where Vault talks to a Consul agent that in turn connects to the Consul servers, ports 8300 and 8301 must be reachable from the agent to each server, and each server needs the same access to the other servers. If you need WAN join, port 8302 must also be reachable, but that is mostly for connecting two datacenters.
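
To check the port reachability described above, something like the following rough sketch could be run from the agent pod or from each server. It only tests TCP; the gossip ports also use UDP, which this does not cover. The function names and the default port list are my own choices for illustration, not anything shipped with Consul.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_hosts(hosts, ports=(8300, 8301)):
    """Map each (host, port) pair to True/False depending on TCP reachability.

    The defaults are the two ports named in the post; add 8302 only if
    WAN join is in use.
    """
    return {(host, port): tcp_port_open(host, port)
            for host in hosts
            for port in ports}
```

For example, `check_hosts(["consul-server-0", "consul-server-1"])` returns a dictionary you can print; the host names here are placeholders, so substitute the addresses of your own Consul servers.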

It is possible that the log record just before the one you showed is more informative.
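
To pull out those preceding records, a small sketch like this could help. It assumes the pod log has been saved to a text file first (for example with `kubectl logs`), and it matches on the "leadership lost" text from the warning quoted in the question. The function name and the default of 5 context lines are arbitrary choices of mine.

```python
from collections import deque

MARKER = "leadership lost"  # text taken from the warning in the question


def context_before(lines, marker=MARKER, n=5):
    """Yield (preceding_lines, matching_line) for each line containing marker.

    preceding_lines holds up to n lines that came right before the match.
    """
    recent = deque(maxlen=n)
    for line in lines:
        line = line.rstrip("\n")
        if marker in line:
            yield list(recent), line
        recent.append(line)
```

With the log saved as, say, `vault.log` (a placeholder name), you can loop over `context_before(open("vault.log"))` and print each group of preceding lines followed by the matching warning.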

Maybe this sandbox can be useful for you https://github.com/yura-shutkin/docker-lab/tree/master/consul-acl-vault