HA failover time

Using Terraform/Helm to set up Vault on a GCP Kubernetes cluster, we tested failover time and were not very excited. Both tests saw over a minute of downtime, even though the new leader was elected within 5-6 seconds.

Vault 0 is leader
00:09:10am - delete issued for vault 0, cluster down
00:09:16am - vault 2 enters leader state
00:09:31am - vault 0 restarted, standby mode
00:09:32-09:50am - vault 0 election timeouts
00:09:49am - vault 2 loses leader state
00:09:51am - vault 2 enters leader state
00:09:51am - vault 0 follower
00:10:45am - up

1m35s downtime

Vault 2 is leader
00:21:13am - delete vault 0
00:21:18am - vault 2 detects failure
00:21:25am - vault 0 restarted and in standby
00:21:49am - vault 0 wins election, becomes leader
00:21:56am - cluster down
00:23:00am - cluster up

1m04s downtime

Has anyone run into this before, or have any ideas as to what it may be? I don't like the idea that any time a follower pod goes down or rotates, we're going to have a minute of downtime.

I haven’t had time to measure things as clearly as you have, but I can confirm I have similar concerns about how long it takes for Vault to do leader failover based on informal observations at work.

I’ve also found that the sys/step-down API, intended to trigger a controlled leadership change, sometimes causes longer outages than just restarting the former leader node.
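For context, we trigger it with the standard CLI/API calls - nothing exotic (token and address handling omitted):

  # Ask the current active node to hand over leadership.
  # Requires a token with 'sudo' capability on sys/step-down.
  vault operator step-down

  # Equivalent raw API call (an empty PUT):
  curl --silent --show-error \
       --header "X-Vault-Token: $VAULT_TOKEN" \
       --request PUT \
       "$VAULT_ADDR/v1/sys/step-down"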

You should clarify which Vault storage backend you are using, as that is the component that mediates HA, so it could affect results (we're using Consul).

Hi @aram,
Please review the Community Guidelines. We wish to keep the Discuss forum professional and helpful. Sarcasm and snark do not help users resolve their problems, and may in fact keep people from posting - especially when they feel like their question may be considered trivial. We all have room to learn and grow, and we need to be respectful to our fellow practitioners and contributors. Thanks for your understanding.

Please see this changelog entry from both Vault 1.11.0 and Vault 1.10.4:

  • storage/raft: Use larger timeouts at startup to reduce likelihood of inducing elections. [GH-15042]

If you didn’t attempt this with either Vault 1.10.4 or 1.11.0, can you retry it and see if it addresses your issue? Can you share which version of Vault you used for the test?


We were trying the 1.10.3 image in the helm charts.
Will try with 1.10.4 or 1.11.0.
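(For anyone following along, switching versions is just a change to the image tag in the chart values - something like this, where "vault" and "hashicorp/vault" are our release and chart names, so adjust for your setup:)

  # Bump the Vault image used by the Helm chart, keeping existing values.
  # Release name and chart reference are ours; substitute your own.
  helm upgrade vault hashicorp/vault \
    --reuse-values \
    --set server.image.tag=1.11.0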

Same issue exists on 1.11.0

I set up a test lab running a 3 node Vault cluster with Raft storage on a minikube Kubernetes cluster on a single free Oracle Cloud arm64 VM, and played around deleting various pods.

Although failover times varied significantly from barely a second out to about 7 seconds, I didn’t observe any minute-long downtimes.

That said, I know we have failovers taking way longer than that on a production cluster at work.

There are many variables that might be involved - the storage backend, the amount of data in Vault, request rates, and the number of mounts, for example.

Could you confirm what storage backend you’re using, and also explain more about how you’re measuring “cluster up/down” and “has a leader”?

Thanks for running a test.

Raft storage.

I'm measuring "downtime" by issuing a CLI command to Vault once a second during the test (a simple command, just vault operator raft list-peers). I consider it up as long as I'm getting a valid response, and down when I start getting errors.
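In case it's useful, this is roughly the loop (VAULT_ADDR and VAULT_TOKEN are exported and point at the load-balanced endpoint under test):

  #!/usr/bin/env bash
  # Poll Vault once a second and record whether the CLI call succeeds.
  while true; do
    if vault operator raft list-peers > /dev/null 2>&1; then
      echo "$(date +%T) up"
    else
      echo "$(date +%T) DOWN"
    fi
    sleep 1
  done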

As far as “has a leader”, I’m reviewing each of the pod logs.

For instance:

11:39:27 - vault 2 shutdown triggered
11:39:44 - vault 2 back in candidate state
(Rejecting vote request, since we have a leader, keeps restarting election)
11:39:59 - vault 0 (current leader) - storage.raft: peer has newer term, stopping replication
11:39:59 - vault 2 becomes leader

11:39:49 - 11:39:59: 403 status returned
11:40:00 - 11:40:03: 200 returned
11:40:03 - 11:40:30: 502 returned
11:40:31+ 200 returned

I think it has to do with GKE and the GCP load balancer. That said, the issue mentioned above still exists with the “newer term” follower becoming the leader on a pod restart.

I think there's something rather odd going on with pod-to-pod communication in your environment.

I restarted standby pods plenty of times, and not once did they displace the current leader for me.

I think you might be experiencing two separate problems:

  • A restarted standby shouldn’t affect the current leader
  • The separate load balancer you seem to have in front of the cluster may be reacting slowly to a leader change, introducing more downtime.

I think the way forward would be to try to verify this, and if so, tackle the two issues separately.

For the first, can you post full Vault logs showing it happen, from both the old and new leader?

For the second, try monitoring availability of the Vault API of each node separately, avoiding the load balancer.
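For example, something along these lines - the host names are a guess based on the vault-internal addresses in your logs, and it assumes the API listener is on 8200 with TLS:

  # Poll each node's own API directly, bypassing the load balancer.
  # Host names, port and TLS flags are assumptions - adjust to your setup.
  NODES="vault-0.vault-internal vault-1.vault-internal vault-2.vault-internal"
  while true; do
    for node in $NODES; do
      # standbyok=true makes standbys return 200 instead of 429, so any
      # non-200 here means that particular node is unhealthy or sealed.
      code=$(curl -k -s -o /dev/null -w '%{http_code}' \
        "https://${node}:8200/v1/sys/health?standbyok=true")
      echo "$(date +%T) ${node} ${code}"
    done
    sleep 1
  done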

For 1… this is when I killed the vault 2 follower…

Vault 0 leader logs:
2022-06-26T14:02:36.852Z [ERROR] storage.raft: failed to heartbeat to: peer=vault-2.vault-internal:8201 error=EOF

2022-06-26T14:02:36.861Z [INFO] storage.raft: aborting pipeline replication: peer=“{Voter 024b4b77-5ecb-c815-4eae-55f079d21232 vault-2.vault-internal:8201}”

2022-06-26T14:02:36.930Z [ERROR] storage.raft: failed to appendEntries to: peer=“{Voter 024b4b77-5ecb-c815-4eae-55f079d21232 vault-2.vault-internal:8201}” error=“remote error: tls: internal error”

2022-06-26T14:02:36.994Z [ERROR] storage.raft: failed to appendEntries to: peer=“{Voter 024b4b77-5ecb-c815-4eae-55f079d21232 vault-2.vault-internal:8201}” error=“dial tcp 172.31.1.9:8201: connect: connection refused”

2022-06-26T14:02:37.041Z [ERROR] storage.raft: failed to heartbeat to: peer=vault-2.vault-internal:8201 error=“dial tcp 172.31.1.9:8201: connect: connection refused”

2022-06-26T14:02:37.081Z [ERROR] storage.raft: failed to appendEntries to: peer=“{Voter 024b4b77-5ecb-c815-4eae-55f079d21232 vault-2.vault-internal:8201}” error=“dial tcp 172.31.1.9:8201: connect: connection refused”

2022-06-26T14:02:37.361Z [WARN] storage.raft: failed to contact: server-id=024b4b77-5ecb-c815-4eae-55f079d21232 time=500.792289ms

2022-06-26T14:02:37.849Z [WARN] storage.raft: failed to contact: server-id=024b4b77-5ecb-c815-4eae-55f079d21232 time=988.815375ms

2022-06-26T14:02:38.287Z [WARN] storage.raft: failed to contact: server-id=024b4b77-5ecb-c815-4eae-55f079d21232 time=1.426256122s

2022-06-26T14:02:47.193Z [ERROR] storage.raft: failed to appendEntries to: peer=“{Voter 024b4b77-5ecb-c815-4eae-55f079d21232 vault-2.vault-internal:8201}” error=“dial tcp 172.31.1.9:8201: i/o timeout”

2022-06-26T14:02:47.243Z [ERROR] storage.raft: failed to heartbeat to: peer=vault-2.vault-internal:8201 error=“dial tcp 172.31.1.9:8201: i/o timeout”

2022-06-26T14:02:54.273Z [WARN] storage.raft: rejecting vote request since we have a leader: from=vault-2.vault-internal:8201 leader=vault-0.vault-internal:8201 leader-id=83fd9baf-bf60-0845-d511-95245d3820cd

2022-06-26T14:02:55.319Z [WARN] storage.raft: rejecting vote request since we have a leader: from=vault-2.vault-internal:8201 leader=vault-0.vault-internal:8201 leader-id=83fd9baf-bf60-0845-d511-95245d3820cd

2022-06-26T14:02:56.689Z [WARN] storage.raft: rejecting vote request since we have a leader: from=vault-2.vault-internal:8201 leader=vault-0.vault-internal:8201 leader-id=83fd9baf-bf60-0845-d511-95245d3820cd

2022-06-26T14:02:57.300Z [ERROR] storage.raft: failed to appendEntries to: peer=“{Voter 024b4b77-5ecb-c815-4eae-55f079d21232 vault-2.vault-internal:8201}” error=“dial tcp 172.31.1.9:8201: i/o timeout”

2022-06-26T14:02:57.439Z [ERROR] storage.raft: failed to heartbeat to: peer=vault-2.vault-internal:8201 error=“dial tcp 172.31.1.9:8201: i/o timeout”

2022-06-26T14:02:57.987Z [WARN] storage.raft: rejecting vote request since we have a leader: from=vault-2.vault-internal:8201 leader=vault-0.vault-internal:8201 leader-id=83fd9baf-bf60-0845-d511-95245d3820cd

2022-06-26T14:02:59.315Z [WARN] storage.raft: rejecting vote request since we have a leader: from=vault-2.vault-internal:8201 leader=vault-0.vault-internal:8201 leader-id=83fd9baf-bf60-0845-d511-95245d3820cd

2022-06-26T14:03:00.329Z [WARN] storage.raft: rejecting vote request since we have a leader: from=vault-2.vault-internal:8201 leader=vault-0.vault-internal:8201 leader-id=83fd9baf-bf60-0845-d511-95245d3820cd

2022-06-26T14:03:01.631Z [WARN] storage.raft: rejecting vote request since we have a leader: from=vault-2.vault-internal:8201 leader=vault-0.vault-internal:8201 leader-id=83fd9baf-bf60-0845-d511-95245d3820cd

2022-06-26T14:03:02.989Z [WARN] storage.raft: rejecting vote request since we have a leader: from=vault-2.vault-internal:8201 leader=vault-0.vault-internal:8201 leader-id=83fd9baf-bf60-0845-d511-95245d3820cd

2022-06-26T14:03:04.304Z [WARN] storage.raft: rejecting vote request since we have a leader: from=vault-2.vault-internal:8201 leader=vault-0.vault-internal:8201 leader-id=83fd9baf-bf60-0845-d511-95245d3820cd

2022-06-26T14:03:05.805Z [WARN] storage.raft: rejecting vote request since we have a leader: from=vault-2.vault-internal:8201 leader=vault-0.vault-internal:8201 leader-id=83fd9baf-bf60-0845-d511-95245d3820cd

2022-06-26T14:03:07.381Z [WARN] storage.raft: rejecting vote request since we have a leader: from=vault-2.vault-internal:8201 leader=vault-0.vault-internal:8201 leader-id=83fd9baf-bf60-0845-d511-95245d3820cd

2022-06-26T14:03:07.467Z [ERROR] storage.raft: failed to appendEntries to: peer=“{Voter 024b4b77-5ecb-c815-4eae-55f079d21232 vault-2.vault-internal:8201}” error=“dial tcp 172.31.1.9:8201: i/o timeout”

2022-06-26T14:03:07.601Z [ERROR] storage.raft: failed to heartbeat to: peer=vault-2.vault-internal:8201 error=“dial tcp 172.31.1.9:8201: i/o timeout”

2022-06-26T14:03:07.723Z [ERROR] storage.raft: peer has newer term, stopping replication: peer=“{Voter 024b4b77-5ecb-c815-4eae-55f079d21232 vault-2.vault-internal:8201}”

2022-06-26T14:03:07.723Z [INFO] storage.raft: entering follower state: follower=“Node at vault-0.vault-internal:8201 [Follower]” leader-address= leader-id=

2022-06-26T14:03:07.723Z [WARN] core: leadership lost, stopping active operation

2022-06-26T14:03:07.723Z [INFO] core: pre-seal teardown starting

2022-06-26T14:03:07.723Z [INFO] storage.raft: aborting pipeline replication: peer=“{Voter d7dd1295-3f9b-d1e9-404a-52fb0faebd0b vault-1.vault-internal:8201}”

2022-06-26T14:03:08.223Z [INFO] core: stopping raft active node

2022-06-26T14:03:08.224Z [INFO] rollback: stopping rollback manager

2022-06-26T14:03:08.224Z [INFO] core: pre-seal teardown complete

2022-06-26T14:03:08.253Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[“req_fw_sb-act_v1”]

2022-06-26T14:03:08.257Z [ERROR] core: clearing leader advertisement failed: error=“node is not the leader”

2022-06-26T14:03:08.257Z [ERROR] core: unlocking HA lock failed: error=“node is not the leader”

2022-06-26T14:03:08.416Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[“req_fw_sb-act_v1”]

— Vault 2 logs —
==> Vault server started! Log data will stream in below:

2022-06-26T14:02:50.527Z [WARN] unknown or unsupported field setNodeId found in configuration at /tmp/storageconfig.hcl:3:1
2022-06-26T14:02:50.527Z [INFO] proxy environment: http_proxy=“” https_proxy=“” no_proxy=“”
2022-06-26T14:02:50.528Z [WARN] storage.raft.fsm: raft FSM db file has wider permissions than needed: needed=-rw------- existing=-rw-rw----
2022-06-26T14:02:50.702Z [INFO] core: Initializing version history cache for core
2022-06-26T14:02:50.703Z [INFO] core: raft retry join initiated
2022-06-26T14:02:50.703Z [INFO] core: stored unseal keys supported, attempting fetch
2022-06-26T14:02:50.745Z [INFO] core.cluster-listener.tcp: starting listener: listener_address=[::]:8201
2022-06-26T14:02:50.745Z [INFO] core.cluster-listener: serving cluster requests: cluster_listen_address=[::]:8201
2022-06-26T14:02:50.746Z [INFO] storage.raft: creating Raft: config="&raft.Config{ProtocolVersion:3, HeartbeatTimeout:3000000000, ElectionTimeout:3000000000, CommitTimeout:50000000, MaxAppendEntries:64, BatchApplyCh:true, ShutdownOnRemove:true, TrailingLogs:0x2800, SnapshotInterval:120000000000, SnapshotThreshold:0x2000, LeaderLeaseTimeout:500000000, LocalID:"024b4b77-5ecb-c815-4eae-55f079d21232", NotifyCh:(chan<- bool)(0xc00109e2a0), LogOutput:io.Writer(nil), LogLevel:"DEBUG", Logger:(*hclog.interceptLogger)(0xc0007b7380), NoSnapshotRestoreOnStart:true, skipStartup:false}"
2022-06-26T14:02:50.748Z [INFO] storage.raft: initial configuration: index=41 servers=“[{Suffrage:Voter ID:83fd9baf-bf60-0845-d511-95245d3820cd Address:vault-0.vault-internal:8201} {Suffrage:Voter ID:024b4b77-5ecb-c815-4eae-55f079d21232 Address:vault-2.vault-internal:8201} {Suffrage:Voter ID:d7dd1295-3f9b-d1e9-404a-52fb0faebd0b Address:vault-1.vault-internal:8201}]”
2022-06-26T14:02:50.748Z [INFO] storage.raft: entering follower state: follower=“Node at vault-2.vault-internal:8201 [Follower]” leader-address= leader-id=
2022-06-26T14:02:50.748Z [INFO] core: entering standby mode
2022-06-26T14:02:50.748Z [INFO] core: vault is unsealed
2022-06-26T14:02:50.868Z [INFO] core: unsealed with stored key
2022-06-26T14:02:54.254Z [WARN] storage.raft: heartbeat timeout reached, starting election: last-leader-addr= last-leader-id=
2022-06-26T14:02:54.254Z [INFO] storage.raft: entering candidate state: node=“Node at vault-2.vault-internal:8201 [Candidate]” term=382
2022-06-26T14:02:55.316Z [WARN] storage.raft: Election timeout reached, restarting election
2022-06-26T14:02:55.316Z [INFO] storage.raft: entering candidate state: node=“Node at vault-2.vault-internal:8201 [Candidate]” term=383
2022-06-26T14:02:56.686Z [WARN] storage.raft: Election timeout reached, restarting election
2022-06-26T14:02:56.686Z [INFO] storage.raft: entering candidate state: node=“Node at vault-2.vault-internal:8201 [Candidate]” term=384
2022-06-26T14:02:57.983Z [WARN] storage.raft: Election timeout reached, restarting election
2022-06-26T14:02:57.983Z [INFO] storage.raft: entering candidate state: node=“Node at vault-2.vault-internal:8201 [Candidate]” term=385
2022-06-26T14:02:59.312Z [WARN] storage.raft: Election timeout reached, restarting election
2022-06-26T14:02:59.312Z [INFO] storage.raft: entering candidate state: node=“Node at vault-2.vault-internal:8201 [Candidate]” term=386
2022-06-26T14:03:00.326Z [WARN] storage.raft: Election timeout reached, restarting election
2022-06-26T14:03:00.326Z [INFO] storage.raft: entering candidate state: node=“Node at vault-2.vault-internal:8201 [Candidate]” term=387
2022-06-26T14:03:01.629Z [WARN] storage.raft: Election timeout reached, restarting election
2022-06-26T14:03:01.629Z [INFO] storage.raft: entering candidate state: node=“Node at vault-2.vault-internal:8201 [Candidate]” term=388
2022-06-26T14:03:02.986Z [WARN] storage.raft: Election timeout reached, restarting election
2022-06-26T14:03:02.987Z [INFO] storage.raft: entering candidate state: node=“Node at vault-2.vault-internal:8201 [Candidate]” term=389
2022-06-26T14:03:04.301Z [WARN] storage.raft: Election timeout reached, restarting election
2022-06-26T14:03:04.301Z [INFO] storage.raft: entering candidate state: node=“Node at vault-2.vault-internal:8201 [Candidate]” term=390
2022-06-26T14:03:05.802Z [WARN] storage.raft: Election timeout reached, restarting election
2022-06-26T14:03:05.802Z [INFO] storage.raft: entering candidate state: node=“Node at vault-2.vault-internal:8201 [Candidate]” term=391
2022-06-26T14:03:07.379Z [WARN] storage.raft: Election timeout reached, restarting election
2022-06-26T14:03:07.379Z [INFO] storage.raft: entering candidate state: node=“Node at vault-2.vault-internal:8201 [Candidate]” term=392
2022-06-26T14:03:08.527Z [WARN] storage.raft: Election timeout reached, restarting election
2022-06-26T14:03:08.527Z [INFO] storage.raft: entering candidate state: node=“Node at vault-2.vault-internal:8201 [Candidate]” term=393
2022-06-26T14:03:08.535Z [INFO] storage.raft: election won: tally=2
2022-06-26T14:03:08.535Z [INFO] storage.raft: entering leader state: leader=“Node at vault-2.vault-internal:8201 [Leader]”
2022-06-26T14:03:08.535Z [INFO] storage.raft: added peer, starting replication: peer=83fd9baf-bf60-0845-d511-95245d3820cd
2022-06-26T14:03:08.535Z [INFO] storage.raft: added peer, starting replication: peer=d7dd1295-3f9b-d1e9-404a-52fb0faebd0b
2022-06-26T14:03:08.537Z [INFO] storage.raft: pipelining replication: peer=“{Voter 83fd9baf-bf60-0845-d511-95245d3820cd vault-0.vault-internal:8201}”
2022-06-26T14:03:08.539Z [INFO] storage.raft: pipelining replication: peer=“{Voter d7dd1295-3f9b-d1e9-404a-52fb0faebd0b vault-1.vault-internal:8201}”
2022-06-26T14:03:08.544Z [INFO] core: acquired lock, enabling active operation
2022-06-26T14:03:08.600Z [INFO] core: post-unseal setup starting
2022-06-26T14:03:08.609Z [INFO] core: loaded wrapping token key
2022-06-26T14:03:08.610Z [INFO] core: successfully setup plugin catalog: plugin-directory=“”
2022-06-26T14:03:08.611Z [INFO] core: successfully mounted backend: type=system path=sys/
2022-06-26T14:03:08.612Z [INFO] core: successfully mounted backend: type=identity path=identity/
2022-06-26T14:03:08.612Z [INFO] core: successfully mounted backend: type=kv path=kv/
2022-06-26T14:03:08.612Z [INFO] core: successfully mounted backend: type=cubbyhole path=cubbyhole/
2022-06-26T14:03:08.614Z [INFO] core: successfully enabled credential backend: type=token path=token/ namespace="ID: root. Path: "
2022-06-26T14:03:08.615Z [INFO] rollback: starting rollback manager
2022-06-26T14:03:08.615Z [INFO] core: restoring leases
2022-06-26T14:03:08.616Z [INFO] expiration: lease restore complete
2022-06-26T14:03:08.619Z [INFO] identity: entities restored
2022-06-26T14:03:08.619Z [INFO] identity: groups restored
2022-06-26T14:03:08.619Z [INFO] core: starting raft active node
2022-06-26T14:03:08.619Z [INFO] storage.raft: starting autopilot: config="&{false 0 10s 24h0m0s 1000 0 10s false redundancy_zone upgrade_version}" reconcile_interval=0s
2022-06-26T14:03:08.662Z [INFO] core: usage gauge collection is disabled
2022-06-26T14:03:08.670Z [INFO] core: post-unseal setup complete

The thing which really stands out to me is vault-0 repeatedly failing to talk to vault-2, whilst at the same time it is rejecting vote requests coming from vault-2.

And meanwhile, vault-2 repeatedly starts elections and gets no answer for quite some time, before one finally succeeds.

It seems like when a new pod is spawned in this environment, it can’t (fully?) talk to other pods for a while.

Given this form of network disruption, I believe the Raft algorithm is actually expected to behave the way it does. (Once an election has been started, it has to go forward; the algorithm doesn't have a way for an existing consensus of nodes to tell a single node that "wrongly" triggered an election that they're going to leave leadership where it currently is. Each failed election also increments vault-2's term, so as soon as connectivity is restored its higher term forces vault-0 to step down - which is exactly the "peer has newer term, stopping replication" line in the vault-0 logs.)

A small aside:

You should fix this erroneous configuration - the setNodeId field that vault-2 flags as "unknown or unsupported" at startup - not because it has any relevance to the problem, just so it doesn't mislead people reading the config.
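Something like this will show what the pod is actually loading - the namespace is a guess on my part, and the path comes straight from the warning in your vault-2 logs:

  # Dump the rendered config the warning refers to.
  kubectl exec -n vault vault-2 -- cat /tmp/storageconfig.hcl

  # If the raft stanza contains something like 'setNodeId = ...', that isn't a
  # field Vault's raft storage stanza recognises - the documented parameter is
  # 'node_id' (or set the VAULT_RAFT_NODE_ID environment variable instead).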