Failing leader election. No leader

poleary · September 6, 2022, 2:22pm

Hello, my 3-pod cluster is in a state that I cannot seem to recover from.

I seem to be stuck in a leader election that never results in a leader. I may have shot myself in the foot here.

Brief history:

3 pod cluster running on raft

I found that one pod had sealed, so I tried to unseal it manually with the unseal keys. For some reason this failed, although manual unsealing is normally not a problem.

The leader was online at this time and I could browse Vault via the UI and CLI.

I scaled the replicaset up to 5 to bring on more followers and help with load balancing. The 2 new pods joined the cluster just fine but would not unseal. I then scaled the replicaset back down to the original 3 pods. However, I failed to remove the 2 new pods from the Vault cluster (I overlooked it altogether).

I then did a step-down of the current leader (possibly a mistake) to force a re-election. Now the election is failing, at least partly because the remaining nodes cannot reach pod 1 (from the original cluster) or pods 4 and 5 (from the increased replicaset).
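The stalled election is consistent with Raft's majority rule: a candidate needs votes from floor(n/2) + 1 of the registered voters. A minimal sketch of that arithmetic, assuming all five peers ended up registered as voters and only the two unsealed standbys are reachable (both numbers are read from the status output in this post, not confirmed from the Raft configuration):

```shell
#!/bin/sh
# Raft majority: a candidate needs votes from floor(n/2) + 1 voters.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Succeeds (exit 0) if the reachable voters can form a quorum.
# $1 = total registered voters, $2 = reachable voters
can_elect() {
  [ "$2" -ge "$(quorum "$1")" ]
}

# Assumed numbers: 5 registered voters, 2 unsealed and reachable.
if can_elect 5 2; then
  echo "leader election possible"
else
  echo "no quorum: need $(quorum 5) of 5 voters, have 2"
fi
```

With the original 3 voters, the same two reachable nodes would have been enough (`can_elect 3 2` succeeds), which is why leaving the extra peers in the cluster matters.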

I find myself unable to connect via the UI as expected or via the CLI.

Does anyone have suggestions on how to recover and elect a new leader? At this time I can live with a rebuild if needed, but would prefer to recover if possible.

Some notes below on the existing cluster state and log messages:

/ $ vault status
Key                     Value

Seal Type               shamir
Initialized             true
Sealed                  false
Total Shares            5
Threshold               3
Version                 1.6.2
Storage Type            raft
Cluster Name            vault-cluster-0192365f
Cluster ID              b633d725-e7c2-adbb-422e-7440b52b2f6c
HA Enabled              true
HA Cluster              https://vault-001-2.vault-internal-001:8201
HA Mode                 standby
Active Node Address     https://10.xx.4.101:8200
Raft Committed Index    813014
Raft Applied Index      813012

$ kk get pods | grep vault
vault-001-0   2/3   Running   0   12d
vault-001-1   3/3   Running   0   90d
vault-001-2   3/3   Running   0   90d
vault-001-3   2/3   Running   0   12d
vault-001-4   2/3   Running   0   12d

Status of all Vault pods:

[root@~]# kubectl exec -it vault-001-0 -- vault status
Defaulting container name to vault.
Use 'kubectl describe pod/vault-001-0 -n saas-platform-a1' to see all of the containers in this pod.
Key                     Value

Seal Type               shamir
Initialized             true
Sealed                  true
Total Shares            5
Threshold               3
Unseal Progress         0/3
Unseal Nonce            n/a
Version                 1.6.2
Storage Type            raft
HA Enabled              true
command terminated with exit code 2
[root@~]# kubectl exec -it vault-001-1 -- vault status
Defaulting container name to vault.
Use 'kubectl describe pod/vault-001-1 -n saas-platform-a1' to see all of the containers in this pod.
Key                     Value

Seal Type               shamir
Initialized             true
Sealed                  false
Total Shares            5
Threshold               3
Version                 1.6.2
Storage Type            raft
Cluster Name            vault-cluster-0192365f
Cluster ID              b633d725-e7c2-adbb-422e-7440b52b2f6c
HA Enabled              true
HA Cluster              https://vault-001-2.vault-internal-001:8201
HA Mode                 standby
Active Node Address     https://10.37.4.101:8200
Raft Committed Index    813014
Raft Applied Index      813012
[root@~]# kubectl exec -it vault-001-2 -- vault status
Defaulting container name to vault.
Use 'kubectl describe pod/vault-001-2 -n saas-platform-a1' to see all of the containers in this pod.
Key                     Value

Seal Type               shamir
Initialized             true
Sealed                  false
Total Shares            5
Threshold               3
Version                 1.6.2
Storage Type            raft
Cluster Name            vault-cluster-0192365f
Cluster ID              b633d725-e7c2-adbb-422e-7440b52b2f6c
HA Enabled              true
HA Cluster              https://vault-001-2.vault-internal-001:8201
HA Mode                 standby
Active Node Address     https://10.37.4.101:8200
Raft Committed Index    813014
Raft Applied Index      813012
[root@os01c1kub0114v ~]# kubectl exec -it vault-001-3 -- vault status
Defaulting container name to vault.
Use 'kubectl describe pod/vault-001-3 -n saas-platform-a1' to see all of the containers in this pod.
Key                     Value

Seal Type               shamir
Initialized             false
Sealed                  true
Total Shares            0
Threshold               0
Unseal Progress         0/0
Unseal Nonce            n/a
Version                 1.6.2
Storage Type            raft
HA Enabled              true
command terminated with exit code 2
[root@~]# kubectl exec -it vault-001-4 -- vault status
Defaulting container name to vault.
Use 'kubectl describe pod/vault-001-4 -n saas-platform-a1' to see all of the containers in this pod.
Key                     Value

Seal Type               shamir
Initialized             false
Sealed                  true
Total Shares            0
Threshold               0
Unseal Progress         0/0
Unseal Nonce            n/a
Version                 1.6.2
Storage Type            raft
HA Enabled              true
command terminated with exit code 2

Logs from the old leader repeat this pattern:

2022-08-25T00:56:54.239Z [INFO] storage.raft: entering candidate state: node="Node at vault-001-2.vault-internal-001:8201 [Candidate]" term=14658
2022-08-25T00:56:54.242Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter faa8e3ff-3058-5f12-be3a-4e1e07798aab vault-001-0.vault-internal-001:8201}" error="dial tcp 10.37.5.91:8201: connect: connection refused"
2022-08-25T00:56:54.244Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter 5df9b941-d066-8d2e-8b13-853a1d15ada3 vault-001-3.vault-internal-001:8201}" error="dial tcp 10.37.3.29:8201: connect: connection refused"
2022-08-25T00:57:02.354Z [WARN] storage.raft: Election timeout reached, restarting election

I have attempted to remove the missing followers, but I am not able to get a CLI connection to work.
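For reference, Vault's integrated storage documents a manual recovery path for lost quorum that does not need a working CLI connection: with the surviving nodes stopped, a peers.json file listing only the nodes to keep is placed in the raft directory under the storage path, and Vault rewrites the Raft configuration from it on the next start. A minimal sketch of generating such a file is below. The node IDs are placeholders (this thread does not show them for vault-001-1 and vault-001-2), the vault-001-1 address is assumed to follow the same naming pattern as vault-001-2, and whether Vault 1.6.2 supports this procedure should be checked against the documentation for that version.

```shell
#!/bin/sh
# Emit a peers.json body from "node_id address" pairs passed as arguments.
# Each entry uses the fields Vault documents for peers.json recovery:
# id, address, non_voter.
make_peers_json() {
  printf '[\n'
  first=1
  while [ "$#" -ge 2 ]; do
    # Separate entries with a comma, but not before the first one.
    [ "$first" -eq 1 ] || printf ',\n'
    printf '  {"id": "%s", "address": "%s", "non_voter": false}' "$1" "$2"
    first=0
    shift 2
  done
  printf '\n]\n'
}

# Placeholder node IDs: replace with the real node_id of each surviving node.
make_peers_json \
  "<node-id-of-vault-001-1>" "vault-001-1.vault-internal-001:8201" \
  "<node-id-of-vault-001-2>" "vault-001-2.vault-internal-001:8201"
```

The same file would go on every node that is kept, and taking a copy of the raft data directory first seems prudent since the recovery overwrites the existing peer set.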

Any suggestions are welcome and appreciated at this time.

jeffsanicola · September 12, 2022, 1:54pm

I'm not sure it's related, but when we were migrating to Raft storage we ran into a fair number of issues with leader election (I think we were on Vault 1.8.x at the time). There have been several bug fixes and improvements to Raft leader election since then, so I would suggest, if possible, updating to Vault 1.11.x to see if that helps.
