Hello, my 3-pod Vault cluster is in a state that I cannot seem to recover from.
I seem to be stuck in a leader election that never results in a leader being chosen. This may have been me shooting myself in the foot.
Brief history:
3 pod cluster running on raft
I found that one pod had sealed the vault, so I went to unseal it manually with the keys. For some reason this failed. Manually unsealing is normally not a problem.
The leader was online at this time and I could browse the vault via the UI/CLI.
I extended the replicaset to 5 to bring on more followers and help with the load balancing. These 2 new pods joined the cluster just fine, but would not unseal. I then reduced the replicaset back to the original 3 pods. However, I failed to remove the 2 new pods from the vault cluster (overlooked it altogether).
I then did a step-down of the current leader (maybe a mistake) to force a re-election. Now the election process is failing, partly because it cannot reach pods 1 (original cluster), 4, and 5 (from the increased replicaset).
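If I understand Raft correctly, an election needs votes from a majority of the registered voters, which would explain why the two healthy pods cannot elect a leader on their own. Below is a minimal sketch of that arithmetic. The counts are my assumptions (5 registered voters because the 2 extra pods were never removed, and 2 pods reachable and unsealed), and `quorum` and `can_elect` are hypothetical helper names, not Vault commands.

```shell
#!/bin/sh
# Majority needed for a Raft election: floor(voters / 2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Exits 0 when the reachable voters ($2) can form a majority of all voters ($1).
can_elect() {
  [ "$2" -ge "$(quorum "$1")" ]
}

# Assumed counts for my cluster: 5 registered voters, 2 reachable and unsealed.
voters=5
reachable=2
if can_elect "$voters" "$reachable"; then
  echo "quorum possible: $reachable of $voters voters reachable (need $(quorum "$voters"))"
else
  echo "no quorum: $reachable of $voters voters reachable (need $(quorum "$voters"))"
fi
```

With these assumed counts the script reports that 3 votes are needed and only 2 voters are reachable, which matches the repeated election timeouts I am seeing.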
I find myself unable to connect via the UI or via the CLI.
Does anyone have suggestions on a method to recover and elect a new leader? At this time I can live with a rebuild if needed, but would prefer to recover if possible.
Some notes below on the existing cluster state and log messages:
/ $ vault status
Key                     Value
Seal Type               shamir
Initialized             true
Sealed                  false
Total Shares            5
Threshold               3
Version                 1.6.2
Storage Type            raft
Cluster Name            vault-cluster-0192365f
Cluster ID              b633d725-e7c2-adbb-422e-7440b52b2f6c
HA Enabled              true
HA Cluster              https://vault-001-2.vault-internal-001:8201
HA Mode                 standby
Active Node Address     https://10.xx.4.101:8200
Raft Committed Index    813014
Raft Applied Index      813012
$ kk get pods | grep vault
vault-001-0    2/3    Running    0    12d
vault-001-1    3/3    Running    0    90d
vault-001-2    3/3    Running    0    90d
vault-001-3    2/3    Running    0    12d
vault-001-4    2/3    Running    0    12d
Vault status for all pods:
[root@~]# kubectl exec -it vault-001-0 -- vault status
Defaulting container name to vault.
Use 'kubectl describe pod/vault-001-0 -n saas-platform-a1' to see all of the containers in this pod.
Key                     Value
Seal Type               shamir
Initialized             true
Sealed                  true
Total Shares            5
Threshold               3
Unseal Progress         0/3
Unseal Nonce            n/a
Version                 1.6.2
Storage Type            raft
HA Enabled              true
command terminated with exit code 2
[root@~]# kubectl exec -it vault-001-1 -- vault status
Defaulting container name to vault.
Use 'kubectl describe pod/vault-001-1 -n saas-platform-a1' to see all of the containers in this pod.
Key                     Value
Seal Type               shamir
Initialized             true
Sealed                  false
Total Shares            5
Threshold               3
Version                 1.6.2
Storage Type            raft
Cluster Name            vault-cluster-0192365f
Cluster ID              b633d725-e7c2-adbb-422e-7440b52b2f6c
HA Enabled              true
HA Cluster              https://vault-001-2.vault-internal-001:8201
HA Mode                 standby
Active Node Address     https://10.37.4.101:8200
Raft Committed Index    813014
Raft Applied Index      813012
[root@~]# kubectl exec -it vault-001-2 -- vault status
Defaulting container name to vault.
Use 'kubectl describe pod/vault-001-2 -n saas-platform-a1' to see all of the containers in this pod.
Key                     Value
Seal Type               shamir
Initialized             true
Sealed                  false
Total Shares            5
Threshold               3
Version                 1.6.2
Storage Type            raft
Cluster Name            vault-cluster-0192365f
Cluster ID              b633d725-e7c2-adbb-422e-7440b52b2f6c
HA Enabled              true
HA Cluster              https://vault-001-2.vault-internal-001:8201
HA Mode                 standby
Active Node Address     https://10.37.4.101:8200
Raft Committed Index    813014
Raft Applied Index      813012
[root@os01c1kub0114v ~]# kubectl exec -it vault-001-3 -- vault status
Defaulting container name to vault.
Use 'kubectl describe pod/vault-001-3 -n saas-platform-a1' to see all of the containers in this pod.
Key                     Value
Seal Type               shamir
Initialized             false
Sealed                  true
Total Shares            0
Threshold               0
Unseal Progress         0/0
Unseal Nonce            n/a
Version                 1.6.2
Storage Type            raft
HA Enabled              true
command terminated with exit code 2
[root@~]# kubectl exec -it vault-001-4 -- vault status
Defaulting container name to vault.
Use 'kubectl describe pod/vault-001-4 -n saas-platform-a1' to see all of the containers in this pod.
Key                     Value
Seal Type               shamir
Initialized             false
Sealed                  true
Total Shares            0
Threshold               0
Unseal Progress         0/0
Unseal Nonce            n/a
Version                 1.6.2
Storage Type            raft
HA Enabled              true
command terminated with exit code 2
Logs from the old leader repeat this pattern:
2022-08-25T00:56:54.239Z [INFO] storage.raft: entering candidate state: node="Node at vault-001-2.vault-internal-001:8201 [Candidate]" term=14658
2022-08-25T00:56:54.242Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter faa8e3ff-3058-5f12-be3a-4e1e07798aab vault-001-0.vault-internal-001:8201}" error="dial tcp 10.37.5.91:8201: connect: connection refused"
2022-08-25T00:56:54.244Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter 5df9b941-d066-8d2e-8b13-853a1d15ada3 vault-001-3.vault-internal-001:8201}" error="dial tcp 10.37.3.29:8201: connect: connection refused"
2022-08-25T00:57:02.354Z [WARN] storage.raft: Election timeout reached, restarting election
I have attempted to remove the missing followers from the cluster, but I am not able to get a CLI connection to work.
Any suggestions are welcome and appreciated.