I’m writing to ask for help in improving the custom plugin upgrade process for our Kubernetes StatefulSet running Vault.
Our current setup is as follows:
- We have developed our own plugins for Vault.
- We run 3 replicas of the Vault pod in a StatefulSet with the “RollingUpdate” update strategy.
- When a pod starts, an init container checks whether the image ships a newer plugin version and, if so, upgrades the plugin by registering the new binary’s checksum.
- The main container just runs the Vault server.
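For reference, the init container’s decision boils down to comparing the SHA-256 of the bundled plugin binary with the checksum currently registered in Vault’s plugin catalog. A simplified sketch (the helper names are mine, and fetching the registered checksum from Vault is elided):

```python
import hashlib


def sha256_of(path: str) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_reregistration(plugin_path: str, registered_sha256: str) -> bool:
    """True when the binary bundled in this image differs from the
    checksum currently registered in Vault's plugin catalog."""
    return sha256_of(plugin_path) != registered_sha256
```

When this returns True, the init container re-registers the plugin with the new checksum.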
One of the possible upgrade scenarios is as follows:
1. A new Vault image is rolled out to the StatefulSet.
2. The Vault-2 pod, which was the leader, restarts. Vault-1 is elected the new leader.
3. Vault-2 finds that its bundled plugin version differs from the currently registered version.
4. Vault-2 registers the new plugin version (new checksum).
5. Vault-2 starts the main container with the new Vault version and enters standby mode.
6. Vault-1 restarts. Vault-0 becomes the active pod and the leader.
7. Vault-0 cannot run the plugin because its old binary doesn’t match the newly registered checksum.
8. Vault-1 starts the new Vault version and enters standby mode.
9. Vault-0 restarts. Vault-2 is elected leader.
10. Vault-2 serves requests with the new plugin version.
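To make the failure window concrete, the sequence above can be replayed as a toy simulation. It assumes a plugin request fails exactly when the leader pod’s bundled binary checksum differs from the registered one (the step numbers in the comments refer to the list above):

```python
def simulate_worst_case():
    """Replay the rolling update. Returns the steps at which the
    current leader's binary mismatches the registered checksum,
    i.e. the steps where plugin requests would fail."""
    OLD, NEW = "old-sha", "new-sha"
    binaries = {"vault-0": OLD, "vault-1": OLD, "vault-2": OLD}
    registered = OLD
    broken_steps = []

    def check(step, leader):
        if binaries[leader] != registered:
            broken_steps.append(step)

    # steps 1-2: vault-2 restarts with the new image; vault-1 leads
    binaries["vault-2"] = NEW
    leader = "vault-1"
    check(2, leader)
    # steps 3-4: vault-2's init container registers the new checksum
    registered = NEW
    check(4, leader)  # leader vault-1 still carries the old binary
    # step 5: vault-2 comes up in standby
    check(5, leader)
    # step 6: vault-1 restarts; vault-0 becomes leader
    binaries["vault-1"] = NEW
    leader = "vault-0"
    check(6, leader)
    # step 8: vault-1 back in standby; vault-0 is still old
    check(8, leader)
    # steps 9-10: vault-0 restarts; vault-2 is elected leader
    binaries["vault-0"] = NEW
    leader = "vault-2"
    check(10, leader)
    return broken_steps
```

Running it shows the leader mismatches from step 4 until Vault-2 takes over at step 10, which is exactly the downtime window described below.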
In this scenario, there is downtime from step 4 to step 10 because the leader pod can’t serve plugin requests (the checksums do not match). The downtime can last up to about 2 minutes. That is the worst case; sometimes Vault-2 is immediately re-elected leader, in which case there is almost no downtime.
I’m wondering how we can improve the worst-case scenario to decrease the downtime.
Thank you in advance
P.S. I’ve found that a request to the leader pod running an old plugin version sometimes succeeds, and sometimes the same request fails with the error message:

failed to run existence check (checksums did not match)
What determines whether the request succeeds or fails?