Vault backend migration from Consul to Raft (K8S, Helm, KMS auto-unseal)

Hi everyone.

We currently have a Kubernetes cluster on AWS EKS running a Vault OSS cluster, which uses Consul as its storage backend. Vault was installed with, and is managed by, the Helm chart, and it is auto-unsealed with AWS KMS.

Now we need to migrate from the Consul backend to Integrated Storage (Raft). As far as I know, there is no documentation specific to this procedure, and honestly I have hundreds of questions.

Has anyone done this before who can help? How can I migrate the backend and then reconcile the Helm configuration?

I have not done this before, but your question is interesting to me, as I think I might need to do something similar in the future…

As you have noticed, whilst Helm & Kubernetes make the initial deployment quite easy, complex operations thereafter can be made harder…

First, I tried helm upgrade from a Consul-storage deployment to a Raft-storage deployment, just to see what would happen…

Error: UPGRADE FAILED: cannot patch "vault" with kind StatefulSet: StatefulSet.apps "vault" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden

The problem seems to be that the Helm chart wants to change volumeClaimTemplates so that PVCs are generated for the Vault StatefulSet… which makes sense… but Kubernetes doesn’t allow that field to be changed on an existing StatefulSet.

Oh well… Vault storage migration requires downtime anyway, so the fact we have to delete the StatefulSet to replace it isn’t really making things worse.

Where things start getting trickier is that we need somewhere to run the storage migration… meaning somewhere with access to the mounted Persistent Volume while Vault is not running…

Trying to put all these constraints together, I came up with the following rough draft of a migration plan…

But first: in the middle of the procedure we’re going to need a way to actually run the migration, and when we do, we need:

  • Access to the Vault CLI binary
  • Access to the data volume
  • Vault server to NOT be running

I can’t see any way to make that happen using the existing server pods: the Vault server is the container’s main process, so if you kill it, the container exits and Kubernetes restarts it.

That means we need our own “maintenance” pod definition, and since we’re using Helm anyway, we might as well create that pod with it too.
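Whatever form the maintenance pod takes, the three requirements listed above can be checked before anything destructive happens. This is a minimal sketch, assuming the namespace vault, the server pod vault-0 and a maintenance pod named vault-maint-0 (all hypothetical names); KUBECTL can be pointed at another binary, which also makes it testable with a stub.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the three requirements above.
# Assumed names: namespace "vault", server pod "vault-0" and maintenance
# pod "vault-maint-0". Set KUBECTL to use a different kubectl binary.
NS="${NS:-vault}"
KUBECTL="${KUBECTL:-kubectl}"

preflight() {
  # Vault server must NOT be running: vault-0 should not exist any more.
  # --ignore-not-found prints nothing (and exits 0) when the pod is gone.
  existing=$("$KUBECTL" -n "$NS" get pod vault-0 --ignore-not-found -o name) || {
    echo "FAIL: could not query pod vault-0"
    return 1
  }
  if [ -n "$existing" ]; then
    echo "FAIL: vault-0 still exists; scale the vault StatefulSet to 0 first"
    return 1
  fi
  # Vault CLI on PATH and /vault/data mounted and writable in the maintenance pod.
  if ! "$KUBECTL" -n "$NS" exec vault-maint-0 -- \
      sh -c 'command -v vault >/dev/null && [ -d /vault/data ] && [ -w /vault/data ]'; then
    echo "FAIL: vault-maint-0 lacks the vault binary or a writable /vault/data"
    return 1
  fi
  echo "OK"
}
```

Running preflight should print OK before going any further with the migration.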

So… make sure you’re using a local copy of the Vault Helm chart so you can easily make modifications, and copy the templates/server-statefulset.yaml file to set up a new StatefulSet that will define our optional “maintenance” pod:

  • The metadata.name will need to be different to distinguish it, as will the spec.serviceName (add a suffix -maint?)
  • component: server will need to change to something like component: maintenance to set it apart, in both places it appears (the selector’s matchLabels and the pod template’s labels)
  • Various other optional parts of the YAML might be applicable only to the running servers and not a maintenance pod, depending on what you have configured
  • The readinessProbe, livenessProbe, lifecycle, and the template block that renders volumeClaimTemplates are not wanted for a maintenance pod
  • Instead, we need to add the volume we want to mount explicitly to the volumes section:
        - name: data
          persistentVolumeClaim:
            claimName: data-vault-0
  • We also delete the args and change the command, so the container runs a dummy command instead of starting a real Vault server:
          command:
          - /usr/local/bin/docker-entrypoint.sh
          - sleep
          - 999d
  • And we’ll set spec.replicas to 0 so that we only have a maintenance pod when we manually scale up this StatefulSet.
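For reference, this is roughly the shape the rendered maintenance StatefulSet could end up with. It is a sketch written as a plain manifest rather than a Helm template, with assumed values: namespace vault, the label app.kubernetes.io/name: vault, the serviceName vault-internal-maint, and a pod securityContext using what I believe are the chart’s default uid 100 and gid 1000. Matching the server pods’ user matters: if the migration runs as root, the files it writes to /vault/data may not be readable by the Vault server afterwards. The image must be the one your server pods run, so it is passed in rather than guessed.

```shell
#!/bin/sh
# Hypothetical renderer for the maintenance StatefulSet described above.
# Assumptions: namespace "vault", chart-style labels, chart default uid/gid.
# Usage: render_maint_manifest IMAGE | kubectl apply -f -
render_maint_manifest() {
  image="$1"
  cat <<EOF
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: vault-maint
  namespace: vault
spec:
  serviceName: vault-internal-maint
  replicas: 0                     # scale to 1 only when maintenance is needed
  selector:
    matchLabels:
      app.kubernetes.io/name: vault
      component: maintenance      # not "server", so server Services skip it
  template:
    metadata:
      labels:
        app.kubernetes.io/name: vault
        component: maintenance
    spec:
      securityContext:            # should match the server pods
        runAsNonRoot: true
        runAsUser: 100
        runAsGroup: 1000
        fsGroup: 1000
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: data-vault-0
      containers:
        - name: vault
          image: $image
          command:
            - /usr/local/bin/docker-entrypoint.sh
            - sleep
            - 999d
          volumeMounts:
            - name: data
              mountPath: /vault/data
EOF
}
```

Inside the copied Helm template, the same structure applies, with the image taken from the chart’s own image values instead of a function argument.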

With all of that prepared…

  1. Schedule planned downtime in advance
  2. Scale the Vault StatefulSet to zero replicas (the Vault service is now offline)
  3. Consider taking a backup just to be safe… although we’ll be leaving the old Consul pods in place, so in a way they are a backup themselves.
  4. Manually kubectl delete the StatefulSet, since we need to replace it
  5. helm upgrade the Vault release with values that specify Raft storage
  6. Scale the new Vault StatefulSet to zero replicas, because once it has initialised the volumes, we need Vault not running to do the storage migration
  7. Scale the maintenance StatefulSet to 1
  8. kubectl exec -it podname -- sh into the maintenance pod
  9. In the maintenance pod’s interactive session, create a configuration file for vault operator migrate and run the migration… but before you start, maybe wipe the initial contents of /vault/data/, which the server pod created when it first started up with a new, empty database?
  10. Scale the maintenance StatefulSet to 0, and the main StatefulSet back to your desired number of replicas
  11. Depending on the details of your configuration (for example, whether the raft storage stanza includes retry_join blocks), the replicas may find each other and replicate the migrated data automatically, or you may need to run vault operator raft join on the other nodes.
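For step 9, the configuration file for vault operator migrate needs a storage_source block pointing at Consul and a storage_destination block pointing at the Raft data directory; when the destination is raft, a top-level cluster_addr is also required. A minimal sketch, where the Consul address consul-server:8500, the default vault/ path, node_id vault-0 and the vault-internal hostname are all assumptions. node_id in particular should match the node ID the vault-0 server will use after the migration (as far as I know, the Helm chart sets it from the pod name).

```shell
#!/bin/sh
# Hypothetical helper that writes a vault operator migrate config (step 9).
# Assumed values: Consul at consul-server:8500 with Vault data under the
# default "vault/" path, Raft data at /vault/data, node_id vault-0, and the
# chart's vault-internal headless service. Adjust all of them.
# Usage: write_migrate_config [OUTPUT_FILE] [CONSUL_ADDRESS]
write_migrate_config() {
  out="${1:-migrate.hcl}"
  consul_addr="${2:-consul-server:8500}"
  cat > "$out" <<EOF
storage_source "consul" {
  address = "$consul_addr"
  path    = "vault/"
  # token = "<Consul ACL token>"   # only needed if Consul ACLs are enabled
}

storage_destination "raft" {
  path    = "/vault/data"
  node_id = "vault-0"
}

# Required by vault operator migrate when the destination is raft.
cluster_addr = "https://vault-0.vault-internal:8201"
EOF
}
```

Inside the maintenance pod, write_migrate_config /tmp/migrate.hcl followed by vault operator migrate -config=/tmp/migrate.hcl would then run the migration itself.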

Not at all tested in full! I was pretty far into “thought experiment” territory by the end of typing all that. But, hopefully it’s a decent source of inspiration if you want to work through productionising something based on this.
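As a starting point for productionising, steps 2 to 10 could be wrapped in a script that prints each command before running it. This is an untested sketch with assumed names: namespace vault, Helm release vault installed from a local chart copy in ./vault-helm, Raft values in raft-values.yaml, the maintenance StatefulSet vault-maint, and a migration config already saved locally as ./migrate.hcl. It defaults to a dry run (DRY_RUN=0 executes the commands), and it waits by polling kubectl get rather than kubectl wait, so it also copes with a pod that has not been created yet.

```shell
#!/bin/sh
# Untested sketch of steps 2-10 above. All names are assumptions.
# DRY_RUN=1 (default) only prints commands; DRY_RUN=0 runs them.
NS="${NS:-vault}"
DRY_RUN="${DRY_RUN:-1}"
REPLICAS="${REPLICAS:-3}"

# Print a command, and execute it unless this is a dry run.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = 0 ]; then "$@"; fi
}

# Poll until RESOURCE's JSONPATH equals VALUE ("" means it no longer exists).
# Note: a kubectl error also yields empty output.
wait_for() {
  echo "+ wait until $1 $2 = '$3'"
  if [ "$DRY_RUN" = 0 ]; then
    until [ "$(kubectl -n "$NS" get "$1" --ignore-not-found -o "jsonpath=$2")" = "$3" ]; do
      sleep 2
    done
  fi
}

migrate_steps() {
  run kubectl -n "$NS" scale statefulset vault --replicas=0          # step 2
  wait_for pod/vault-0 '{.metadata.name}' ''
  # step 3 (backup) is left manual
  run kubectl -n "$NS" delete statefulset vault                      # step 4
  run helm upgrade vault ./vault-helm -n "$NS" -f raft-values.yaml   # step 5
  wait_for pvc/data-vault-0 '{.status.phase}' Bound
  run kubectl -n "$NS" scale statefulset vault --replicas=0          # step 6
  wait_for pod/vault-0 '{.metadata.name}' ''
  run kubectl -n "$NS" scale statefulset vault-maint --replicas=1    # step 7
  wait_for pod/vault-maint-0 '{.status.phase}' Running
  # kubectl cp needs tar inside the container image.
  run kubectl -n "$NS" cp migrate.hcl vault-maint-0:/tmp/migrate.hcl
  # Optional wipe from step 9: Raft keeps vault.db and raft/ under its path.
  run kubectl -n "$NS" exec vault-maint-0 -- rm -rf /vault/data/vault.db /vault/data/raft
  run kubectl -n "$NS" exec vault-maint-0 -- vault operator migrate -config=/tmp/migrate.hcl
  run kubectl -n "$NS" scale statefulset vault-maint --replicas=0    # step 10
  wait_for pod/vault-maint-0 '{.metadata.name}' ''
  run kubectl -n "$NS" scale statefulset vault --replicas="$REPLICAS"
  # Step 11 (vault operator raft join, if needed) is left manual.
}

migrate_steps
```

Reviewing the dry-run output first is a cheap way to check the order of operations against your own names before running anything for real.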

It occurs to me that since volumeClaimTemplates can’t be updated in place, resizing the data volume later will be a hassle, so it’s worth planning suitable sizing carefully up front.
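One possible workaround, assuming the StorageClass has allowVolumeExpansion: true and the CSI driver supports online expansion: grow each PVC directly, then delete only the StatefulSet object with --cascade=orphan (its pods keep running) and let Helm recreate it with the new server.dataStorage.size. A dry-run sketch, assuming namespace vault, PVCs named data-vault-N and a Helm release vault from a local chart copy at ./vault-helm (hypothetical path).

```shell
#!/bin/sh
# Hypothetical dry-run sketch of resizing the Vault data PVCs later.
# Assumes allowVolumeExpansion: true on the StorageClass. Names are assumptions.
NS="${NS:-vault}"
DRY_RUN="${DRY_RUN:-1}"

# Print a command, and execute it unless this is a dry run.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = 0 ]; then "$@"; fi
}

# resize_data_volumes SIZE REPLICAS, e.g. resize_data_volumes 20Gi 3
resize_data_volumes() {
  size="$1"
  replicas="$2"
  i=0
  while [ "$i" -lt "$replicas" ]; do
    # A PVC's storage request can be increased in place (never decreased).
    run kubectl -n "$NS" patch pvc "data-vault-$i" \
      -p "{\"spec\":{\"resources\":{\"requests\":{\"storage\":\"$size\"}}}}"
    i=$((i + 1))
  done
  # Delete only the StatefulSet object; the orphaned pods keep serving.
  run kubectl -n "$NS" delete statefulset vault --cascade=orphan
  # Helm recreates it with the new volumeClaimTemplates size; it adopts the pods.
  run helm upgrade vault ./vault-helm -n "$NS" --reuse-values \
    --set server.dataStorage.size="$size"
}
```

Even with this, it is still easier to get the initial size right.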

It’s interesting that we reached almost the same conclusions, except that you had a better idea: redefining the StatefulSet completely.

I already have a lab environment where I will be able to test your suggestion. I’ll keep you posted about the results.

Thanks!