We currently have Vault running on an AWS EKS cluster (deployed with the Helm chart via Flux), using integrated Raft storage, with both dataStorage and auditStorage set to the gp2 storage class.
We’re looking to migrate to gp3 but are a bit apprehensive about which approach to use. Zero data loss is of paramount importance, while downtime can be somewhat tolerated, since we can announce a maintenance period.
Given the constraints, would it be possible to, say (a rough command sketch follows the list):
- scale down the Vault StatefulSet to 0
- change the volume type to gp3 in the AWS console
- change the HelmRelease values to gp3
- scale the StatefulSet back up
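Concretely, the command sequence we picture for that list is roughly the following (namespace, replica count and volume ID are placeholders for our real values, and we assume we would have to suspend the HelmRelease first so Flux doesn't undo the manual scaling):

# let Flux stop reconciling while we work (our assumption that this is needed)
flux suspend helmrelease vault -n vault

# 1. scale the Vault StatefulSet down to zero
kubectl -n vault scale statefulset vault --replicas=0

# 2. switch each EBS volume backing the data/audit PVCs to gp3
#    (vol-0123456789abcdef0 is a placeholder; repeat per volume)
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --volume-type gp3

# 3. update the HelmRelease values to gp3, then resume Flux and let it reconcile
flux resume helmrelease vault -n vault

# 4. scale the StatefulSet back up
kubectl -n vault scale statefulset vault --replicas=4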
Or do we have to migrate the volumes as described in this Medium article, and then simply switch to gp3 in the HelmRelease?
Or can we simply switch to gp3 in the HelmRelease and everything will be taken care of?
Or do we have to resort to creating a new cluster and using vault operator migrate (a rough sketch of what we picture is below)?
Or is there some esoteric approach that is undocumented as of yet?
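For the vault operator migrate route, what we (possibly incorrectly) imagine is something along these lines; the paths, node ID and cluster address are pure placeholders, and we have not verified that a raft-to-raft migration is actually supported:

# migrate.hcl, run offline against stopped Vault servers
cat > migrate.hcl <<'EOF'
storage_source "raft" {
  path = "/old-vault/data"
}

storage_destination "raft" {
  path    = "/new-vault/data"
  node_id = "vault-0"
}

cluster_addr = "http://127.0.0.1:8201"
EOF

vault operator migrate -config=migrate.hcl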
Thank you for any and all help.
For reference, the following is our HelmRelease manifest:
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: vault
spec:
  releaseName: vault
  interval: 5m
  chart:
    spec:
      chart: vault
      sourceRef:
        kind: HelmRepository
        name: hashicorp
  values:
    injector:
      resources:
        requests:
          memory: 16Mi
          cpu: 10m
        limits:
          memory: 128Mi
          cpu: 100m
    server:
      volumes:
        - name: userconfig
          emptyDir: {}
      volumeMounts:
        - mountPath: /vault/userconfig/myscript
          name: userconfig
          readOnly: false
      postStart:
        - /bin/sh
        - -c
        - /vault/userconfig/myscript/post-start.sh
      extraInitContainers:
        - name: post-start
          image: "alpine"
          command: [sh, -c]
          env:
            - name: UNSEAL_KEY
              valueFrom:
                secretKeyRef:
                  name: vault-env-secret
                  key: UNSEAL_KEY
          args:
            - |-
              cd /tmp &&
              echo "#!/bin/sh" >> post-start.sh &&
              echo "sleep 30" >> post-start.sh &&
              echo "vault operator unseal $UNSEAL_KEY" >> post-start.sh &&
              mv post-start.sh /vault/userconfig/myscript/post-start.sh &&
              chmod +x /vault/userconfig/myscript/post-start.sh
          volumeMounts:
            - name: userconfig
              mountPath: /vault/userconfig/myscript
          resources:
            requests:
              memory: 256Mi
              cpu: 50m
            limits:
              memory: 256Mi
              cpu: 100m
      readinessProbe:
        enabled: true
        path: "/v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204"
      livenessProbe:
        enabled: true
        path: "/v1/sys/health?standbyok=true"
        initialDelaySeconds: 60
      auditStorage:
        enabled: true
        storageClass: "gp2"
      dataStorage:
        enabled: true
        storageClass: "gp2"
      standalone:
        enabled: false
      ha:
        enabled: true
        replicas: 4
        raft:
          enabled: true
          setNodeId: true
          config: |
            ui = true
            listener "tcp" {
              tls_disable = 1
              address = "[::]:8200"
              cluster_address = "[::]:8201"
            }
            storage "raft" {
              path = "/vault/data"
              retry_join {
                leader_api_addr = "http://vault-0.vault-internal:8200"
              }
              retry_join {
                leader_api_addr = "http://vault-1.vault-internal:8200"
              }
              retry_join {
                leader_api_addr = "http://vault-2.vault-internal:8200"
              }
              retry_join {
                leader_api_addr = "http://vault-3.vault-internal:8200"
              }
              autopilot {
                cleanup_dead_servers = "true"
                last_contact_threshold = "200ms"
                last_contact_failure_threshold = "10m"
                max_trailing_logs = 250000
                min_quorum = 2
                server_stabilization_time = "10s"
              }
            }
            service_registration "kubernetes" {}
      affinity: |
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app.kubernetes.io/name: {{ template "vault.name" . }}
                  app.kubernetes.io/instance: "{{ .Release.Name }}"
                  component: server
              topologyKey: kubernetes.io/hostname
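One additional assumption on our end: pointing the dataStorage/auditStorage storageClass values above at "gp3" presumably requires a StorageClass named gp3 to already exist in the cluster (as far as we know we currently only have the default gp2 class). Our own sketch of that, assuming the EBS CSI driver is installed, would be roughly:

kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
EOF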