I’m trying to migrate a Vault cluster off of AWS EC2 instances and onto EKS pods. While the current EC2 cluster runs fine and I’m able to spin up a new cluster in our EKS namespace without issue, trying to migrate the former onto the latter has been giving me some trouble.
Both our current, EC2-based Vault cluster and the new EKS one use Raft for storage and AWS KMS for auto-unsealing. My process so far:

1. Take a Raft snapshot of the EC2 cluster.
2. Spin down the EC2 instances.
3. Spin up the EKS namespace (currently just one pod for simplicity's sake; there will be more once the migration is complete).
4. Initialize a new Vault instance on the EKS pod.
5. Restore the Raft snapshot over the new instance (see the sketch after this list).
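For reference, the commands behind those steps look roughly like this. This is a sketch rather than my exact scripts: the addresses and the snapshot file name are placeholders, and I'm reaching the pod at localhost here.

# Step 1: against the old EC2 cluster
export VAULT_ADDR=https://<ec2-vault-address>:8200
vault operator raft snapshot save vault-ec2.snap

# Steps 4 and 5: against the single EKS pod
export VAULT_ADDR=http://127.0.0.1:8200
vault operator init    # auto-unseals via the same AWS KMS key
vault operator raft snapshot restore -force vault-ec2.snap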
This all appears to work: the snapshot restores the old cluster onto the EKS pod successfully, and I can even reach the restored cluster's UI at the expected URL. However, after unsealing the restored cluster, everything else fails with:
"Error authenticating: error looking up token: Error making API request.
URL: GET http://127.0.0.1:8200/v1/auth/token/lookup-self
Code: 500. Errors:
- local node not active but active cluster node not found"
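(That particular trace is from a vault login attempt; any other command that has to hit the API, e.g. vault secrets list, fails the same way.)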
The only thing I can run against the restored cluster is vault status, which confirms the restore succeeded, but nothing else works. The EKS pod comes up as a standby node in the restored cluster, and with no active node I can't make any further progress:
Key                      Value
---                      -----
Recovery Seal Type       shamir
Initialized              true
Sealed                   false
Total Recovery Shares    1
Threshold                1
Version                  1.15.2
Build Date               2023-11-06T11:33:28Z
Storage Type             raft
Cluster Name             {restored cluster name}
Cluster ID               {restored cluster id}
HA Enabled               true
HA Cluster               n/a
HA Mode                  standby
Active Node Address      <none>
Raft Committed Index     2216365
Raft Applied Index       2216365
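In case it's relevant: since the snapshot comes from the old EC2 cluster, I suspect the restored Raft configuration may still list the EC2 nodes as peers, which would explain why the lone EKS pod never wins a leader election. That's only a guess, though, and the obvious check (the standard raft subcommand below) presumably can't answer while there is no active node to service it:

vault operator raft list-peers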
How can I ensure that the EKS pod I’m restoring the cluster onto becomes the active node?