We are currently migrating our Vault clusters from a Consul storage backend to a DynamoDB storage backend.
When we migrated our lab environment using the vault operator migrate tool, the process took around 2 hours.
These paths took the longest time:
sys/expire/id/auth
sys/token/accessor
auth/token
We had about 132000 entries under these paths.
We are now looking for a way to migrate our production cluster without such a long downtime.
As far as I know our options are:
Reduce the TTL for tokens and wait for the existing tokens to be revoked (it is currently set to 768h).
Run the operator migrate tool online: is this even possible?
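For the first option, the TTLs of the token auth method can be lowered through Vault's tune endpoint (POST /v1/sys/auth/<mount>/tune, the call behind vault auth tune). The Python sketch below only builds the request so it can be reviewed before it is sent; the address, token and TTL values are placeholders. As far as I understand, a lower TTL applies to tokens created or renewed afterwards, so tokens issued earlier may still live up to the old 768h; verify this against your Vault version.

```python
import json
import urllib.request


def build_tune_request(vault_addr: str, token: str, mount: str,
                       default_ttl: str, max_ttl: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/sys/auth/<mount>/tune.

    default_ttl / max_ttl use Vault duration strings such as "1h" or "24h".
    The values used with this function are placeholders, not recommendations.
    """
    mount = mount.strip("/")  # accept "token" or "token/"
    url = f"{vault_addr.rstrip('/')}/v1/sys/auth/{mount}/tune"
    body = json.dumps({"default_lease_ttl": default_ttl,
                       "max_lease_ttl": max_ttl}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
    )
```

Sending it is then a matter of urllib.request.urlopen(build_tune_request(...)) once the values have been agreed on; keeping construction separate from sending makes it easy to log or dry-run the change first.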
Since this is against best practice, why would you do this?
You didn’t mention the size of production versus the lab, so it’s hard to estimate how long the migration would take.
Yes, that’s the best option. You can also go through your history and revoke any outstanding leases.
If you’re running Vault Enterprise, you can set up a DR cluster, which will copy the data over online. It wouldn’t be any faster than doing the migration, though.
It’s possible, but the time you would spend coming up with a workaround would probably cancel out any savings. I’m not sure it’s worth it. Also, if anything goes wrong and you have to fall back and redo the task, you’ve just doubled your migration time. IMHO it isn’t worth it; revoking what you can to reduce the size of the database is the better way to save time.
First, thanks for the reply.
Our production cluster has about 96000 entries under sys/expire/id/auth.
We cannot afford a downtime longer than 30 minutes for our production cluster.
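As a rough sanity check against the 30-minute window, the lab numbers can be scaled linearly. This assumes migration time grows in proportion to entry count, that the 132000 lab entries are comparable to the 96000 production entries under sys/expire/id/auth, and that production throughput matches the lab. None of that is guaranteed, and sys/token/accessor and auth/token will add to the production total, so treat the result as a lower bound.

```python
def estimate_minutes(lab_entries: int, lab_minutes: float, prod_entries: int) -> float:
    """Scale the lab migration time linearly by entry count (an assumption)."""
    rate = lab_entries / lab_minutes  # entries migrated per minute in the lab
    return prod_entries / rate


def max_entries_for_window(lab_entries: int, lab_minutes: float,
                           window_minutes: float) -> int:
    """Number of entries that fit in the downtime window at the lab's rate."""
    return int(lab_entries * window_minutes / lab_minutes)


LAB_ENTRIES = 132_000   # entries in the lab run
LAB_MINUTES = 120       # "around 2 hours"
PROD_ENTRIES = 96_000   # sys/expire/id/auth in production only
WINDOW = 30             # allowed downtime in minutes

print(f"estimated: {estimate_minutes(LAB_ENTRIES, LAB_MINUTES, PROD_ENTRIES):.0f} min")
# prints "estimated: 87 min"
print(f"entries that fit in {WINDOW} min: "
      f"{max_entries_for_window(LAB_ENTRIES, LAB_MINUTES, WINDOW)}")
# prints "entries that fit in 30 min: 33000"
```

At the lab rate of about 1100 entries per minute, the token data would need to shrink to roughly 33000 entries to fit the window, which is why revoking leases before the migration matters more than any online trick.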
How would I revoke any outstanding leases? Is there documentation for that?
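One way to find outstanding leases is Vault's lease API: LIST /v1/sys/leases/lookup/<prefix> returns the lease IDs under a prefix (sub-prefixes end in "/"), and PUT /v1/sys/leases/revoke revokes a single lease by ID. Listing requires a token with sudo on that path. The Python sketch below walks a prefix recursively so the leases can be counted and reviewed before anything is revoked; the address, token and prefix are placeholders, and I am assuming the entries under sys/expire/id/auth correspond to lease IDs starting with auth/. The CLI equivalents are vault list sys/leases/lookup/<prefix>, vault lease revoke <lease_id> and vault lease revoke -prefix <prefix>. Be aware that revoking a token lease logs out whoever holds that token.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Iterator, List


def _headers(token: str) -> dict:
    return {"X-Vault-Token": token, "Content-Type": "application/json"}


def list_request(vault_addr: str, token: str, prefix: str) -> urllib.request.Request:
    """LIST sys/leases/lookup/<prefix>, sent as GET ?list=true (Vault accepts both)."""
    path = prefix.strip("/")
    suffix = f"{path}/" if path else ""
    url = f"{vault_addr.rstrip('/')}/v1/sys/leases/lookup/{suffix}?list=true"
    return urllib.request.Request(url, method="GET", headers=_headers(token))


def revoke_request(vault_addr: str, token: str, lease_id: str) -> urllib.request.Request:
    """PUT sys/leases/revoke for a single lease ID (built, not sent)."""
    body = json.dumps({"lease_id": lease_id}).encode()
    url = f"{vault_addr.rstrip('/')}/v1/sys/leases/revoke"
    return urllib.request.Request(url, data=body, method="PUT", headers=_headers(token))


def http_fetch_keys(vault_addr: str, token: str) -> Callable[[str], List[str]]:
    """Return a function that lists the keys directly under one prefix."""
    def fetch(prefix: str) -> List[str]:
        try:
            with urllib.request.urlopen(list_request(vault_addr, token, prefix)) as resp:
                return json.load(resp)["data"]["keys"]
        except urllib.error.HTTPError as err:
            if err.code == 404:  # Vault answers 404 when nothing is under the prefix
                return []
            raise
    return fetch


def walk_leases(fetch_keys: Callable[[str], List[str]], prefix: str) -> Iterator[str]:
    """Yield full lease IDs under prefix, descending into keys that end in '/'."""
    prefix = prefix.strip("/")
    prefix = f"{prefix}/" if prefix else ""
    for key in fetch_keys(prefix):
        if key.endswith("/"):
            yield from walk_leases(fetch_keys, prefix + key)
        else:
            yield prefix + key
```

A dry run could be leases = list(walk_leases(http_fetch_keys(addr, token), "auth/token/create")) followed by print(len(leases)); urlopen(revoke_request(addr, token, lease_id)) would then be called only for leases you are sure are no longer in use. Passing the fetch function in keeps the traversal testable without a running Vault.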
Regarding your answer on 3: wouldn’t a rollback just mean changing the Vault configuration back to our Consul backend?
We are in the same boat. Is there any command to get the number of entries in etcd and DynamoDB? This would help us validate that all the data was copied.
If there is any other way to validate the copied data, please let me know.
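For validation, comparing the set of keys on both sides is more reliable than comparing raw counts. The Python sketch below reads the JSON printed by consul kv export vault/ (a list of objects with key, flags and value) and diffs it against a key list from the destination. Assumptions: Vault uses Consul's default vault/ path prefix, and you have some way to produce the destination key list; how DynamoDB items (with their Path and Key attributes) map back to Vault keys is not shown here and should be checked against your own table. Vault's DynamoDB backend may also store extra directory entries, so a plain item count, such as the one from the hypothetical dynamodb_item_count helper (which needs boto3 and AWS credentials), is not expected to match the Consul key count exactly. For an etcd v3 source, etcdctl get <prefix> --prefix --keys-only lists the keys, which could be fed into the same comparison after trimming the prefix.

```python
import json
from typing import Iterable, Set, Tuple


def consul_export_keys(export_json: str, prefix: str = "vault/") -> Set[str]:
    """Keys from `consul kv export <prefix>` output, with the Vault prefix stripped."""
    return {item["key"][len(prefix):] for item in json.loads(export_json)
            if item["key"].startswith(prefix)}


def compare(source: Iterable[str], dest: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (keys missing from dest, keys only present in dest)."""
    src, dst = set(source), set(dest)
    return src - dst, dst - src


def dynamodb_item_count(table_name: str) -> int:
    """Hypothetical helper: count all items in the Vault table with a paginated Scan.

    Assumes boto3 is installed and AWS credentials/region are configured.
    Note that a full Scan consumes read capacity on the table.
    """
    import boto3  # imported lazily so the rest works without boto3

    paginator = boto3.client("dynamodb").get_paginator("scan")
    return sum(page["Count"]
               for page in paginator.paginate(TableName=table_name, Select="COUNT"))
```

After the copy, missing, extra = compare(consul_keys, dynamo_keys) followed by print(len(missing), len(extra)) gives a quick summary; an empty missing set is the signal you are looking for, while extra keys deserve a closer look before cutting over.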