I’m trying to migrate a GCS storage backend to Raft (integrated storage). For this, I’m using a copy of the real bucket and a standalone (up-to-date) Vault instance running on a Compute Engine VM in the same GCP project (the next step would be to create a Raft export and re-import it into the real cluster).
No matter which options I set (in particular, I tested max-parallel values from 10 to 3000), the migration always fails after about 6 minutes 30 seconds, and always on a different version of the same secret.
I’ve checked those versions and they are not large (about 400 bytes).
The error I get is: Error migrating: failed to scan for children: failed to read object: context canceled
It is telling you that a request to GCS took longer than Vault was willing to wait.
The only thing I can think of is to look for timeout settings that make Vault willing to wait longer… however, I can’t see any obvious ones in the documentation, and I haven’t worked with GCS personally.
It might be interesting to see if you can replicate the slow operation outside of Vault, using a GCS command line client, to see if it is extremely slow in isolation.
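One rough way to try this outside Vault is a small script that walks the bucket copy prefix by prefix, roughly the way a depth-first migration scan would, and times each listing. This is only a sketch, not anything taken from Vault: it assumes the google-cloud-storage Python package and application default credentials on the VM, and it assumes Vault keys are stored as "/"-separated object names. The script name and bucket name below are hypothetical.

```python
"""Time GCS prefix listings outside Vault with a depth-first walk.

Sketch only. Assumptions: the google-cloud-storage package is installed,
application default credentials are available, and Vault keys are stored
as "/"-separated object names in the bucket.
"""
import sys
import time
from typing import Callable, List, Tuple

# A lister returns (object names, child prefixes) for a single prefix.
Lister = Callable[[str], Tuple[List[str], List[str]]]


def gcs_lister(bucket_name: str) -> Lister:
    """Build a lister backed by the real GCS bucket."""
    from google.cloud import storage  # imported lazily so timed_walk works without it

    client = storage.Client()

    def list_prefix(prefix: str) -> Tuple[List[str], List[str]]:
        # With delimiter="/", GCS returns sub-"directories" in .prefixes.
        blobs = client.list_blobs(bucket_name, prefix=prefix, delimiter="/")
        names = [blob.name for blob in blobs]  # consuming the pages fills .prefixes
        return names, sorted(blobs.prefixes)

    return list_prefix


def timed_walk(list_prefix: Lister, root: str = "") -> Tuple[int, List[Tuple[float, str]]]:
    """Walk every prefix under root; return (object count, timings slowest first)."""
    stack = [root]
    objects = 0
    timings: List[Tuple[float, str]] = []
    while stack:
        prefix = stack.pop()
        started = time.monotonic()
        names, children = list_prefix(prefix)
        timings.append((time.monotonic() - started, prefix))
        objects += len(names)
        # Push in reverse so children are visited in lexicographic order.
        stack.extend(reversed(children))
    timings.sort(reverse=True)
    return objects, timings


def main(bucket_name: str) -> None:
    count, timings = timed_walk(gcs_lister(bucket_name))
    print(f"{count} objects, {len(timings)} prefixes listed")
    for seconds, prefix in timings[:10]:  # the ten slowest listings
        print(f"{seconds:8.2f}s  {prefix or '<root>'}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])  # usage: python time_gcs_walk.py BUCKET_NAME
```

Running it on the same VM (for example python time_gcs_walk.py my-bucket-copy) would show whether one particular prefix listing is slow in isolation, which seems worth checking if the failure keeps happening around the same point.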
I already tried playing with VAULT_CLIENT_TIMEOUT, without success. The strange thing is that it fails at almost the same place: if I log the synced files (log level info), it always crashes after about 59,600 files (10 tests; the lowest was 59,547, the highest 59,664), no matter how long it took or how many threads were in use.
I can’t reproduce it (yet?) with the gcloud CLI. I’ll try syncing the bucket locally and using the file backend.
Some more details: I tried switching the destination to the file backend, since -start seems to work with it.
If I start without -start, it works for a while but fails after about 37.5k synced files. If I then restart with -start set to the last synced item, it spends about 4 minutes 20 seconds on the "creating client" step and then fails with Error migrating: failed to scan for children: failed to read object: context canceled, without syncing anything (or syncing only 2 or 3 files).
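A toy model may help reason about why restarting with -start does not get past the failure. The documented meaning of -start is that only keys lexicographically at or after the given value are migrated; the sketch below additionally assumes (not verified against Vault’s source) that the scan still lists every prefix and only skips copying keys that sort before the start value. The tree and keys are made up.

```python
"""Toy model of a depth-first key scan resumed with a lexicographic start key.

Assumption, not Vault's code: every prefix is still listed, and keys that
sort before `start` are skipped instead of copied (mirroring the documented
"at or after this value" meaning of `vault operator migrate -start`).
"""
from typing import Dict, List, Tuple

# prefix -> children as a List call would return them (sub-prefixes end in "/")
Tree = Dict[str, List[str]]


def scan(tree: Tree, start: str = "") -> Tuple[List[str], int]:
    """Return (keys that would be copied, number of prefix listings made)."""
    copied: List[str] = []
    listings = 0
    stack = [""]
    while stack:
        path = stack.pop()
        if path == "" or path.endswith("/"):
            # Every prefix is listed, even one whose keys all sort before start.
            listings += 1
            children = [path + child for child in tree.get(path, [])]
            stack.extend(reversed(children))
        elif path >= start:
            copied.append(path)
    return copied, listings
```

With tree = {"": ["a/", "b/"], "a/": ["1", "2"], "b/": ["1"]}, scan(tree, "b/") returns (["b/1"], 3): only one key is copied, but all three prefixes are still listed. If the real scan behaves like this, a restart with -start might hit the same slow listings again before copying anything. A prefix that sorts before the start value and is not itself a prefix of it could in principle be skipped, since every key under it also sorts before the start value.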