Vault operator migrate keeps failing with `context canceled`

maxenced · August 18, 2023, 7:41am

Hi,

I’m trying to migrate a GCS storage backend to Raft. For this, I’m using a copy of the real bucket and a standalone (up-to-date) Vault instance, running on a compute instance inside the same GCP project. The next step would be to create a Raft export and re-import it into the real cluster.

No matter which options I set (in particular, I tested max-parallel from 10 to 3000), the migration always fails after ~6.30 minutes, and always on different versions of the same secret.

I’ve checked the versions and they do not look big (~400 bytes).

The error I get is: `Error migrating: failed to scan for children: failed to read object: context canceled`

maxenced · August 18, 2023, 7:43am

Also, I tried to restart with the `-start` option, but I always get an error saying the cluster already has a configuration.

maxb · August 18, 2023, 7:34pm

It is telling you that a request to GCS took longer than Vault was willing to wait.

The only thing I can think of is to look for timeout settings to make Vault willing to wait for longer. However, I cannot see any obvious ones in the documentation, and I have not worked with GCS personally.

It might be interesting to see if you can replicate the slow operation outside of Vault, using a GCS command line client, to see if it is extremely slow in isolation.

maxenced · August 21, 2023, 7:47am

I already tried to play with VAULT_CLIENT_TIMEOUT, without success. The strange thing is that it happens at almost the same place: if I log the synced files (log level info), it always crashes after ~59600 files (10 tests; the smallest was 59547, the highest 59664), no matter how long it took or how many threads were in use.

I can’t reproduce it (yet?) with the gcloud CLI. I will try to sync the bucket locally and use the file backend.

maxenced · August 22, 2023, 7:45am

So the file backend does not use the exact same structure as the GCS backend. Back to the start.

maxenced · August 22, 2023, 1:36pm

Some more details: I tried switching to the file backend for the destination, as `-start` seems to work with it.

If I run without `-start`, it works for some time but fails after ~37.5k files synced. If I then try to restart with `-start` set to the last synced item, it spends ~4 minutes 20 seconds on the "creating client" step, then fails with `Error migrating: failed to scan for children: failed to read object: context canceled` without syncing anything (or syncing only 2 or 3 files).

maxenced · August 22, 2023, 1:47pm

Raised an issue here: "`migrate` from gcs backend is broken (context canceled / timeout)", Issue #22493 in hashicorp/vault on GitHub.
