Failed to read packed storage bucket entry

Hello,

We use Vault 1.8.1 with the Okta auth method. Vault is deployed to a GKE cluster.
We have more than 100 unique users, but only one of them has a login problem.

Authentication failed: failed to read packed storage bucket entry: 
failed to read value for "logical/ee6c668d-a9f9-7d16-5858-a13af9f026c3/packer/buckets/230": 
Get "https://storage.googleapis.com/vault/logical/ee6c668d-a9f9-7d16-5858-a13af9f026c3/packer/buckets/230": context canceled

I found this error in the source (vault/storagepacker.go at v1.8.1 in hashicorp/vault on GitHub),
but I don’t understand what it means.
Can someone explain it, or help solve the problem in some way?

Vault 1.8.1 is long out of maintenance; please consider upgrading.

Anyway…

Vault storage consists of entries - which in your case are stored as blobs in Google storage.

The Vault identity system stores multiple entity (user) records packed together into buckets, to strike a compromise between putting all users in one storage blob (it would be huge) and every user in its own storage blob (there would be far too many).
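To make the bucketing idea concrete, here is a minimal sketch of how an entity ID could map to one of 256 buckets. It assumes the bucket is picked from the first byte of an MD5 hash of the ID, which is my reading of storagepacker.go and would explain the decimal bucket number (230) in your error path; treat that detail as an assumption and check it against the source for your version. The names `bucket_index` and `bucket_key` and the example ID are made up for illustration.

```python
import hashlib

# Assumption: one byte of the hash selects the bucket, giving 256 buckets.
BUCKET_COUNT = 256


def bucket_index(item_id: str) -> int:
    """Return the bucket number (0-255) that an item ID would hash into."""
    digest = hashlib.md5(item_id.encode("utf-8")).digest()
    return digest[0]  # first byte of the hash, as an integer


def bucket_key(view_prefix: str, item_id: str) -> str:
    """Build the storage key of the bucket holding this item."""
    return f"{view_prefix}{bucket_index(item_id)}"


if __name__ == "__main__":
    # Hypothetical entity ID, not taken from the thread.
    entity_id = "00000000-0000-0000-0000-000000000000"
    print(bucket_key("packer/buckets/", entity_id))
```

If that assumption holds, running this against the affected user's entity ID should print a key ending in 230, which would confirm that the failing bucket really is the one holding that user.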

For some unknown reason, Vault seems to be having problems retrieving the specific bucket mentioned in your error message from the Google storage API.

"context canceled" means that the Vault code gave up waiting for the operation to complete before the storage call returned - usually because of a timeout, or because the request that triggered it was itself abandoned.

As a next debugging step, I would suggest checking manually whether this particular blob can actually be retrieved from the Google storage API, and comparing its size with the other blobs in the same directory of storage.
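If you want to script the size comparison rather than eyeball it, something like the following might do. It is only a sketch: `size_outliers` and its `factor` threshold are made up for illustration, and `list_bucket_sizes` assumes the `google-cloud-storage` Python package is installed and that credentials with read access to the bucket are available.

```python
from statistics import median
from typing import Dict, List


def size_outliers(sizes: Dict[str, int], factor: float = 5.0) -> List[str]:
    """Return names of blobs more than `factor` times larger or smaller than the median."""
    if not sizes:
        return []
    mid = median(sizes.values())
    flagged = []
    for name, size in sorted(sizes.items()):
        if mid > 0 and (size > mid * factor or size * factor < mid):
            flagged.append(name)
    return flagged


def list_bucket_sizes(gcs_bucket: str, prefix: str) -> Dict[str, int]:
    """Fetch blob sizes under a prefix (needs google-cloud-storage and credentials)."""
    from google.cloud import storage  # third-party; imported only when called

    client = storage.Client()
    return {blob.name: blob.size for blob in client.list_blobs(gcs_bucket, prefix=prefix)}
```

Going by the URL in your error, the bucket would be `vault` and the prefix `logical/ee6c668d-a9f9-7d16-5858-a13af9f026c3/packer/buckets/`, so `size_outliers(list_bucket_sizes("vault", prefix))` would list any bucket whose size stands out from the rest.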

Thanks for the explanation.

Yes, I can retrieve this file and it has a similar size to other files in this folder.
One more thing I want to add: we have a test environment with the same config in another GCP project, and it shows the same error, again only for this particular user.

Huh… I’m at a loss as to what might cause that.

I suppose it’s good that it’s kind of reproducible, but if it’s only reproducible for you, it’s difficult to get outside help with the problem.

I’m just throwing out ideas here, but maybe reconfigure the test environment to use the Vault file storage backend instead of Google storage - just to try to eliminate variables from the problem?

I think it may be related to our auth method, Okta. This user previously used one domain and moved to a new domain not long ago.

That’s definitely interesting… although I’m still completely confused how it could cause an error so many layers deep in the code that it doesn’t get detected until the storage layer.

Unfortunately I’m a bit out of ideas how to help further.

If you’re completely stuck and just need to try something - anything - to see if it shakes the problem loose, you could try deleting the Vault entity corresponding to this user.

The next time the user logs in, Vault would create a new Vault entity with a different ID, which would have a 255/256 chance of ending up in a different storage bucket. I have no idea if this would actually help - I’m concerned it probably won’t, considering the issue is reproduced in multiple environments - but it’s the only other thing I can think of, short of a detailed hands-on investigation into exactly what is happening.
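If you do try it, here is a rough sketch of the two HTTP API calls involved: looking up the entity by its alias (`POST /v1/identity/lookup/entity`) and deleting it by ID (`DELETE /v1/identity/entity/id/<id>`). The function names are made up, and I am assuming the alias name is the username the user logs in with and that you already know the mount accessor of your Okta auth mount. Try it in the test environment first, since the delete is not reversible.

```python
import json
import urllib.request


def lookup_entity_request(addr: str, token: str, alias_name: str,
                          mount_accessor: str) -> urllib.request.Request:
    """Build the request that looks up an entity by alias name and mount accessor."""
    body = json.dumps({
        "alias_name": alias_name,
        "alias_mount_accessor": mount_accessor,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{addr.rstrip('/')}/v1/identity/lookup/entity",
        data=body,
        method="POST",
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
    )


def delete_entity_request(addr: str, token: str,
                          entity_id: str) -> urllib.request.Request:
    """Build the request that deletes an entity by ID (destructive)."""
    return urllib.request.Request(
        f"{addr.rstrip('/')}/v1/identity/entity/id/{entity_id}",
        method="DELETE",
        headers={"X-Vault-Token": token},
    )
```

Neither function sends anything; pass the result to `urllib.request.urlopen(...)` when you are ready. The entity ID to delete is the `id` field under `data` in the lookup response.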

I will discuss your idea with my colleagues. In any case, thank you very much for the information.