We use Vault version 1.8.1 with the Okta auth method. Vault is deployed to a GKE cluster.
We have more than 100 unique users, but only one has a login problem.
Authentication failed: failed to read packed storage bucket entry:
failed to read value for "logical/ee6c668d-a9f9-7d16-5858-a13af9f026c3/packer/buckets/230":
Get "https://storage.googleapis.com/vault/logical/ee6c668d-a9f9-7d16-5858-a13af9f026c3/packer/buckets/230": context canceled
Vault 1.8.1 is long out of maintenance; please consider upgrading.
Anyway…
Vault storage consists of entries, which in your case are stored as blobs in Google storage.
The Vault identity system stores multiple entity (user) records packed together into buckets, to strike a compromise between putting all users in one storage blob (it would be huge) and every user in its own storage blob (there would be far too many).
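To make the bucketing idea concrete, here is a rough Python sketch. It rests on an assumption that is not confirmed anywhere in this thread: that the bucket number is the first byte of the MD5 hash of the entity ID, which gives 256 possible buckets. The function name and the example ID are made up for illustration; check the storage packer code for your Vault version before relying on it.

```python
import hashlib

NUM_BUCKETS = 256  # one bucket per possible value of a single byte


def bucket_for(entity_id: str) -> int:
    """Assumed mapping: the first byte of MD5(entity ID) selects the bucket."""
    return hashlib.md5(entity_id.encode("utf-8")).digest()[0]


# Hypothetical entity ID, not taken from any real Vault instance.
example_id = "00000000-0000-0000-0000-000000000000"
print(f"packer/buckets/{bucket_for(example_id)}")
```

If that assumption holds, you could feed in the affected user's entity ID and see whether it lands in the bucket named in the error message.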
For some unknown reason, Vault seems to be having problems retrieving the specific bucket mentioned in your error message from the Google storage API.
"context canceled" means that the Vault code gave up waiting for the operation to complete, usually because of a timeout.
I would suggest that the next debugging step is to check manually whether this particular blob can actually be retrieved from the Google storage API. It would also be worth comparing its size with other blobs in the same directory of storage.
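For the size comparison, something like the following Python sketch would do. The sizes below are invented placeholders; you would fill in the real byte counts from a listing of the packer/buckets directory. The helper name is hypothetical too.

```python
from statistics import median


def size_ratio_to_median(sizes: dict, target: str) -> float:
    """Return the target blob's size divided by the median size of the other blobs."""
    others = [size for name, size in sizes.items() if name != target]
    if not others:
        raise ValueError("need at least one other blob to compare against")
    return sizes[target] / median(others)


# Hypothetical sizes in bytes; replace with values from your own storage listing.
sizes = {
    "packer/buckets/229": 4100,
    "packer/buckets/230": 4300,
    "packer/buckets/231": 3900,
}
ratio = size_ratio_to_median(sizes, "packer/buckets/230")
print(f"bucket 230 is {ratio:.2f}x the median size of its neighbours")
```

A ratio close to 1 suggests the blob is not unusually large; a ratio many times higher would make a slow read, and therefore a timeout, more plausible.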
Yes, I can retrieve this file and it has a similar size to other files in this folder.
One more thing I want to add: we have a test environment with the same config in another GCP project, and it shows the same error, again only for this particular user.
I suppose it’s good that it’s kind of reproducible, but if it’s only reproducible for you, it’s difficult to get outside help with the problem.
I’m just throwing out ideas here, but maybe reconfigure the test environment to use the Vault file storage backend instead of Google storage - just to try to eliminate variables from the problem?
That’s definitely interesting… although I’m still completely confused how it could cause an error so many layers deep in the code, that it doesn’t get detected until the storage layer.
Unfortunately I’m a bit out of ideas how to help further.
If you’re completely stuck and just need to try something - anything - to see if it shakes the problem loose, you could try deleting the Vault entity corresponding to this user.
The next time the user logs in, Vault would create a new Vault entity, which would have a different ID, which would have a 255/256 chance of ending up in a different storage bucket … I have no idea if this would actually help - I’m concerned it probably won’t, considering the issue is reproduced in multiple environments - but it’s the only other thing I can think of, other than a detailed hands-on investigation into exactly what is happening.