About 3 minutes after boot and unseal, and then every hour, we see this error in the Vault server log:
[ERROR] rollback: error rolling back: path=customer-keys/
error=
| 75013 errors occurred:
| \t* failed to read value for "logical/<uuid>/policy/0001": Get "https://storage.googleapis.com/<bucket-name>/logical/<uuid>/policy/0001": context deadline exceeded
...
To reproduce the issue, create a transit backend and add 5000 keys.
Restart the server and unseal it. Wait 3 minutes and the error should be logged.
Vault versions 1.10.0 to 1.14.2 are affected.
Vault version 1.9.10 does not have the issue.
Detailed steps to reproduce the bug are in the issue quoted below (opened 08:09PM - 08 Oct 23 UTC):
We face the issue in our dev and production environments.
To reproduce the issue with a fresh Vault, I tested several versions locally on a MacBook with an empty GCS storage backend.
I found that versions 1.9.4 and 1.9.10 do not have the issue.
With 1.10.0 and, e.g., 1.11.1 or 1.14.4, the error can be reproduced.
So I believe this bug was introduced in 1.10.0 and has not been fixed in later versions.
**Bug description**
3-5 minutes after server startup, and then every hour, we see this in the log:
```
[ERROR] rollback: error rolling back: path=customer-keys/
error=
| 75013 errors occurred:
| \t* failed to read value for "logical/<uuid>/policy/0001": Get "https://storage.googleapis.com/<bucket-name>/logical/<uuid>/policy/0001": context deadline exceeded
...
```
**To Reproduce**
1. Install Vault locally, e.g. `brew install vault`
2. Create a config file `vault-config.hcl` with the following contents, replacing the bucket name with your own unique one:
```
disable_mlock = true
listener "tcp" {
  address = "127.0.0.1:8200"
  tls_disable = "true"
}
storage "gcs" {
  bucket = "mycompany-myproject-vault-gcs-test"
  ha_enabled = "true"
}
api_addr = "http://127.0.0.1:8200"
```
3. Create the GCS bucket:
```
gsutil mb -p my-gcp-project -l europe-west3 -c standard gs://mycompany-myproject-vault-gcs-test
```
4. Start server: `vault server -config vault-config.hcl`
5. In a 2nd terminal, initialize, unseal, log in, and enable the transit backend:
```
export VAULT_ADDR='http://127.0.0.1:8200'
vault operator init -key-shares=1 -key-threshold=1 > init_response
cat init_response | grep 'Unseal Key 1:' | sed 's/Unseal Key 1: //' > unseal_key
cat init_response | grep 'Initial Root Token:' | sed 's/Initial Root Token: //' > root_token
vault operator unseal $(cat unseal_key)
vault login $(cat root_token)
vault secrets enable -path=customer-keys -force-no-cache=true transit
```
6. Create 5000 keys with a shell script `add_keys.sh`:
```
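#!/usr/bin/env bash
# Create total*100 transit keys, 100 parallel writes per batch.
total=50
for c in $(seq -w 0 $((total-1))); do
  for i in $(seq -w 0 99); do
    vault write -f customer-keys/keys/$c$i >/dev/null &
  done >/dev/null 2>&1
  wait  # wait for the current batch of 100 writes to finish
  # progress: keys written so far / total keys
  echo $((1$c+1-100))00/${total}00
done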
```
```
chmod +x add_keys.sh
./add_keys.sh
# count written keys, 4000 keys or more is enough to reproduce the issue
vault list -format=yaml customer-keys/keys | wc -l
```
7. Stop the server in the 1st terminal with CTRL+C, then start it again in the same terminal:
`vault server -config vault-config.hcl 2>&1 | grep -v "failed to read value"`
The piped grep filters out the per-key trace lines of the error message so that its first line is easier to see.
8. In 2nd terminal: `vault operator unseal $(cat unseal_key)` (**Important step! Do not forget this unseal step!**)
9. Wait 3-5 minutes
10. See error in server log in 1st terminal
```
[ERROR] rollback: error rolling back: path=customer-keys/
```
11. Cleanup:
- Stop server in 1st terminal with CTRL+C
- Delete GCS bucket `gsutil -m rm -r gs://mycompany-myproject-vault-gcs-test 2>/dev/null`
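
As a possible companion to steps 7 and 10, the following is a minimal sketch that summarizes a saved server log instead of filtering it live. It assumes the server output was redirected to a file (for example `vault server -config vault-config.hcl > vault.log 2>&1`); the function name `summarize_rollback_errors` and the file name `vault.log` are hypothetical and not part of Vault.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Vault): summarize rollback errors
# in a saved Vault server log file.
summarize_rollback_errors() {
  local logfile="$1"
  local runs reads
  # Header lines of the rollback error, one per failed rollback run.
  runs=$(grep -c 'rollback: error rolling back' "$logfile")
  # Per-key trace lines, the ones step 7 hides with grep -v.
  reads=$(grep -c 'failed to read value' "$logfile")
  echo "rollback_errors=${runs} failed_reads=${reads}"
}
```

Calling `summarize_rollback_errors vault.log` after the 3-5 minute wait should then show whether the error occurred and roughly how many key reads timed out.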
**Environment:**
* Vault Server Version 1.10.0 to 1.14.4
* Vault CLI Version 1.10.0 to 1.14.4
**Additional context**
This error does not happen when the key count is low, e.g. 1000.
This error does not happen with versions earlier than 1.10.0.
Raft storage does not produce the error.
When creating the transit backend the option `-force-no-cache=true` can be omitted, the error is reproducible also without this option.
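
The key-count observations above can be turned into a quick check before waiting for the error. This is only a sketch based on the numbers reported in this issue (4000 or more reproduces, around 1000 does not); the function name `classify_key_count` is hypothetical, and the range in between was not tested.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a transit key count against the
# thresholds reported in this issue (not an official Vault limit).
classify_key_count() {
  local count="$1"
  if [ "$count" -ge 4000 ]; then
    echo "expected to reproduce"
  elif [ "$count" -le 1000 ]; then
    echo "not expected to reproduce"
  else
    echo "untested range"
  fi
}
```

For example: `classify_key_count "$(vault list -format=yaml customer-keys/keys | wc -l | tr -d ' ')"`.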
**Questions**
What does the rollback manager do at startup and every hour?
Is this error critical? What are the consequences of this error?
Can/should we roll back our production environment from 1.12.x to 1.9.x?
Can someone reproduce and fix the error?
Thanks in advance,
Craftey
**Note:**
To install older vault versions with brew I did:
```
curl https://raw.githubusercontent.com/Homebrew/homebrew-core/a0ce0e6ce3c921a26db90dfe8c38b4df9f227669/Formula/vault.rb > /tmp/vault.rb # version 1.10.0
brew reinstall --formula /tmp/vault.rb
```
Hashes of other versions can be found here: [vault.rb history](https://github.com/Homebrew/homebrew-core/commits/master?path%5B%5D=Formula&path%5B%5D=vault.rb)
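
To switch between several versions while testing, the two commands above could be wrapped in small functions that take the commit hash as a parameter. This is a sketch only: the function names are hypothetical, and it assumes the formula still lives at `Formula/vault.rb` at the chosen commit, as in the URL above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the raw URL of vault.rb at a given
# homebrew-core commit (same URL pattern as in the note above).
vault_formula_url() {
  local commit="$1"
  echo "https://raw.githubusercontent.com/Homebrew/homebrew-core/${commit}/Formula/vault.rb"
}

# Hypothetical helper: download the formula at that commit and
# reinstall Vault from it with brew.
install_vault_at_commit() {
  local commit="$1"
  local formula="/tmp/vault.rb"
  curl -fsSL "$(vault_formula_url "$commit")" -o "$formula" || return 1
  brew reinstall --formula "$formula"
}
```

For example, `install_vault_at_commit a0ce0e6ce3c921a26db90dfe8c38b4df9f227669` would install version 1.10.0, matching the commands above.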