Hello, I started to get the following error when running a backup script utilizing approle role-id and secret-id. This issue started recently and I am not sure what the problem is.
Error taking the snapshot: incomplete snapshot, unable to read SHA256SUMS.sealed file
It happens when I try to take a backup by running this command:
vault operator raft snapshot save /var/lib/vault/snapshots/backup.snapshot
Sounds like the user you’re running the snapshot command as doesn’t have read access to the directory where Vault stores its data. It’s also possible that it doesn’t have write access to /var/lib/vault/snapshots.
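One quick way to rule out the filesystem side is to test the target directory as the same user that runs the backup. This is only a minimal sketch: check_snapshot_dir is a hypothetical helper name, not a Vault command, and it only checks that the directory exists and is writable. It says nothing about Vault's own data directory.

```shell
# Hypothetical helper (not part of Vault): report whether the directory
# that will receive the snapshot exists and is writable by the current user.
check_snapshot_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing"
    elif [ ! -w "$dir" ]; then
        echo "not-writable"
    else
        echo "ok"
    fi
}
```

Run it as the backup user, for example: check_snapshot_dir /var/lib/vault/snapshots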
The token for the backup is generated by an approle using a role-id and secret-id, so a new temp token gets created every time we run a backup. I verified that the token is generated and working correctly.
However, our transit auto-unseal is currently broken: its token is no longer being renewed after 32 days because the max TTL has been reached. Usually when this happens the Vault cluster fails, but thankfully it has not this time, which is weird.
What is the best way to set up transit auto-unseal? I keep failing at it, and whenever this happens our entire environment loses access to Vault credentials. There used to be a tutorial on Hashicorp Learn that walked you through the process, but it is no longer there.
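Since the auto-unseal token stops renewing once its max TTL is reached, it may help to check the remaining TTL before it runs out. A rough sketch, assuming the TTL in seconds has already been read from the data.ttl field of `vault token lookup -format=json`; the function name and the warning threshold are made up for illustration.

```shell
# Hypothetical helper: classify a token by its remaining TTL.
#   $1 = remaining TTL in seconds (data.ttl from `vault token lookup -format=json`)
#   $2 = warning threshold in seconds (an arbitrary value you choose)
# Vault reports a TTL of 0 for tokens that never expire (for example root tokens).
token_ttl_state() {
    ttl="$1"
    warn="$2"
    if [ "$ttl" -eq 0 ]; then
        echo "no-expiry"
    elif [ "$ttl" -lt "$warn" ]; then
        echo "renew-soon"
    else
        echo "ok"
    fi
}

# Possible usage, assuming jq is installed and VAULT_TOKEN holds the unseal token:
#   ttl=$(vault token lookup -format=json | jq -r '.data.ttl')
#   token_ttl_state "$ttl" 604800    # warn when less than 7 days remain
```

A check like this could run from cron next to the backup job, so an expiring token is noticed before the cluster needs to unseal again.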
If the token for the snapshot were the issue, the snapshot request would fail with a permissions error. The error you’re seeing is typically due to an auto-unseal issue, and it sounds like that’s the case here too.
Thanks for the response, I was referring to another tutorial that was on Hashicorp Learn but it looks like it is no longer available. Issuing a new auto-unseal token fixed the issue. Thanks for the help!
Has this actually been resolved? I am getting the same error message on a test setup (5 nodes, integrated storage + external LB). Out of 20 requests, 4 succeed, but the order appears to be random.
I was facing the same issue, although I’m using auto-unseal with HA and Raft.
Per the comments above I tried to re-key Vault per the instructions here, but ran into another error around expired secrets.
In resolving the issues with expired secrets, I discovered that simply bringing down each of the nodes and allowing them to spin back up and auto-unseal resolved the issue: snapshot backup is now working as expected. The vault operator step-down command was super helpful in bringing down the master.
As far as this error goes I think Vault has some work to do. I’ve noted 3 mechanisms for creating snapshots with only 1 highlighting the issue with the snapshot and failing, and the other 2 methods generating binaries that were corrupt which I only discovered when attempting to restore.
vault cli - shows the error
vault UI - creates broken binary
curl - creates broken binary
Would hurt so bad to realise the backups you’ve been taking are corrupt when you go to restore!
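Because the UI and curl downloads gave no warning, a cheap sanity check on the file itself can catch this before a restore is needed. The sketch below assumes the snapshot is a gzip-compressed tar archive that should contain a SHA256SUMS.sealed entry, which is what the CLI error message suggests; snapshot_looks_complete is a hypothetical name, and passing this check does not prove the snapshot will restore.

```shell
# Hypothetical check: list the archive and look for the SHA256SUMS.sealed
# entry that the CLI complains about when it is missing.
# Assumes the snapshot file is a gzip-compressed tar archive.
snapshot_looks_complete() {
    file="$1"
    listing=$(tar -tzf "$file" 2>/dev/null) || { echo "unreadable"; return 1; }
    if printf '%s\n' "$listing" | grep -q 'SHA256SUMS\.sealed$'; then
        echo "complete"
    else
        echo "incomplete"
        return 1
    fi
}
```

For example, run snapshot_looks_complete /var/lib/vault/snapshots/backup.snapshot after every backup, and treat anything other than "complete" as a failed backup.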
Going through the change notes, it looks like this feature was rolled into release v1.9.0. However, my Vault instance is on v1.8.X and the CLI tool is on v1.9.2, which is likely why we weren’t seeing any issues via the UI or curl. Once we’ve upgraded, if we see the issue crop up again I’ll raise a ticket.
I was just about to confirm this: when I run snapshot save on standby nodes, it says exactly "Error taking the snapshot: incomplete snapshot, unable to read SHA256SUMS.sealed file", but when I run it on the leader, it works as it should.
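If the error really does depend on which node receives the request, a backup script could look up the active node first and send the snapshot request there. This is a sketch only: it parses the human-readable output of `vault status` (the "HA Mode" and "Active Node Address" rows), which is more fragile than parsing `-format=json`, and snapshot_on_leader is a hypothetical wrapper name.

```shell
# Read `vault status` table output on stdin and print the HA mode
# ("active", "standby" or "sealed"); prints nothing if the row is absent.
ha_mode_from_status() {
    awk '$1 == "HA" && $2 == "Mode" { print $3 }'
}

# Read the same output and print the active node address,
# a row that standby nodes include in their status.
active_addr_from_status() {
    awk '$1 == "Active" && $2 == "Node" && $3 == "Address" { print $4 }'
}

# Hypothetical wrapper: when the local node is a standby, point VAULT_ADDR
# at the active node before saving the snapshot.
snapshot_on_leader() {
    target="$1"
    status_out=$(vault status)
    mode=$(printf '%s\n' "$status_out" | ha_mode_from_status)
    if [ "$mode" = "standby" ]; then
        addr=$(printf '%s\n' "$status_out" | active_addr_from_status)
        # A standby prints "<none>" when it does not know the active node.
        if [ -n "$addr" ] && [ "$addr" != "<none>" ]; then
            VAULT_ADDR="$addr"
            export VAULT_ADDR
        fi
    fi
    vault operator raft snapshot save "$target"
}
```

With a load balancer in front of the cluster, this means bypassing the LB for the snapshot request, which would also fit the "4 out of 20 succeed" pattern described above.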
Error taking the snapshot: Error making API request.
URL: GET http://127.0.0.1:8200/v1/sys/storage/raft/snapshot
Code: 403. Errors:
* permission denied
but on all the others I get
Error taking the snapshot: incomplete snapshot, unable to read SHA256SUMS.sealed file
Regarding the error below, I would say that the token you are using on the leader node does not have the permissions needed to create snapshots:
Error taking the snapshot: Error making API request.
URL: GET http://127.0.0.1:8200/v1/sys/storage/raft/snapshot
Code: 403. Errors:
* permission denied
And regarding this error:
Error taking the snapshot: incomplete snapshot, unable to read SHA256SUMS.sealed file
It seems to me that this is the default error message you get when you try to create a snapshot on a standby node (not the leader node).
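To summarise the two failure modes discussed in this thread, a backup script could map the CLI error text to a likely cause. This is a sketch based only on the messages quoted above; the function name and the labels it prints are hypothetical, and the mapping reflects what posters here observed rather than documented behaviour.

```shell
# Hypothetical helper: map the snapshot error text to the likely cause
# suggested in this thread.
classify_snapshot_error() {
    case "$1" in
        *"permission denied"*)
            # 403 on GET /v1/sys/storage/raft/snapshot: check the token's policy
            echo "token-policy" ;;
        *"unable to read SHA256SUMS.sealed"*)
            # reported here for auto-unseal problems and for requests sent to a standby
            echo "seal-or-standby" ;;
        *)
            echo "unknown" ;;
    esac
}

# Possible usage: capture stderr only, then classify it on failure.
#   err=$(vault operator raft snapshot save "$target" 2>&1 >/dev/null) \
#       || classify_snapshot_error "$err"
```

For the first case, the URL in the 403 response shows that the request is a GET on sys/storage/raft/snapshot, so the token's policy needs the read capability on that path.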