Error taking snapshot

Hello, I started getting the following error when running a backup script that authenticates with an AppRole role-id and secret-id. The issue started recently and I am not sure what the cause is.

Error taking the snapshot: incomplete snapshot, unable to read SHA256SUMS.sealed file

This happens when taking a backup with the command:

vault operator raft snapshot save /var/lib/vault/snapshots/backup.snapshot

Sounds like the user you’re running the snapshot command as doesn’t have read access to the directory where Vault stores its data. It’s also possible that it doesn’t have write access to /var/lib/vault/snapshots.

Hi @rwilliams-devmon ,

This is a consequence of hashicorp/vault Pull Request #12388, "Add code to api.RaftSnapshot to detect incomplete snapshots" by ncabatoff. You were probably already having these issues, but now we’re detecting them. Probably your autoseal is failing, maybe (guessing) because it’s transit and the token has expired?

The token for the backup is generated by an approle using a role-id and secret-id, so a new temp token gets created every time we run a backup. I verified that the token is generated and working correctly.

However, our transit auto-unseal is currently broken: the token is no longer renewed after 32 days because its max TTL has been reached. Usually when this happens the Vault cluster fails, but thankfully it has not this time, which is weird.

What is the best way to set up transit auto-unseal? Ours keeps failing, and whenever it does our entire environment loses access to Vault credentials. There used to be a tutorial on HashiCorp Learn that walked through the process, but it is no longer there.

Thanks for the response. When I look at the storage location, I do see a new backup being stored in /var/lib/vault/snapshots.
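Since a file shows up in /var/lib/vault/snapshots even when the command reports the error, a backup script could write to a temporary name and keep the file only when the CLI exits successfully. The following is a minimal sketch, not the poster's actual script: the function name take_snapshot, the .partial suffix, and the ROLE_ID and SECRET_ID environment variables are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: log in with AppRole, save the snapshot under a temporary name,
# and move it into place only if the vault CLI reports success.
# take_snapshot, ROLE_ID, SECRET_ID and the .partial suffix are
# illustrative assumptions, not part of the original backup script.

take_snapshot() {
  local dest="$1"
  local tmp="${dest}.partial"
  local token

  # AppRole login; -field=token prints only the client token.
  token="$(vault write -field=token auth/approle/login \
    role_id="${ROLE_ID}" secret_id="${SECRET_ID}")" || {
    echo "AppRole login failed" >&2
    return 1
  }

  # On failure the CLI exits non-zero, but a file may still be left behind,
  # so discard it instead of keeping it as a backup.
  if VAULT_TOKEN="${token}" vault operator raft snapshot save "${tmp}"; then
    mv -f "${tmp}" "${dest}"
  else
    echo "snapshot failed, discarding ${tmp}" >&2
    rm -f "${tmp}"
    return 1
  fi
}
```

It might be called as take_snapshot /var/lib/vault/snapshots/backup.snapshot from cron, with the non-zero return code used to trigger an alert.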

If the token for the snapshot were the issue then the snapshot request would fail with a permissions error. The error you’re seeing is typically due to an autoseal issue - sounds like that’s the case here too.

Is this the tutorial you’re thinking of: "Auto-unseal using Transit Secrets Engine" on HashiCorp Learn?

Thanks for the response. I was referring to another tutorial that was on HashiCorp Learn, but it looks like it is no longer available. Issuing a new auto-unseal token fixed the issue. Thanks for the help!

Has this actually been resolved? I am getting the same error message on a test setup (5 nodes, integrated storage + external LB). It seems that out of 20 requests 4 succeed, in what appears to be random order.

I was facing the same issue; however, I’m using auto-unseal with HA and Raft.

Per the comments above I tried to re-key Vault per the instructions here, but ran into another error around expired secrets.

In resolving the issues with expired secrets I’ve discovered that simply bringing down each of the nodes and allowing them to spin back up and auto-unseal has resolved the issue - snapshot backup is now working as expected. The vault operator step-down was super helpful in bringing down the master.

As far as this error goes, I think Vault has some work to do. I’ve noted 3 mechanisms for creating snapshots. Only 1 reports the problem and fails; the other 2 generate corrupt binaries, which I only discovered when attempting to restore.

  1. vault cli - shows the error
  2. vault UI - creates broken binary
  3. curl - creates broken binary

It would hurt so bad to realise the backups you’ve been taking are corrupt only when you go to restore!
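For snapshots fetched through the UI or curl, one rough sanity check is to look inside the file for the entry the CLI error complains about. This is a sketch under an assumption: that the snapshot is a gzip-compressed tar archive that should list a SHA256SUMS.sealed member, which is inferred from the error message and the pull request mentioned earlier in the thread. The function name snapshot_has_sealed_sums is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: report whether a snapshot archive lists a SHA256SUMS.sealed entry.
# Assumes the snapshot is a gzip-compressed tar archive; the function name
# is hypothetical.
# Return codes: 0 = entry present, 1 = entry missing, 2 = not a readable tar.gz.

snapshot_has_sealed_sums() {
  local file="$1"
  local listing

  # List archive members; bail out if the file is not a readable tar.gz.
  listing="$(tar -tzf "${file}" 2>/dev/null)" || return 2

  # Match the entry with or without a leading directory component.
  printf '%s\n' "${listing}" | grep -Eq '(^|/)SHA256SUMS\.sealed$'
}
```

For example, snapshot_has_sealed_sums /var/lib/vault/snapshots/backup.snapshot || echo "snapshot looks incomplete" could run right after the download. Passing this check only shows that the entry exists; it does not prove the snapshot will restore, so a periodic test restore is still the safer verification.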

Hi @andrew.klimovski ,

Could you file a GitHub issue for case (2) please? I don’t know how to make things better for curl, but we should be able to improve the UI.

Going through the change notes, it looks like this feature was rolled into release v1.9.0. However, my Vault instance is on v1.8.X and the CLI tool is on v1.9.2, which is likely why we weren’t seeing any issues via the UI or curl. Once we’ve upgraded, I’ll raise a ticket if we see the issue crop up again.

I was originally receiving this error due to one of the auto-unseal keys being expired.

Hmmm, if your auto-unseal “token” expires, you’d end up with a sealed instance. That’s probably a bigger deal than your backup not running. :)