"consul snapshot save" is proving to be unreliable

pcolmer · November 25, 2020, 9:01am

We have a 3-node cluster running Consul and Vault. I’ve got a script that runs every hour to take a snapshot of the Consul data for disaster recovery.

Recently, the “consul snapshot save” command has been failing intermittently. At the weekend I upgraded Consul to version 1.8.6 and made sure that the script talks to the Consul agent running on the same host as itself, but we’re still getting errors like this:

Error verifying snapshot file: failed to read snapshot file: failed to read or write snapshot data: unexpected EOF

Does it matter that I’m trying to use “consul snapshot save” on a follower rather than the leader?

Is there anything else I should be checking to stop errors like this happening?
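
For anyone hitting the same intermittent failure, here is a minimal sketch of how an hourly script could retry the save and check the result before trusting it. It assumes the script shells out to the consul CLI from Python and that "consul snapshot inspect" exits non-zero on a truncated file. The names run_cli and save_with_retry, the retry count and the delay are all hypothetical, not taken from my actual script.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_cli(cmd: Sequence[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    try:
        return subprocess.run(cmd, capture_output=True).returncode == 0
    except OSError:
        # The binary is missing or not executable; treat it as a failure.
        return False


def save_with_retry(
    path: str,
    attempts: int = 3,          # hypothetical default
    delay: float = 30.0,        # hypothetical pause between attempts, in seconds
    run: Callable[[Sequence[str]], bool] = run_cli,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Save a snapshot, then inspect it; retry if either step fails."""
    for attempt in range(1, attempts + 1):
        saved = run(["consul", "snapshot", "save", path])
        # Only inspect the file if the save itself reported success.
        if saved and run(["consul", "snapshot", "inspect", path]):
            return True
        if attempt < attempts:
            sleep(delay)
    return False
```

Passing run and sleep in as parameters is only there so the loop can be exercised without a live cluster. A retry hides the symptom rather than explaining it, so it is worth logging every failed attempt as well.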

Wolfsrudel · November 25, 2020, 6:49pm

I think in your case you should use the -stale flag.
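
A rough sketch of what that could look like, assuming the backup script builds the consul command line in Python. The helper name snapshot_save_cmd and the file name are made up for illustration. The -stale flag itself is a documented option of "consul snapshot save" that lets a non-leader server answer the request, at the cost of a snapshot that may be slightly behind the cluster's current state.

```python
from typing import List


def snapshot_save_cmd(path: str, stale: bool = False) -> List[str]:
    """Build the argument list for `consul snapshot save`.

    Options go before the target file name.
    """
    cmd = ["consul", "snapshot", "save"]
    if stale:
        # Allow a follower to serve the snapshot instead of the leader.
        cmd.append("-stale")
    cmd.append(path)
    return cmd


# "backup.snap" is a placeholder file name.
print(" ".join(snapshot_save_cmd("backup.snap", stale=True)))
# prints: consul snapshot save -stale backup.snap
```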

rboyer · November 25, 2020, 8:24pm

Two useful things to know:

How big is the snapshot when it does successfully generate?
How long does it take to generate it?

pcolmer · November 26, 2020, 7:46am

Each snapshot is around 655M.

It took 2m 11s to create a snapshot.

Interestingly, using -stale caused it to take 3m 3s.

pcolmer · January 4, 2021, 7:57am

I’ve since discovered that our use of Vault was generating a lot of AWS IAM auth tokens that were never revoked once they were no longer needed. This is what caused the snapshots to grow to this size. Cleaning up the tokens has reduced each snapshot to a single-digit number of megabytes.
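
Since the root cause here was unnoticed growth, a size check after each backup might have flagged it earlier. This is only a sketch under my own assumptions: the function name snapshot_too_large and the 100 MiB default limit are hypothetical, and the right threshold depends on how much data the cluster legitimately holds.

```python
import os


def snapshot_too_large(path: str, limit_bytes: int = 100 * 1024 * 1024) -> bool:
    """Return True when the snapshot file is bigger than the limit.

    The 100 MiB default is an arbitrary example value, not a Consul
    recommendation.
    """
    return os.path.getsize(path) > limit_bytes
```

The hourly job could call this after a successful save and raise an alert instead of silently keeping ever larger files.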

imyashvinder · March 3, 2021, 8:13am

Are you able to restore the snapshot to the newly enabled engine?
I am not able to restore the secrets from the snapshot to a newly enabled Vault kv/ secrets backend.
If I back up and restore to the same secrets engine with the same accessor ID, it works. It doesn’t work when everything is created from scratch and the previously taken snapshot is used to restore secrets to the new Vault/Consul stack, even if the path and the secrets engine being restored to are the same.
