Consul snapshot restore on a new cluster resulted in "cannot mount under existing mount" during post-unseal

Overview of the Issue

A new Consul cluster (v0.8.1) was built to replace the current Consul cluster (also v0.8.1). I took a Consul snapshot backup from the current cluster and restored it onto the new cluster. During the post-unseal process, the following was observed:

[DEBUG] core: shutting down leader elections
[DEBUG] core: finished triggering standbyStopCh for runStandby
[DEBUG] core: runStandby done
[DEBUG] core: sealing barrier
[INFO]  core: vault is sealed
[INFO]  core: vault is unsealed
[INFO]  core: entering standby mode
[INFO]  core: acquired lock, enabling active operation
[DEBUG] core: generating cluster private key
[DEBUG] core: generating local cluster certificate
[INFO]  core: post-unseal setup starting
[DEBUG] core: clearing forwarding clients
[DEBUG] core: done clearing forwarding clients
[INFO]  core: loaded wrapping token key
[INFO]  core: successfully setup plugin catalog: plugin-directory=
[INFO]  core: successfully mounted backend: type=generic path=secret/
[INFO]  core: successfully mounted backend: type=system path=sys/
[INFO]  core: successfully mounted backend: type=identity path=identity/
[INFO]  core: successfully mounted backend: type=pki path=pki/
[INFO]  core: successfully mounted backend: type=cubbyhole path=cubbyhole/
[INFO]  core: successfully mounted backend: type=pki path=my/ldap/pki/
[ERROR] core: failed to mount entry: path=my/ldap/pki/ error="cannot mount under existing mount "my/ldap/pki/""
[INFO]  core: pre-seal teardown starting
[INFO]  core: cluster listeners not running
[INFO]  core: pre-seal teardown complete

Vault then loops through this sequence and remains in standby (HTTP status 429):

$ consul operator raft list-peers
Node             ID            Address       State     Voter
my-consul-node1  x.x.x.x:8300  x.x.x.x:8300  leader    true
my-consul-node2  x.x.x.x:8300  x.x.x.x:8300  follower  true
my-consul-node3  x.x.x.x:8300  x.x.x.x:8300  follower  true

$ vault status
Key                    Value
---                    -----
Seal Type              shamir
Initialized            true
Sealed                 false
Total Shares           1
Threshold              1
Version                1.0.3
Cluster Name           vault-cluster-624c7ebf
Cluster ID             8e2ac018-3d9e-1e84-e4da-76146b54bf83
HA Enabled             true
HA Cluster             n/a
HA Mode                standby
Active Node Address    <none>

On the current cluster, `vault secrets list` does not show any duplicate mounts:

$ vault secrets list
Path             Type         Accessor              Description
----             ----         --------              -----------
cubbyhole/       cubbyhole    cubbyhole_295119ab    per-token private secret storage
identity/        identity     identity_275d4e5e     identity store
pki/             pki          pki_e71d366f          n/a
secret/          generic      generic_2c3dc747      generic secret storage
my/ldap/pki/     pki          pki_97dff7b7          n/a
my/ldap2/pki/    pki          pki_fd1cbf54          n/a
sys/             system       system_71e16012       system endpoints used for control, policy and debugging
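Vault rejects a mount whose path equals or nests under an existing mount, which is what the error above reports for my/ldap/pki/. As a quick illustration (not Vault's actual code), applying the same prefix check to the mount paths listed here finds no nesting, which suggests the restored storage itself contains a stale or duplicate entry rather than the visible mount table being wrong:

```shell
# Illustrative prefix check (POSIX sh), not Vault's actual implementation:
# flag any mount path that sits underneath another one.
paths="cubbyhole/ identity/ pki/ secret/ my/ldap/pki/ my/ldap2/pki/ sys/"
conflicts=""
for p in $paths; do
  for q in $paths; do
    # ${q#"$p"} strips p from the front of q; if it changed, p is a prefix of q
    if [ "$p" != "$q" ] && [ "${q#"$p"}" != "$q" ]; then
      conflicts="$conflicts $q-under-$p"
    fi
  done
done
echo "conflicts:$conflicts"
```

None of the listed paths conflict with each other (note that my/ldap/pki/ is not a prefix of my/ldap2/pki/), so the duplicate entry must live inside the restored data.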

A side note: I also tried building a new cluster with the latest Consul version (1.7.2) and restoring the snapshot with the exact same steps as described below, and I ran into exactly the same issue. I am doing this in preparation for upgrading our current Vault/Consul infrastructure (which is very old!) to the latest versions.

Reproduction Steps

Steps to reproduce this issue:

  1. Create a new cluster with the same versions of Consul (0.8.1) and Vault (1.0.3)
  2. Initialize and unseal the new cluster with its own new keys
  3. Copy the Consul snapshot backup from the current cluster (consul.snap.2021-06-15_1200) onto the new cluster
  4. Run a snapshot restore:
    consul snapshot restore consul.snap.2021-06-15_1200
  5. Delete the Vault core lock:
    consul kv delete vault/core/lock
  6. Unseal Vault with the current cluster’s master key:
    vault operator unseal
  7. Watch the log as the mount operation fails
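For reference, steps 4–6 collected in one place. The script below only echoes the commands rather than running them, since they need a live cluster and the old cluster's unseal key; the snapshot filename is the one from this report:

```shell
# Echo the restore sequence instead of executing it; a live Consul/Vault
# cluster (and the old cluster's unseal key) is required to run it for real.
SNAP="consul.snap.2021-06-15_1200"

echo "consul snapshot restore $SNAP"     # step 4: restore onto the new cluster
echo "consul kv delete vault/core/lock"  # step 5: clear the stale HA lock
echo "vault operator unseal"             # step 6: unseal with the old master key
```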

Consul info for both Client and Server

agent:
	check_monitors = 0
	check_ttls = 1
	checks = 1
	services = 2
build:
	prerelease = 
	revision = 'e9ca44d
	version = 0.8.1
consul:
	bootstrap = false
	known_datacenters = 1
	leader = false
	leader_addr = x.x.x.x:8300
	server = true
raft:
	applied_index = 20661759
	commit_index = 20661759
	fsm_pending = 0
	last_contact = 5.359637ms
	last_log_index = 20661760
	last_log_term = 2
	last_snapshot_index = 20658044
	last_snapshot_term = 2
	latest_configuration = [{Suffrage:Voter ID:x.x.x.x:8300 Address:x.x.x.x:8300} {Suffrage:Voter ID:x.x.x.x:8300 Address:x.x.x.x:8300} {Suffrage:Voter ID:x.x.x.x:8300 Address:x.x.x.x:8300}]
	latest_configuration_index = 1
	num_peers = 2
	protocol_version = 2
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Follower
	term = 2
runtime:
	arch = amd64
	cpu_count = 2
	goroutines = 72
	max_procs = 2
	os = linux
	version = go1.8.1
serf_lan:
	encrypted = true
	event_queue = 0
	event_time = 2
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 3
	members = 3
	query_queue = 0
	query_time = 2
serf_wan:
	encrypted = true
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 8
	members = 3
	query_queue = 0
	query_time = 2

Operating system and Environment details


Hi @suzana.zahri, it's not possible to upgrade directly from such an old version to a newer Consul version. You need to perform a multi-phased upgrade. The steps for upgrading from 0.8.x can be found here: Upgrade Instructions | Consul by HashiCorp.

Thanks so much for your update, @blake! Sorry, I meant to say that the new cluster I built did have the same version as the current one (v0.8.1).
For what it's worth, I did initially build a cluster with a higher version (v1.7.2) than the current cluster.
There, I tried to restore the snapshot backup, but it errored out with exactly the same issue during the post-unseal phase. So I then rebuilt the new cluster with v0.8.1 to see if I'd run into the same issue (which I did).

Now I’m desperately looking at options, possibly using `vault operator migrate`: migrating from the Consul backend to a file backend, which I could then restore onto my new cluster? I’m not super familiar with this method; perhaps it is possible to migrate from the current Consul straight to the new Consul using the following:

storage_source "consul" {
  address = ""
  path    = "vault"
}

storage_destination "consul" {
  address = "<new-consul-ip-address>:8500"
  path    = "vault"
}

Is this possible?

operator migrate - Command | Vault by HashiCorp
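Here is a sketch of how that migration might be wired up. The source address (a local Consul agent on 127.0.0.1:8500) and the migrate.hcl filename are assumptions on my part, and the script only echoes the final command, since `vault operator migrate` has to be run while no Vault instance is using either backend:

```shell
# Sketch only: write a migrate.hcl for a consul -> consul migration.
# 127.0.0.1:8500 (local source agent) and the filename are assumed, not confirmed.
cat > migrate.hcl <<'EOF'
storage_source "consul" {
  address = "127.0.0.1:8500"
  path    = "vault"
}

storage_destination "consul" {
  address = "<new-consul-ip-address>:8500"
  path    = "vault"
}
EOF

# Run the one-time copy while Vault is stopped on both sides:
echo "vault operator migrate -config=migrate.hcl"
```

After the copy completes, the new cluster's Vault would presumably still need to be unsealed with the old cluster's keys, since the data (and therefore the master key) comes from the old cluster.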

Anyway, I am dumbfounded as to why the mounts fail on restore to the exact same version of Consul. This would be my ideal restore solution, but it's just not working right now :frowning: