Operator migration between different clusters, backends and versions

Hello!

We’re trying to migrate Vault

From:

5 replicas running on EC2 instances
Version: 0.6
Storage Backend: consul
UnSeal: standard

To:

5 replicas running on K8S
Version: 1.7
Storage backend: Raft
AutoUnseal: awskms

Basically, we run the following command inside the vault-0 pod, before initializing the cluster:

vault operator migrate -config /tmp/migrate.hcl

Our migration config file contains:

storage_source "consul" {
  address = "consul1.domain.consul:8500"
  path    = "vault"
}

storage_destination "raft" {
  path = "/vault/data"
}

cluster_addr = "http://127.0.0.1:8201"

After migrating, we restart the vault pod (service) and manually run unseal with the -migrate flag, using the unseal keys from the source Vault cluster:

vault operator unseal -migrate

After that, vault status shows that all nodes are unsealed but stay in standby mode, with no active node.

Vault-0 status:

/ $ vault status
Key                      Value
---                      -----
Recovery Seal Type       shamir
Initialized              true
Sealed                   false
Total Recovery Shares    5
Threshold                3
Version                  1.7.2
Storage Type             raft
Cluster Name             vault-cluster-0c158e31
Cluster ID               c3efa7c7-a28f-c000-b427-85a20ff3a562
HA Enabled               true
HA Cluster               n/a
HA Mode                  standby
Active Node Address      <none>
Raft Committed Index     496844
Raft Applied Index       496844

Vault-1 to vault-4 status:

Recovery Seal Type       shamir
Initialized              true
Sealed                   false
Total Recovery Shares    5
Threshold                3
Version                  1.7.2
Storage Type             raft
Cluster Name             vault-cluster-0c158e31
Cluster ID               c3efa7c7-a28f-c000-b427-85a20ff3a562
HA Enabled               true
HA Cluster               n/a
HA Mode                  standby
Active Node Address      <none>
Raft Committed Index     528192
Raft Applied Index       528019

Checking the logs, this is what we see in vault-0 (the pod that was used for the migration):

{"@level":"warn","@message":"heartbeat timeout reached, starting election","@module":"storage.raft","@timestamp":"2021-07-05T07:45:52.779220Z","last-leader":""}
{"@level":"info","@message":"entering candidate state","@module":"storage.raft","@timestamp":"2021-07-05T07:45:52.779290Z","node":{},"term":42802}
{"@level":"info","@message":"entering follower state","@module":"storage.raft","@timestamp":"2021-07-05T07:45:52.791225Z","follower":{},"leader":""}

Rest of the pods:

{"@level":"info","@message":"added peer, starting replication","@module":"storage.raft","@timestamp":"2021-07-05T07:50:21.234539Z","peer":"vault-0"}
{"@level":"info","@message":"added peer, starting replication","@module":"storage.raft","@timestamp":"2021-07-05T07:50:21.234552Z","peer":"vault-4"}
{"@level":"info","@message":"added peer, starting replication","@module":"storage.raft","@timestamp":"2021-07-05T07:50:21.234560Z","peer":"vault-2"}
{"@level":"info","@message":"added peer, starting replication","@module":"storage.raft","@timestamp":"2021-07-05T07:50:21.234568Z","peer":"vault-1"}
{"@level":"warn","@message":"appendEntries rejected, sending older logs","@module":"storage.raft","@timestamp":"2021-07-05T07:50:21.235187Z","next":528209,"peer":{"Suffrage":0,"ID":"vault-1","Address":"vault-1.vault-internal:8201"}}
{"@level":"info","@message":"pipelining replication","@module":"storage.raft","@timestamp":"2021-07-05T07:50:21.237092Z","peer":{"Suffrage":0,"ID":"vault-1","Address":"vault-1.vault-internal:8201"}}
{"@level":"info","@message":"entering follower state","@module":"storage.raft","@timestamp":"2021-07-05T07:50:21.237933Z","follower":{},"leader":""}
{"@level":"info","@message":"aborting pipeline replication","@module":"storage.raft","@timestamp":"2021-07-05T07:50:21.237960Z","peer":{"Suffrage":0,"ID":"vault-1","Address":"vault-1.vault-internal:8201"}}
{"@level":"warn","@message":"appendEntries rejected, sending older logs","@module":"storage.raft","@timestamp":"2021-07-05T07:50:21.292211Z","next":528209,"peer":{"Suffrage":0,"ID":"vault-4","Address":"vault-4.vault-internal:8201"}}
{"@level":"error","@message":"failed to acquire lock","@module":"core","@timestamp":"2021-07-05T07:50:24.659280Z","error":"node is not the leader"}
{"@level":"info","@message":"duplicate requestVote for same term","@module":"storage.raft","@timestamp":"2021-07-05T07:50:25.129314Z","term":42857}
{"@level":"warn","@message":"rejecting vote request since we have a leader","@module":"storage.raft","@timestamp":"2021-07-05T07:50:26.895511Z","from":"vault-4.vault-internal:8201","leader":"vault-1.vault-internal:8201"}
{"@level":"warn","@message":"rejecting vote request since we have a leader","@module":"storage.raft","@timestamp":"2021-07-05T07:50:33.193767Z","from":"vault-0.vault-internal:8201","leader":"vault-1.vault-internal:8201"}
{"@level":"warn","@message":"rejecting vote request since we have a leader","@module":"storage.raft","@timestamp":"2021-07-05T07:50:33.725071Z","from":"vault-2.vault-internal:8201","leader":"vault-1.vault-internal:8201"}
{"@level":"warn","@message":"rejecting vote request since we have a leader","@module":"storage.raft","@timestamp":"2021-07-05T07:50:39.396882Z","from":"vault-0.vault-internal:8201","leader":"vault-1.vault-internal:8201"}
{"@level":"warn","@message":"rejecting vote request since we have a leader","@module":"storage.raft","@timestamp":"2021-07-05T07:50:41.808753Z","from":"vault-2.vault-internal:8201","leader":"vault-1.vault-internal:8201"}
{"@level":"warn","@message":"failed to get previous log","@module":"storage.raft","@timestamp":"2021-07-05T07:50:42.000255Z","error":"log not found","last-index":528212,"previous-index":528213}
{"@level":"warn","@message":"heartbeat timeout reached, starting election","@module":"storage.raft","@timestamp":"2021-07-05T07:50:47.724557Z","last-leader":"vault-1.vault-internal:8201"}
{"@level":"info","@message":"entering candidate state","@module":"storage.raft","@timestamp":"2021-07-05T07:50:47.724594Z","node":{},"term":42863}
{"@level":"info","@message":"duplicate requestVote for same term","@module":"storage.raft","@timestamp":"2021-07-05T07:50:47.729118Z","term":42863}
{"@level":"info","@message":"duplicate requestVote for same term","@module":"storage.raft","@timestamp":"2021-07-05T07:50:47.729157Z","term":42863}
{"@level":"warn","@message":"duplicate requestVote from","@module":"storage.raft","@timestamp":"2021-07-05T07:50:47.729165Z","candidate":"vault-3.vault-internal:8201"}
{"@level":"info","@message":"entering follower state","@module":"storage.raft","@timestamp":"2021-07-05T07:50:48.240078Z","follower":{},"leader":""}
{"@level":"warn","@message":"failed to get previous log","@module":"storage.raft","@timestamp":"2021-07-05T07:50:48.240148Z","error":"log not found","last-index":528213,"previous-index":528214}
{"@level":"warn","@message":"rejecting vote request since we have a leader","@module":"storage.raft","@timestamp":"2021-07-05T07:50:49.338093Z","from":"vault-0.vault-internal:8201","leader":"vault-1.vault-internal:8201"}
{"@level":"warn","@message":"heartbeat timeout reached, starting election","@module":"storage.raft","@timestamp":"2021-07-05T07:50:54.311925Z","last-leader":"vault-1.vault-internal:8201"}
{"@level":"info","@message":"entering candidate state","@module":"storage.raft","@timestamp":"2021-07-05T07:50:54.311972Z","node":{},"term":42865}
{"@level":"info","@message":"duplicate requestVote for same term","@module":"storage.raft","@timestamp":"2021-07-05T07:50:54.316603Z","term":42865}
{"@level":"warn","@message":"duplicate requestVote from","@module":"storage.raft","@timestamp":"2021-07-05T07:50:54.316622Z","candidate":"vault-3.vault-internal:8201"}

or

{"@level":"warn","@message":"heartbeat timeout reached, starting election","@module":"storage.raft","@timestamp":"2021-07-05T07:56:27.056883Z","last-leader":"vault-1.vault-internal:8201"}
{"@level":"info","@message":"entering candidate state","@module":"storage.raft","@timestamp":"2021-07-05T07:56:27.056923Z","node":{},"term":42931}
{"@level":"info","@message":"duplicate requestVote for same term","@module":"storage.raft","@timestamp":"2021-07-05T07:56:27.060760Z","term":42931}
{"@level":"warn","@message":"duplicate requestVote from","@module":"storage.raft","@timestamp":"2021-07-05T07:56:27.060780Z","candidate":"vault-3.vault-internal:8201"}
{"@level":"info","@message":"duplicate requestVote for same term","@module":"storage.raft","@timestamp":"2021-07-05T07:56:28.321427Z","term":42931}
{"@level":"info","@message":"duplicate requestVote for same term","@module":"storage.raft","@timestamp":"2021-07-05T07:56:30.487828Z","term":42931}
{"@level":"info","@message":"entering follower state","@module":"storage.raft","@timestamp":"2021-07-05T07:56:32.689767Z","follower":{},"leader":""}
{"@level":"warn","@message":"failed to get previous log","@module":"storage.raft","@timestamp":"2021-07-05T07:56:32.689824Z","error":"log not found","last-index":528262,"previous-index":528263}
{"@level":"warn","@message":"rejecting vote request since we have a leader","@module":"storage.raft","@timestamp":"2021-07-05T07:56:37.969606Z","from":"vault-4.vault-internal:8201","leader":"vault-1.vault-internal:8201"}
{"@level":"warn","@message":"rejecting vote request since we have a leader","@module":"storage.raft","@timestamp":"2021-07-05T07:56:38.338280Z","from":"vault-2.vault-internal:8201","leader":"vault-1.vault-internal:8201"}
{"@level":"warn","@message":"failed to get previous log","@module":"storage.raft","@timestamp":"2021-07-05T07:56:38.343219Z","error":"log not found","last-index":528262,"previous-index":528264}
{"@level":"warn","@message":"rejecting vote request since we have a leader","@module":"storage.raft","@timestamp":"2021-07-05T07:56:41.532682Z","from":"vault-0.vault-internal:8201","leader":"vault-2.vault-internal:8201"}
{"@level":"warn","@message":"rejecting vote request since we have a leader","@module":"storage.raft","@timestamp":"2021-07-05T07:56:43.568161Z","from":"vault-1.vault-internal:8201","leader":"vault-2.vault-internal:8201"}
{"@level":"warn","@message":"rejecting vote request since we have a leader","@module":"storage.raft","@timestamp":"2021-07-05T07:56:47.252086Z","from":"vault-0.vault-internal:8201","leader":"vault-1.vault-internal:8201"}
{"@level":"warn","@message":"rejecting vote request since we have a leader","@module":"storage.raft","@timestamp":"2021-07-05T07:56:49.702528Z","from":"vault-2.vault-internal:8201","leader":"vault-1.vault-internal:8201"}
{"@level":"warn","@message":"rejecting vote request since we have a leader","@module":"storage.raft","@timestamp":"2021-07-05T07:56:54.712520Z","from":"vault-0.vault-internal:8201","leader":"vault-2.vault-internal:8201"}
{"@level":"warn","@message":"rejecting vote request since we have a leader","@module":"storage.raft","@timestamp":"2021-07-05T07:57:00.320092Z","from":"vault-0.vault-internal:8201","leader":"vault-2.vault-internal:8201"}
{"@level":"warn","@message":"rejecting vote request since we have a leader","@module":"storage.raft","@timestamp":"2021-07-05T07:57:02.355605Z","from":"vault-4.vault-internal:8201","leader":"vault-2.vault-internal:8201"}

It seems that the nodes can’t reach a consensus.

Do you think that could be a problem related to the migration or the post-migration actions?

Any clue about how we can fix it?

We found the problem.

In our migration.hcl we were using cluster_addr = "http://127.0.0.1:8201"

Therefore, the node used for the migration keeps localhost as its cluster address, and the followers cannot connect to it.
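
For anyone hitting the same issue, here is a rough sketch of what the corrected migration file could look like. It assumes that vault-0.vault-internal:8201 (the address the other pods report for vault-0 in the logs above) is how peers reach the migration node; the scheme is kept as in our original file. Adjust both to your own setup.

```hcl
storage_source "consul" {
  address = "consul1.domain.consul:8500"
  path    = "vault"
}

storage_destination "raft" {
  path = "/vault/data"
}

# Assumption: this must be an address the other raft peers can reach,
# not localhost. The hostname below is taken from the pod logs.
cluster_addr = "http://vault-0.vault-internal:8201"
```

The only change from the original file is cluster_addr, which now points at an address that is resolvable from the other pods instead of 127.0.0.1.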