[SOLVED] Help: migrate from standalone file backend to HA raft

Hello,

I've been doing a PoC to make Vault HA using the Raft storage backend, and it's been pretty straightforward. My HA node configs are basically from this guide.

Now I'm trying to migrate the file backend data into my PoC, but I'm struggling.

My standalone config:

storage "file" {
  path = "/vault/file"
}
listener "tcp" {
  address     = "127.0.0.1:8200"
  tls_disable = "true"
}
disable_mlock = "true"
default_lease_ttl = "24h"
max_lease_ttl = "24h"
api_addr = "http://127.0.0.1"
cluster_addr = "http://127.0.0.1:8201"
ui = "true"

My procedure is:

  1. On the standalone node, shut down Vault
  2. On the standalone node, run the migration with this config:
storage_source "file" {
  path = "/vault/file/"
}

storage_destination "raft" {
  path = "/vault/raft/"
}
cluster_addr = "http://127.0.0.1:8201"
  3. Copy the raft files into the HA PoC Vault directory
  4. Start the Vault server on node1 (not initialized)
  5. Attempt to unseal using the same key from the standalone node
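For reference, the migration itself (with the config above saved as migrate.hcl; the filename is mine) looks something like:

```shell
# Run on the standalone node while Vault is stopped;
# reads the "file" backend and writes out the Raft data files.
vault operator migrate -config=migrate.hcl
```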

It fails at point 5 with the error: error unsealing: context deadline exceeded

Logs say:

storage.raft: not part of stable configuration, aborting election

Is this the proper procedure?
I've tried all sorts of different approaches but still don't understand why it fails.
Should all the nodes in the cluster be up and unsealed? Or should only the leader be running and be unsealed first?

I want to say I've followed this closely, but perhaps I'm missing something.

Any input is appreciated

Thank you,
Dave

I think I see where you're going wrong. To explain, it is necessary to first discuss the concept of a "Raft configuration". This is not something that exists in a configuration file; it is data maintained within the Raft data files themselves. It is a record of each server within a Raft cluster. For each server, the following three pieces of information are stored:

  • Node ID (a string identifier)
  • Node Address (a host:port string that other nodes in the cluster use to connect to this node)
  • A boolean indicating whether this node has a vote in cluster leadership elections (mentioned for completeness; not really relevant to this question)

Fundamentally, I think your issue is that the Raft configuration created when vault operator migrate writes the Raft data files fails to match what is expected/needed after you copy the files.

First, let's address the Node ID. There is a lot of advice on the internet, including in HashiCorp tutorials, recommending that you define the Raft node_id in the Vault configuration file. Do NOT do that, as you then become responsible for managing it. If you simply omit that setting, Vault will generate a random UUID to use as the node ID and store it in a file in the data directory. That way, as long as you always copy/move the data directory as a complete unit, you don't have to worry about the node ID in the runtime configuration differing from the persisted configuration that is part of cluster state.
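As a concrete sketch (the path is illustrative), the Raft storage stanza in each node's config then just looks like:

```hcl
storage "raft" {
  path = "/vault/raft"
  # No node_id here: Vault generates a random UUID on first start
  # and persists it in a file inside the data directory.
}
```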

Second, Node Address. Vault sets the Raft node address to the host:port part of the URL configured as cluster_addr only when initially setting up Raft. That means the cluster_addr in your migration configuration file needs to be correct for the node where you will later copy the files to!
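So for your case, the migration config would need something like this (node1-fqdn standing in for the real hostname of the node you will copy the files to):

```hcl
storage_source "file" {
  path = "/vault/file/"
}

storage_destination "raft" {
  path = "/vault/raft/"
}

# Must match the cluster_addr the destination node will use
cluster_addr = "https://node1-fqdn:8201"
```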

I think that should sort out your issues.

If you're copying in migrated Raft files to seed a brand-new cluster, none of the new nodes should be started, nor ever have been started.

  • Copy in your files to one node
  • Start that node
  • Unseal that node
  • Start the other nodes
  • If using retry_join, they will fetch preliminary join information from the unsealed active node. If not using retry_join, use vault operator raft join <leader info> to trigger fetching preliminary join information.
  • Unseal the other nodes; joining will not be complete until they are unsealed
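For illustration (addresses are placeholders), the retry_join variant means each of the other nodes carries a stanza something like:

```hcl
storage "raft" {
  path = "/vault/raft"

  retry_join {
    leader_api_addr = "http://node1-fqdn:8200"
  }
}
```

Without retry_join, the equivalent manual step is vault operator raft join http://node1-fqdn:8200, run against each follower once the leader is unsealed.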

Lastly, a couple of other minor points:

Once you have a cluster up and running, use vault operator raft list-peers to view the active Raft configuration.

When specifying a cluster_addr, the scheme is always https://. It doesn’t actually matter if you get it wrong, because Vault will ignore you anyway and override it to https://, but it’s nicer to not be wrong in the first place.


@maxb
Thank you very much for the detailed response!

Removing the node_id from my HA configs was indeed the issue.

After I copied my migrated Raft files over to node1, I was able to unseal it with the original key, as well as add the other nodes to the cluster.

I will add a note about my original migrate.hcl:

When I set cluster_addr = "https://127.0.0.1:8201" and then moved to my HA cluster, I noticed that node1 appeared in my peer list like this, which looked wrong to me:

Node                                    Address                State       Voter
----                                    -------                -----       -----
0a9975e0-cda2-5cd7-c45d-aa10c852c95c    127.0.0.1:8201         leader      true
7e1f0c39-82aa-33c1-ee26-81ee090f19ae    node2-fqdn:8201        follower    true
cc11e5f7-a542-4149-fe4f-59c801fda479    node3-fqdn:8201        follower    true

So I changed it to cluster_addr = "https://node1-fqdn:8201", and now the peer list looks better:

Node                                    Address                State       Voter
----                                    -------                -----       -----
0a9975e0-cda2-5cd7-c45d-aa10c852c95c    node1-fqdn:8201        leader      true
7e1f0c39-82aa-33c1-ee26-81ee090f19ae    node2-fqdn:8201        follower    true
cc11e5f7-a542-4149-fe4f-59c801fda479    node3-fqdn:8201        follower    true

Thank you again!
Very happy I got this going!

Best,
Dave