Vault operator raft join always returns true but isn't actually peering nodes

I’m new to Vault and I’m trying to set up a cluster of 3 Vault nodes using Raft as storage. I’ve changed the config quite a bit but can’t seem to get the nodes to “peer” with each other.

Vault configuration (/etc/vault.d/vault.hcl):

storage "raft" {
  path    = "/opt/raft"
  node_id = "raft_node_<node # (1, 2, or 3)>"
}

listener "tcp" {
  address     = "0.0.0.0:8200"
  tls_disable = true
}

api_addr     = "http://<IP address of this node>:8200"
cluster_addr = "https://<IP address of this node>:8201"
ui           = true
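For reference, here is a minimal sketch of filling in the two placeholders (node number and node IP) for each node with a shell heredoc. The function name `render_vault_config` and the example IP `10.0.0.12` are hypothetical. The stanzas are copied from the config above.

```shell
#!/bin/sh
# Hypothetical helper: print vault.hcl for one node of the 3-node raft cluster.
# $1 = node number (1, 2 or 3), $2 = this node's IP address.
render_vault_config() {
  node_num="$1"
  node_ip="$2"
  cat <<EOF
storage "raft" {
  path    = "/opt/raft"
  node_id = "raft_node_${node_num}"
}

listener "tcp" {
  address     = "0.0.0.0:8200"
  tls_disable = true
}

api_addr     = "http://${node_ip}:8200"
cluster_addr = "https://${node_ip}:8201"
ui           = true
EOF
}
```

Example: `render_vault_config 2 10.0.0.12 | sudo tee /etc/vault.d/vault.hcl` on node 2. Each node gets its own `node_id` and its own IP in `api_addr` and `cluster_addr`. The listener stanza is identical everywhere.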

Steps I followed:

  1. Start the server on each of the 3 nodes
    sudo vault server -config=/etc/vault.d/vault.hcl -log-level=trace

  2. Set the Vault address (VAULT_ADDR) on each node to node 1’s IP:8200
    export VAULT_ADDR=http://<node 1's IP>:8200

  3. Init vault cluster on node 1
    vault operator init

  4. Try to join node 1 from nodes 2 and 3
    vault operator raft join "http://<node 1's IP>:8200"

    Note: This command pretty much always returns “Joined true”, even if I plug in some random value like vault operator raft join asdfasdf.

  5. Run vault operator unseal once on nodes 2 and 3 (threshold is 3)

  6. Do step 4 again just in case

  7. Do the final unseal on node 1

  8. Set VAULT_TOKEN and run vault operator raft list-peers
    Actual output:

    Node           Address               State     Voter
    ----           -------               -----     -----
    raft_node_1    <Node 1's IP>:8201    leader    true
    

    Expected output:

    Node           Address               State       Voter
    ----           -------               -----       -----
    raft_node_1    <Node 1's IP>:8201    leader      true
    raft_node_2    <Node 2's IP>:8201    follower    true
    raft_node_3    <Node 3's IP>:8201    follower    true
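To compare the actual and expected output without eyeballing it, here is a small sketch that counts peer rows in the table printed by `vault operator raft list-peers`. It assumes the default table layout shown above: a header row, a dashed separator row, then one row per peer. `count_raft_peers` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: count peer rows in `vault operator raft list-peers`
# table output read from stdin. Assumes a header line, a line of dashes,
# then one non-empty line per peer.
count_raft_peers() {
  awk '
    /^[ \t]*-+[ \t]/ { body = 1; next }  # separator row: peer rows follow
    body && NF > 0 { n++ }               # each non-empty row after it is a peer
    END { print n + 0 }
  '
}
```

Example: `vault operator raft list-peers | count_raft_peers` should print 3 once all three nodes have joined. With the actual output above it prints 1.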
    

Also I’m able to store secrets on one node and pull them down on another node even though they’re not listed as peers.

These are the docs I’ve been following

Is anyone able to see where I’m going wrong here?

Hello,

Yes, you are correct: “Joined true” does not indicate that the node successfully joined. A successful vault operator unseal ... on the joining node is what indicates a successful join to the cluster.

I would suggest setting the log level to TRACE and watching the operational logs on node 2 and node 3 when you run raft join...; they should reveal why the node is not joining.
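Since “Joined true” is not proof of a join, a more reliable check is the seal state of the joining node itself. Below is a minimal sketch that relies on the documented `vault status` exit codes (0 unsealed, 2 sealed, 1 error). `vault_seal_state` is a hypothetical name, and it queries whichever node VAULT_ADDR points at.

```shell
#!/bin/sh
# Hypothetical helper: report the seal state of the node VAULT_ADDR points at.
# Relies on `vault status` exit codes: 0 = unsealed, 2 = sealed, 1 = error.
vault_seal_state() {
  vault status >/dev/null 2>&1
  case $? in
    0) echo "unsealed" ;;
    2) echo "sealed" ;;
    *) echo "error" ;;
  esac
}
```

Run it with VAULT_ADDR pointing at the node that was just joined. If that node still reports "sealed" after the unseal keys have been entered, the join has not completed.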

Martin

Okay that’s what I thought.

I ran all the servers with log-level=trace and got no output on nodes 2 and 3 when unsealing; their logs did not change at all after being started. Here is the output from node 1 during the unseal process:

Hello,

The log shared from node 1 does not seem to provide any useful information.
Can you share the TRACE logs from node 2 and node 3 from when you execute the raft join... command?

Martin

Yeah sure


On the left are the trace server logs from node 2, and on the right are the commands I ran on node 2. The logs did not change at all after I started the server.

Here’s the output of raft list-peers after trying to join the cluster:

Hello,

I can see that node 2 still says “Sealed: true” after you run “vault operator unseal”. A Raft node has successfully joined the cluster when it is unsealed correctly after joining.

What kind of “seal” stanza do you use? If auto-unseal is being used, are the unseal keys the same on node 1 and node 2?

Martin

Hey Martin,

So I have no seal stanza definition so I’m using the default Shamir seal. When I init on vault node 1 I just copy those unseal keys to the other nodes.

Vault still says sealed after I ran vault operator unseal because the unseal progress was only at 2/3. After I unsealed on this node, I finished up on the rest and later ran the “vault operator raft list-peers” command you see in the last screenshot.

Hello,

Vault needs to unseal successfully in order to join the cluster. I can see that your threshold is 3 Shamir keys. What do you see when you enter all 3 keys? Here are example steps:

  • Execute vault operator raft join... on node 2 to join it to node 1
  • Execute vault operator unseal UNSEAL_KEY three times, using a different unseal key each time.
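The two steps above can be sketched as a shell function. `join_and_unseal` is a hypothetical name. It assumes VAULT_ADDR already points at the node being joined (node 2 here), and it refuses to run if the same key is passed twice, since each unseal call should use a different key.

```shell
#!/bin/sh
# Hypothetical helper: join the node that VAULT_ADDR points at to the leader,
# then unseal it with each key given.
# $1 = leader API address, remaining args = distinct Shamir unseal keys.
join_and_unseal() {
  leader="$1"
  shift
  # Each unseal call should use a different key share, so reject repeats.
  dupes=$(printf '%s\n' "$@" | sort | uniq -d)
  if [ -n "$dupes" ]; then
    echo "duplicate unseal key given" >&2
    return 1
  fi
  vault operator raft join "$leader" || return 1
  for key in "$@"; do
    vault operator unseal "$key" || return 1
  done
}
```

Example: `join_and_unseal "http://<node 1's IP>:8200" "$KEY_1" "$KEY_2" "$KEY_3"`, where the `KEY_*` variables are hypothetical. Passing keys as arguments leaves them in shell history. Running `vault operator unseal` with no argument prompts for the key instead.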

Do you see an error during the unseal process?

Martin

Hello again Martin!

Sorry I was out all weekend but here is the log output from completing these steps:

  1. Start server in trace mode on all three vault servers
  2. export VAULT_ADDR=<node 1's IP>:8200 on all three servers
  3. vault operator init on server 1
  4. vault operator raft join "http://<node 1's IP>:8200" on node 2
  5. I then ran vault operator unseal 3 times on node 2 (with different unseal keys each time)
  6. At this point the unseal process was done so I set VAULT_TOKEN on node 2 and ran vault operator raft list-peers and got the same result, only the leader (node 1) was listed

The very first line of this log output, “core: pre-seal teardown complete”, is the very last line of the output generated by vault operator init; the rest of the output is from unsealing Vault.

Note: I don’t see output in any of the servers when I run vault operator raft join "http://<node 1's IP>:8200"

Thanks again, Carter

Hello,

Setting export VAULT_ADDR=<node 1's IP>:8200 on all three servers means that all of your commands are sent to node 1 only.
On node 2 and node 3, point the CLI at the local server instead: export VAULT_ADDR=http://localhost:8200 on each of them.

The VAULT_ADDR variable specifies which host your CLI commands are sent to; more info here.
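A simple guard against this mistake is sketched below. `check_vault_addr_is_local` is a hypothetical name. It returns non-zero unless VAULT_ADDR uses a loopback host. It does not recognise the node's own non-loopback IP, which would also be a valid local address.

```shell
#!/bin/sh
# Hypothetical guard: fail unless VAULT_ADDR points at this machine, so that
# join/unseal commands are not silently sent to another node.
# Only loopback hosts are recognised; the node's own IP is not detected.
check_vault_addr_is_local() {
  case "${VAULT_ADDR:-}" in
    http://localhost:*|http://127.0.0.1:*|https://localhost:*|https://127.0.0.1:*)
      return 0 ;;
    "")
      echo "VAULT_ADDR is not set" >&2
      return 1 ;;
    *)
      echo "VAULT_ADDR=$VAULT_ADDR does not point at this machine" >&2
      return 1 ;;
  esac
}
```

Example: `check_vault_addr_is_local && vault operator unseal` on node 2 or node 3.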

Hi Martin!

It’s working! I looked through all sorts of docs, but I guess I never understood that the VAULT_ADDR var was supposed to be the local machine’s IP.

I also thought that each node only had to contribute to a single unseal process (each node did it at least one time) and didn’t understand that each node had to do its own entire unseal process (me having the wrong VAULT_ADDR definitely didn’t help with this either lol).
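Because unseal progress is tracked separately on each node, every node needs its own full threshold of keys (3 in this thread). The sketch below does this from a single shell by pointing each command at one node at a time. It assumes every node’s API address is reachable from where it runs. `unseal_each_node` and the example addresses are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: give every node the full set of key shares.
# $1 = space-separated list of node API addresses, remaining args = keys.
unseal_each_node() {
  addrs="$1"
  shift
  for addr in $addrs; do
    for key in "$@"; do
      # Set VAULT_ADDR for this one command only, so it targets $addr.
      VAULT_ADDR="$addr" vault operator unseal "$key" || return 1
    done
  done
}
```

Example: `unseal_each_node "http://10.0.0.12:8200 http://10.0.0.13:8200" "$KEY_1" "$KEY_2" "$KEY_3"` after nodes 2 and 3 have run `vault operator raft join`.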

Thank you so much for all your help!!
- Carter

Hello,

I’m glad I was able to help. I wish you all the best!

Martin
