Issue configuring Vault nodes

Hi,

I have already checked some posts, but I never managed to get my Vault network working, so here I am asking for your help.

I’m trying to create a Vault network using raft storage with 2 Vault nodes.

Here is my Vault config file. Both of my machines have the same config file except for the node_id:

storage "raft" {
  path    = "./vault/data"
  node_id = "node1"  # "node2" on the second Vault
}
listener "tcp"{
  address ="0.0.0.0:8200"
  tls_disable="true"
}
api_addr = "https://local ip of this machine :8200"
cluster_addr = "https://local ip of this machine :8201"
ui = true
disable_mlock=true

Okay, everything looks good, so I run

sudo vault server -config=/etc/vault.d/vault.hcl

on both machines.
Then I do this on my first node:

export VAULT_ADDR=http://local ip of node 1:8200
vault operator init

Everything is fine up to there. Then on my second node I do:

export VAULT_ADDR=http://local ip of my second node:8200
vault operator raft join "http://public ip of my first node:8200"

(Note that vault operator raft join "http://private ip of my first node:8200" doesn’t work either.)

I always get this error message:
Error joining the node to the Raft cluster: Error making API request.

URL: POST http://public ip of my first node :8200/v1/sys/storage/raft/join
Code: 500. Errors:

  • failed to join raft cluster: failed to join any raft leader node

The logs for the error are:

2022-03-04T14:04:31.978Z [WARN]  core: join attempt failed: error="error during raft bootstrap init call: Put \"http://**public ip of my first node**:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp **public ip of my first node**:8200: i/o timeout"
2022-03-04T14:04:31.978Z [ERROR] core: failed to join raft cluster: error="failed to join any raft leader node"

Do you have any idea what my mistake is?

OK, I have some news: I just understood that I had to unseal both of my Vaults before trying to use the command:

vault operator raft join

So okay, I just ran the unseal commands on both of my Vaults, but now I have another issue.
When I run

vault operator raft join "http://public ip of my first vault:8200"

on my second node I get this:

Key       Value
---       -----
Joined    true

But when I check on my first Vault using

vault operator raft list-peers

I can’t see my second Vault in the list. It seems like they can’t talk to each other.

The first issue is in your first statement. You shouldn’t set raft up with an even number of nodes.
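
To make the even-versus-odd point concrete: raft needs a majority (quorum) of nodes to agree, so a 2-node cluster tolerates no more failures than a single node does. A small sketch of the arithmetic in shell; the helper names quorum and tolerance are made up for illustration and are not Vault commands:

```shell
#!/bin/sh
# quorum N: smallest majority of N voting nodes
quorum() { echo $(( $1 / 2 + 1 )); }

# tolerance N: how many nodes can fail while a majority remains
tolerance() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 4 5; do
  echo "nodes=$n quorum=$(quorum "$n") tolerated_failures=$(tolerance "$n")"
done
```

With 2 nodes the quorum is 2, so losing either node stops the cluster; 3 nodes is the smallest size that survives one failure.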

The second issue is that you should add a retry_join stanza to your config that tells the nodes where the other nodes are. If you’re using a cloud provider this is easier, as you can build filters to find the other nodes dynamically; otherwise you have to statically assign the IP addresses.

In addition, I see that in the configuration file of node 1 the addresses use HTTPS, but HTTP is used when joining the cluster. It may not work.
In fact, it should be HTTP in the node 1 config file, as you have not specified any certificate information there.
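
A quick way to spot this kind of mismatch is to scan the config for a listener with TLS disabled alongside an https:// api_addr. This is only a rough sketch in shell, assuming a single config file that uses the same key names as the config posted above; check_scheme is a hypothetical helper, not a Vault command, and checking only api_addr is a choice made for this sketch:

```shell
#!/bin/sh
# check_scheme FILE
# Prints "ok" and returns 0, or prints a warning and returns 1 when the
# file disables TLS on a listener but advertises an https:// api_addr.
check_scheme() {
  config_file="$1"
  if grep -Eq 'tls_disable[[:space:]]*=[[:space:]]*"?true"?' "$config_file" &&
     grep -Eq '^[[:space:]]*api_addr[[:space:]]*=[[:space:]]*"https://' "$config_file"; then
    echo "mismatch: tls_disable is true but api_addr uses https://"
    return 1
  fi
  echo "ok"
  return 0
}
```

Usage would look like check_scheme /etc/vault.d/vault.hcl, using the config path from the posts above.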

Okay, I checked how raft works and I understand the reason to use an odd number of nodes, thanks. I will also look into this stanza, thanks.

Oh yeah, this detail might be the answer, thanks. I will check this.

Also, I’m confused about something: in the retry_join stanza, should I put the private IPs of the other nodes or the public IPs? In the config file I put the private IP for api_addr, like in the other tutorial.

Also, when I run vault operator raft join, which IP should I use, the public one or the private one?

Thanks in advance for your help.

For the attempt I made this morning, I used this config file on both nodes:

storage "raft" {
  path    = "./vault/data"
  node_id = "node1"
  retry_join{
    leader_api_addr="http://public ip of the other node:8200"
  }
}
listener "tcp"{
  address ="0.0.0.0:8200"
  tls_disable="true"
}
api_addr = "http://public ip of this node:8200"
cluster_addr = "http://public ip of this node:8201"
ui = true
disable_mlock=true

Then I ran

sudo vault server -config=/etc/vault.d/vault.hcl

On both nodes.

Then on node 1 :

export VAULT_ADDR=http://private ip of this node:8200
vault operator init
vault operator unseal   # first unseal key
vault operator unseal   # second unseal key
vault operator unseal   # third unseal key
vault login             # with the root token

Then on node 2:

export VAULT_ADDR=http://private ip of this node:8200
vault operator raft join "http://public ip of my first node"

Note that in the logs I can see that my node is trying to join my first Vault, but it doesn’t work.

Then if I unseal my second node:

vault operator init
vault operator unseal   # first unseal key given on node 2
vault operator unseal   # second unseal key given on node 2
vault operator unseal   # third unseal key given on node 2
vault login             # with the root token
vault operator raft join "http://public ip of my first node:8200"
Key       Value
---       -----
Joined    true

Then if I check on node 1 whether my second node joined, I get this:

vault operator raft list-peers
Node     Address                        State     Voter
----     -------                        -----     -----
node1    public ip of this node:8201    leader    true

I’m trying to explain my procedure in detail to make it easier to find my mistake.

It’s easier if you think “who is going to talk to this node, and how?” when deciding which IP address to use in which configuration setting.

A) Clients out in the world want to talk to “all” of Vault. This would be the “public” IP address of the whole cluster… most likely a load balancer that will distribute the connections to all of your nodes.

B) Other Vault nodes talking “raft” to each other. This would be the private “routable” IP address. It should be an IP address that is routable between the nodes but does not need to be a public one.

C) If you ever end up with an enterprise license and want to do DR, then you need to have an IP address that is publicly routable, so again this would be a public IP address – BUT the IP address that the DR uses must point to the “leader” node and not a load balancer that can go to any node. The combination of IP+port in this case lets you use the load balancer (via health check) to point to the leader only.

Regarding the second part of your post… Unless you’re using Kubernetes, each node does not need to unseal. You set up your nodes, they communicate and join, and then you initialize once and unseal once. If you check the status of the second node and it’s sealed (and you have already initialized and unsealed the first node), then there is a communication error; the node is joining the same cluster, so there is no need to unseal.

First, thanks a lot for taking the time to answer my question.

After your explanation, I tried putting the private IP in the config file:

For the first node:

storage "raft" {
  path    = "./vault/data"
  node_id = "node1"
  retry_join{
    leader_api_addr="http://private ip of second node:8200"
  }
}
listener "tcp"{
  address ="0.0.0.0:8200"
  tls_disable="true"
}
api_addr = "http://private ip of this node 1:8200"
cluster_addr = "http://private ip of this node:8201"
ui = true
disable_mlock=true

For the second node:

storage "raft" {
  path    = "./vault/data"
  node_id = "node2"
  retry_join{
    leader_api_addr="http://private ip of first node:8200"
  }
}
listener "tcp"{
  address ="0.0.0.0:8200"
  tls_disable="true"
}
api_addr = "http://private ip of this node:8200"
cluster_addr = "http://private ip of this node:8201"
ui = true
disable_mlock=true

Then I ran

sudo vault server -config=/etc/vault.d/vault.hcl

On both nodes.

Then on the first node I did the whole init and unseal procedure:

export VAULT_ADDR="http://private ip of first node:8200"
vault operator init
vault operator unseal   # three times, once per unseal key
vault login

At the end of the procedure, this is what I see when I run:

vault operator raft list-peers

Node     Address                      State     Voter
----     -------                      -----     -----
node1    private IP of node 1:8201    leader    true

Then on node 2 I always get an error message with the auto-join:

2022-03-08T13:48:51.351Z [INFO]  core: security barrier not initialized
2022-03-08T13:48:51.351Z [INFO]  core: attempting to join possible raft leader node: leader_addr=http://PRIVATE IP OF NODE 1 :8200
2022-03-08T13:49:21.352Z [WARN]  core: join attempt failed: error="error during raft bootstrap init call: Put \"http://PRIVATE IP OF NODE 1:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp PRIVATE IP OF NODE 1:8200: i/o timeout"

It’s pretty weird, and even when I try to do it manually I get this error message:

export VAULT_ADDR="http://PRIVATE IP OF NODE 2:8200"
vault operator raft join "http://PRIVATE IP OF NODE 1:8200"
Error joining the node to the Raft cluster: Error making API request.

URL: POST http://PRIVATE IP OF NODE 2 :8200/v1/sys/storage/raft/join
Code: 500. Errors:

* failed to join raft cluster: failed to join any raft leader node

Notice that it’s pretty weird, because in the error we can see that the POST URL uses the private IP of the second node.

Also, even if I don’t init the first node, I get the same error messages in both logs, telling me that the nodes can’t join each other.
So yeah, for now I don’t really understand what the problem with my procedure is :(

This says that the two nodes can’t communicate. There is a network/connection issue. You should be able to run on the 2nd node:

curl http://<ip of the first node>:8200

and get a response back. If not (which is what that error is saying), the two nodes can’t talk to each other.
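
The curl check above can be wrapped in a small script so it is easy to repeat from each node. This is only a sketch: check_reachable is a hypothetical helper name, the 5-second timeout is arbitrary, and it assumes curl is installed. It queries Vault's /v1/sys/health endpoint and treats any HTTP status as "reachable", since a sealed or uninitialized Vault still answers with a non-200 code:

```shell
#!/bin/sh
# check_reachable BASE_URL
# Prints "reachable (HTTP <code>)" and returns 0 if anything answers over
# HTTP at BASE_URL, or prints "unreachable" and returns 1 on a timeout,
# refused connection, or missing curl.
check_reachable() {
  addr="$1"
  # -m 5 caps the whole request at 5 seconds; curl reports 000 when it gets no HTTP response.
  code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$addr/v1/sys/health" 2>/dev/null || true)
  if [ -z "$code" ] || [ "$code" = "000" ]; then
    echo "unreachable"
    return 1
  fi
  echo "reachable (HTTP $code)"
  return 0
}

# Example, run on the 2nd node (replace the placeholder with a real address):
# check_reachable "http://<ip of the first node>:8200"
```

If this prints "unreachable" from node 2 while the same command works locally on node 1, the problem is in the network path (firewall, security group, routing) rather than in the Vault config.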

Just to point out that 0.0.0.0 in the listener address isn’t your private IP; it usually means all interfaces.

OK, well, I just checked my network configuration, and my VMs had issues talking to each other, as you explained. It was as simple as that. I’m a beginner trying to learn new applications and services, so I’m making basic mistakes. Thanks a lot for your help and your time. Now it’s working.