Migration from consul backend to raft fails

Hello

I am trying to migrate to the raft backend; the old backend is consul.

We have 3 servers in HA with TLS.

The first node is OK; everything is up with raft.

But when the 2nd node boots there are always errors, and it joins the cluster with:
state voter

follower false

On the 2nd node:
core.cluster-listener: no TLS config found for ALPN: ALPN=["raft_storage_v1"]

On the first node:
May 28 23:29:06 01-node vault[127235]: 2022-05-28T23:29:06.049+0200 [ERROR] storage.raft: failed to appendEntries to: peer="{Nonvoter 02-node 02-node:8201}" error="remote error: tls: internal error"

Sometimes, when I restart the migration:
Error unsealing: context deadline exceeded

We tried with the TLS options in the raft section, but got the same error.
We tried with another cluster over HTTP, and everything worked.

vault version 1.10.3
config:
cluster_name = "vault"
max_lease_ttl = "768h"
default_lease_ttl = "768h"

disable_clustering = "False"
cluster_addr = "https://01-node:8201"
api_addr = "https://01-node:8200"

plugin_directory = "/usr/local/lib/vault/plugins"

listener "tcp" {
  address = "XX.XX.XX.XX:8200"
  cluster_address = "XX.XX.XX.XX:8201"
  tls_cert_file = "/etc/vault.d/ssl/vault.crt"
  tls_key_file = "/etc/vault.d/ssl/vault.key"
  tls_disable = "false"
  proxy_protocol_authorized_addrs = "XX.XX.XX.XX"
  proxy_protocol_behavior = "allow_authorized"
  x_forwarded_for_authorized_addrs = "XX.XX.XX.XX"
  x_forwarded_for_reject_not_authorized = "false"
  x_forwarded_for_reject_not_present = "false"
}

storage "raft" {
  path = "/data/vault/raft/"
  node_id = "01-node"

  retry_join {
    leader_api_addr = "https://01-node:8200"
    #leader_client_cert_file = "/etc/vault.d/ssl/vault.crt"
    #leader_client_key_file = "/etc/vault.d/ssl/vault.key"
    #leader_ca_cert_file = "/etc/vault.d/ssl/vault_cacert.pem"
    #leader_tls_servername = "01-node"
  }
  retry_join {
    leader_api_addr = "https://02-node:8200"
    #leader_client_cert_file = "/etc/vault.d/ssl/vault.crt"
    #leader_client_key_file = "/etc/vault.d/ssl/vault.key"
    #leader_ca_cert_file = "/etc/vault.d/ssl/vault_cacert.pem"
    #leader_tls_servername = "02-node"
  }
  retry_join {
    leader_api_addr = "https://03-node:8200"
    #leader_client_cert_file = "/etc/vault.d/ssl/vault.crt"
    #leader_client_key_file = "/etc/vault.d/ssl/vault.key"
    #leader_ca_cert_file = "/etc/vault.d/ssl/vault_cacert.pem"
    #leader_tls_servername = "03-node"
  }
}

#backend "consul" {
#  address = "01-node:8501"
#  path = "vault"
#  service = "vault"
#  token = "XXXXXXXXXXXXX"
#  scheme = "https"
#}

ui = true
disable_mlock = true

I’m confused. Are you migrating from one cluster to another cluster, or doing an in-place upgrade? What is the 2nd node here? You migrate clusters, not nodes.

What does your migrate.hcl look like? Are you using the proper consul addr? Vault should be shut down during this process, so I’m not sure how you’re getting TLS errors.

When I have done this in the past, I just did it at the consul leader node so that there wouldn’t be any connectivity issues.

Hi @aram, thanks for your help,

This is an in-place upgrade; we want to migrate the consul backend to raft.
We have 2 Vault clusters with 3 nodes each, all on the consul backend.
One cluster is only for dev usage and runs over HTTP. On that cluster I applied the procedure to migrate to raft and everything is fine.

The 2nd cluster is configured with TLS, and that is where my problem is. When migrating the backend of this cluster, applying the procedure on the first node works: it boots and Vault unseals fine.
But node02 and node03 have the TLS problem mentioned above.

My consul backend runs on the same nodes as Vault.

This is my migration HCL:

storage_source "consul" {
  address = "01-node:8501"
  path = "vault"
  service = "vault"
  token = "XXXXXXXX"
  scheme = "https"
}

storage_destination "raft" {
  path = "/data/vault/raft/"
  node_id = "01-node"
}

cluster_addr = "https://01-node:8201"

For node02 and node03, I only changed the configuration to use the new raft backend with retry_join.

On the other hand, I did not pay attention to whether node 01 was the leader during the migration.
Can that be a problem? After the migration, node 01 automatically took the lead.

This is from the learn guide:

Perform the migration step on one of the nodes in the cluster which will become the leader node.

There is no need to do the migration again with any other nodes. The leader node will replicate itself to any nodes that come online after the start. You only need to do the migration once. Also I “think” in the past I have used 127.0.0.1 (for in place upgrades) and that worked – otherwise I would point it at the consul cluster LB or leader, not a specific random node (shouldn’t make a difference but the leader is guaranteed to be the latest copy – belt & suspenders).
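
The one-time migration described above can be sketched roughly as follows. This is only an illustration under assumptions: Vault is assumed to run as a systemd unit named `vault`, `migrate.hcl` is the migration file discussed in this thread, and `run` and `migrate_once` are hypothetical helper names. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: run the consul -> raft migration once, on the node that will become the leader.
# Assumption: Vault is managed by a systemd unit called "vault".
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

migrate_once() {   # usage: migrate_once /path/to/migrate.hcl
  run systemctl stop vault                  # Vault should be shut down while the data is copied
  run vault operator migrate -config="$1"   # copy the consul data into the raft path
  run systemctl start vault                 # start again with the storage "raft" configuration
  run vault operator unseal                 # unseal as usual
}

# Review the plan first, then re-run with DRY_RUN=0:
#   migrate_once migrate.hcl
```

The other nodes do not run `migrate_once`; they only get the new raft configuration and join the leader.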

Double check the node_id you’re using – it’s probably a cut-n-paste thing but you’re using different names.

The path and node_id must match the values you set in the server configuration file.

Lastly, if you’re worried about the other nodes joining as standby then you can join them manually first so they get the copy from the leader you choose then switch to a configuration file.

node2: vault operator raft join https://leader_node:8200
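
If the leader's certificate is signed by a private CA, the manual join may also need to be told which CA to trust. A minimal sketch, assuming the CA bundle path that appears (commented out) in the posted retry_join stanzas and 01-node as the leader. My understanding is that the `-leader-ca-cert` flag accepts `@file` to read the PEM from disk; treat that as an assumption and check it against `vault operator raft join -h`. `join_command` is a hypothetical helper that only prints the command.

```shell
#!/bin/sh
# Assumed values, copied from the configuration posted in this thread.
LEADER_API_ADDR="https://01-node:8200"
CA_FILE="/etc/vault.d/ssl/vault_cacert.pem"

# Build the join command; "@file" asks the CLI to read the flag value from that file.
join_command() {
  echo "vault operator raft join -leader-ca-cert=@${CA_FILE} ${LEADER_API_ADDR}"
}

join_command
# After running the printed command on node 2, check membership from an unsealed node:
#   vault operator raft list-peers
```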

@aram yes, that’s exactly what I did, except for 127.0.0.1: in migration.hcl I used the same cluster_addr as in my configuration, so it is the FQDN of the node.

The node_id is correct on all nodes, because when a node joins the cluster I can see it in list-peers.
But it joins as a non-voter and the TLS error is present in the logs. The leader works fine, but the followers do not, and if I restart the follower nodes, unsealing is not possible because of this TLS communication error.
I don't understand why; TLS communication between the nodes worked with the consul backend.

Okay, then you don’t have a migration problem, you have a TLS problem. Please start a new thread with just that information: the errors, your configuration, and your TLS cert chain information.
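
To collect the cert chain information for that new thread, something like the following could be run on each node. It is a sketch, not a diagnosis: the helper names are hypothetical, and the file paths in the usage comments are the ones from the posted configuration.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting the listener certificate with openssl.

# Does the certificate list the given host name (SAN, or CN if there is no SAN)?
cert_matches_host() {   # usage: cert_matches_host CERT_FILE HOSTNAME
  openssl x509 -in "$1" -noout -checkhost "$2" | grep -q 'does match'
}

# Does the certificate verify against the given CA bundle?
cert_chains_to() {      # usage: cert_chains_to CERT_FILE CA_FILE
  openssl verify -CAfile "$2" "$1" >/dev/null 2>&1
}

# Example, on 02-node:
#   cert_matches_host /etc/vault.d/ssl/vault.crt 02-node && echo "host name ok"
#   cert_chains_to /etc/vault.d/ssl/vault.crt /etc/vault.d/ssl/vault_cacert.pem && echo "chain ok"
# To see what a node actually serves on its API port:
#   openssl s_client -connect 02-node:8200 -servername 02-node </dev/null
```

Both helpers return a shell exit status, so they can be chained with `&&` or used in an `if`.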