Vault upgrade failing for 1.6.5

I am trying to upgrade Vault from 1.5.6 to 1.6.5. I have a three-node HA configuration using Raft.

Steps performed:

  1. Node 1: upgraded to 1.6.5
  2. Unsealed the node: done
  3. vault operator raft join (this fails with the following error):

Error joining the node to the Raft cluster: Error making API request.
Code: 500. Errors:

  • failed to join raft cluster: failed to join any raft leader node

When I revert Vault to 1.5.6 the issue goes away and everything works fine. Any pointers would be helpful; this issue is blocking our upgrade to 1.6.5.

What's your upgrade plan/order?
You should upgrade the 2 standbys first and bring them back online, then step down the active and finally upgrade that.
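
To make that order concrete, here is a minimal sketch that reads the output of vault operator raft list-peers and prints the nodes in the order I would upgrade them: followers first, leader last. The function name upgrade_order is made up for illustration, and the parsing assumes the usual four-column table (Node, Address, State, Voter) that list-peers prints.

```shell
#!/bin/sh
# Hypothetical helper: reads `vault operator raft list-peers` output on stdin
# and prints node names in upgrade order (followers first, leader last).
# Assumes the third column holds the node state ("leader" or "follower").
upgrade_order() {
  awk '$3 == "follower" { print $1 }
       $3 == "leader"   { leader = $1 }
       END { if (leader != "") print leader }'
}

# Usage, run before touching any node:
#   vault operator raft list-peers | upgrade_order
```

For each node in that list: stop Vault, replace the binary, start it again, and unseal it. For the last node (the active one), run vault operator step-down first.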

If both upgraded instances are not part of the cluster, will they join and become active once I shut down the master node?

I can see this error in the master node logs when I upgrade the standby nodes:
[ERROR] ha.raft: failed to heartbeat to: peer=a.vault.xxxxxxx:8201 error="remote error: tls: internal error"

No. Those nodes aren’t part of a cluster so how would they know to do anything? Maybe I misunderstand here… but if you have a 3 node cluster that means all those nodes should show up with list-peers

On the upgraded node, vault operator raft list-peers returns this error:

Error reading the raft cluster configuration: Error making API request.
URL: GET https://b.vault.XXXXXX:8200/v1/sys/storage/raft/configuration
Code: 500. Errors:

  • local node not active but active cluster node not found

When I run vault operator raft join from the upgraded node, I see the following error:
Error joining the node to the Raft cluster: Error making API request.

URL: POST https://b.vault.XXXXXXX:8200/v1/sys/storage/raft/join

Code: 500. Errors:

  • failed to join raft cluster: failed to join any raft leader node

At the same time we see the following error on the master node:
Jun 10 19:22:54 hashicorp-vault-*** vault[2197]: 2021-06-10T19:22:54.565Z [ERROR] ha.raft: failed to appendEntries to: peer="{Voter b.vault.XXXXX b.vault.XXXXX:8201}" error="remote error: tls: internal error"

Any suggestions? It seems the 1.6.5 upgrade is not working.

I have no clue unless you can answer the first question I asked…
You are also seeing a TLS error, which can mean a few things. I’d recommend you post your architecture/cluster layout, config files, and logs.

I see this error only when I unseal manually and then run the raft join command.
If I use auto-unseal and then run raft join, I don’t see the error and the cluster upgrade works. It is a three-node cluster with S3 as the storage backend and Raft used for ha_storage. The logs are already provided above.

config.hcl:
default_lease_ttl = "168h"
max_lease_ttl = "87600h"
ui = false

api_addr = "https://a.xxx.xx.xxx.com:8200"
cluster_addr = "https://a.xxx.xx.xxx.com:8201"

listener "tcp" {
  address = "0.0.0.0:8200"
  cluster_address = "0.0.0.0:8201"
  tls_disable = false
  tls_cert_file = "/etc/vault/tls/fullchain.crt"
  tls_key_file = "/etc/vault/tls/private.key"
  tls_min_version = "tls12"
}

storage "s3" {
  region = "xx-xx-xxx"
  bucket = "xxx-xx-xx-xx-xx"
}

ha_storage "raft" {
  path = "/opt/vault/ha"
  node_id = "a.xxx.xx.xxx.com"
}

telemetry {
  statsd_address = "127.0.0.1:9125"
  disable_hostname = true
}

Do you need anything else to check?

If you have autounseal currently, you should not be manually unsealing.
Nor should you have to do any raft operator commands in an upgrade.
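
If you want a quick sanity check after each node comes back, something like this sketch should be enough. It reads the plain-text output of vault status and succeeds only when the node reports Sealed false and the version you expect. The name node_upgraded is hypothetical, and the parsing assumes the default key/value table that vault status prints.

```shell
#!/bin/sh
# Hypothetical helper: reads `vault status` output on stdin and exits 0 only
# if the node is unsealed and reports the expected version (first argument).
node_upgraded() {
  awk -v want="$1" '
    $1 == "Sealed"  { sealed = $2 }
    $1 == "Version" { version = $2 }
    END { exit !(sealed == "false" && version == want) }'
}

# Usage, on the node that was just upgraded:
#   vault status | node_upgraded 1.6.5 && echo "node is unsealed on 1.6.5"
```

Only move on to the next node once this passes and the node shows up again in list-peers on the active node.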

Take a look here:

We have several clusters, and due to some limitations a few of them are unsealed manually; those are the ones where the problem persists.