No active nodes after upgrading Vault to a newer version

awnumar · February 16, 2023, 2:57pm

We have a Vault cluster running v1.2.7 in HA mode. It's backed by both S3 (storage) and etcd (HA locks).

We're upgrading it to 1.12.3 by creating a new, empty cluster (backed by separate, empty S3 and etcd backends) and then migrating the S3 backend with the vault operator migrate command. After the migration completes, we unseal each node in the new cluster.
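For reference, vault operator migrate reads a config file containing a storage_source stanza and a storage_destination stanza. The Python sketch below renders such a file for an S3-to-S3 copy. The bucket names and region in the usage note are hypothetical placeholders, and quoting every option value as a string is a simplifying assumption that suits the plain string options used here.

```python
import json


def render_stanza(kind: str, backend: str, opts: dict) -> str:
    """Render one HCL stanza such as: storage_source "s3" { ... }."""
    lines = [f'{kind} "{backend}" {{']
    for key, value in opts.items():
        # json.dumps yields a double-quoted, escaped string, which is also
        # valid as an HCL string literal for simple values like these.
        lines.append(f"  {key} = {json.dumps(str(value))}")
    lines.append("}")
    return "\n".join(lines)


def render_migrate_config(source_type: str, source_opts: dict,
                          dest_type: str, dest_opts: dict) -> str:
    """Build the text of a config file for `vault operator migrate`."""
    return "\n\n".join([
        render_stanza("storage_source", source_type, source_opts),
        render_stanza("storage_destination", dest_type, dest_opts),
    ]) + "\n"
```

For example, render_migrate_config("s3", {"bucket": "vault-old", "region": "eu-west-1"}, "s3", {"bucket": "vault-new", "region": "eu-west-1"}) produces a file that can be written to disk and passed with -config.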

Problem: none of the new nodes become the leader

They each have:

Initialized            true
Sealed                 false
...
Version                1.12.3
Storage Type           s3
...
HA Enabled             true
HA Cluster             n/a
HA Mode                standby
Active Node Address    <none>

No errors are reported in the journald logs, though:

Feb 16 14:33:21 vault[1915310]: ==> Vault server started! Log data will stream in below:
Feb 16 14:33:21 vault[1915310]: 2023-02-16T14:33:20.433Z [INFO]  proxy environment: http_proxy="" https_proxy="" no_proxy=""
Feb 16 14:33:21 vault[1915310]: 2023-02-16T14:33:21.852Z [INFO]  core: Initializing version history cache for core
Feb 16 14:33:47 vault[1915310]: 2023-02-16T14:33:47.651Z [INFO]  core.cluster-listener.tcp: starting listener: listener_address=0.0.0.0:8201
Feb 16 14:33:47 vault[1915310]: 2023-02-16T14:33:47.651Z [INFO]  core.cluster-listener.tcp: starting listener: listener_address=0.0.0.0:8301
Feb 16 14:33:47 vault[1915310]: 2023-02-16T14:33:47.651Z [INFO]  core.cluster-listener: serving cluster requests: cluster_listen_address=[::]>
Feb 16 14:33:47 vault[1915310]: 2023-02-16T14:33:47.651Z [INFO]  core.cluster-listener: serving cluster requests: cluster_listen_address=[::]>
Feb 16 14:33:47 vault[1915310]: 2023-02-16T14:33:47.651Z [INFO]  core: vault is unsealed
Feb 16 14:33:47 vault[1915310]: 2023-02-16T14:33:47.651Z [INFO]  core: entering standby mode
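One way to confirm this symptom on every node is the unauthenticated GET /v1/sys/leader endpoint, which reports ha_enabled, is_self and leader_address. The minimal Python sketch below assumes the node addresses you pass in are reachable and that the cluster's TLS certificate is trusted by Python's default SSL context. The state labels it returns are made up for illustration.

```python
import json
import urllib.request


def fetch_leader(addr: str) -> dict:
    """GET <addr>/v1/sys/leader and return the decoded JSON body."""
    with urllib.request.urlopen(f"{addr}/v1/sys/leader", timeout=5) as resp:
        return json.load(resp)


def describe_leader_state(resp: dict) -> str:
    """Classify a /v1/sys/leader response (labels are illustrative)."""
    if not resp.get("ha_enabled", False):
        return "ha-disabled"
    if resp.get("is_self", False):
        return "active"
    if resp.get("leader_address"):
        return "standby"
    # HA is enabled, this node is not active, and no leader address is
    # advertised: the "HA Mode standby / Active Node Address <none>" case.
    return "no-active-node"
```

Calling describe_leader_state(fetch_leader(addr)) for each node (for example a hypothetical https://vault-1.example.internal:8200) should return "no-active-node" on every node while the cluster is in the state shown above.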

We did not migrate the etcd backend initially, since it stores the lock files for the old cluster. After realising that we couldn't elect a leader, however, we also migrated the etcd backend:

root@ip-10-102-36-152:~# vault operator migrate -config migrate-etcd3.hcl
2023-02-16T14:32:48.829Z [INFO]  copied key: path=core/lock/2b808287a2fcb5e1
2023-02-16T14:32:48.833Z [INFO]  copied key: path=core/lock/2b808287a2fcb5f9
2023-02-16T14:32:48.837Z [INFO]  copied key: path=core/lock/2b808287a2fcbde5
2023-02-16T14:32:48.842Z [INFO]  copied key: path=core/lock/2b808287a2fcbe18
2023-02-16T14:32:48.845Z [INFO]  copied key: path=core/lock/2f737e2ffe2959d2
2023-02-16T14:32:48.849Z [INFO]  copied key: path=core/lock/2f737e2ffe29631c
Success! All of the keys have been migrated.

This had no effect, and we're still unable to elect a leader. Does anyone have any ideas?

awnumar · February 16, 2023, 4:25pm

Turns out this was caused by the new Vault cluster talking to the wrong etcd cluster.
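Because the root cause was the new cluster pointing at the wrong etcd cluster, a quick sanity check is to compare the address option in each node's ha_storage "etcd" stanza against the endpoints you expect. The sketch below uses simple regular expressions rather than a real HCL parser. It assumes a flat stanza with a single quoted, comma-separated address string. It does not model any fallback Vault applies when address is omitted, and it ignores the path option, which could also differ between clusters.

```python
import re

# Simplified patterns: a flat ha_storage "etcd" { ... } block (no nested
# braces) and a single quoted address = "..." line inside it.
HA_ETCD_BLOCK = re.compile(r'ha_storage\s+"etcd"\s*\{(.*?)\}', re.DOTALL)
ADDRESS = re.compile(r'^\s*address\s*=\s*"([^"]*)"', re.MULTILINE)


def etcd_endpoints(config_text: str) -> set[str]:
    """Return the etcd endpoints listed in the ha_storage "etcd" stanza."""
    block = HA_ETCD_BLOCK.search(config_text)
    if block is None:
        return set()
    addr = ADDRESS.search(block.group(1))
    if addr is None:
        return set()
    # Normalise by trimming whitespace and any trailing slash.
    return {e.strip().rstrip("/") for e in addr.group(1).split(",") if e.strip()}


def unexpected_endpoints(config_text: str, expected: set[str]) -> set[str]:
    """Endpoints configured on this node that are not in the expected set."""
    return etcd_endpoints(config_text) - expected
```

Running unexpected_endpoints over each new node's server config with the new etcd cluster's endpoints as the expected set should return an empty set on every node; any non-empty result points at the kind of misconfiguration described here.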
