No active nodes after upgrading Vault to a newer version

We have a Vault cluster running v1.2.7 in HA mode. It’s backed by both S3 and etcd.

We’re upgrading it to 1.12.3 by creating a new, empty cluster (backed by separate, empty S3 and etcd backends) and then migrating the S3 backend using vault operator migrate. After this completes, we unseal each node in the new cluster.

Problem: none of the new nodes becomes the leader.

They each have:

Initialized            true
Sealed                 false
...
Version                1.12.3
Storage Type           s3
...
HA Enabled             true
HA Cluster             n/a
HA Mode                standby
Active Node Address    <none>

but no errors are reported in the journald logs:

Feb 16 14:33:21 vault[1915310]: ==> Vault server started! Log data will stream in below:
Feb 16 14:33:21 vault[1915310]: 2023-02-16T14:33:20.433Z [INFO]  proxy environment: http_proxy="" https_proxy="" no_proxy=""
Feb 16 14:33:21 vault[1915310]: 2023-02-16T14:33:21.852Z [INFO]  core: Initializing version history cache for core
Feb 16 14:33:47 vault[1915310]: 2023-02-16T14:33:47.651Z [INFO]  core.cluster-listener.tcp: starting listener: listener_address=0.0.0.0:8201
Feb 16 14:33:47 vault[1915310]: 2023-02-16T14:33:47.651Z [INFO]  core.cluster-listener.tcp: starting listener: listener_address=0.0.0.0:8301
Feb 16 14:33:47 vault[1915310]: 2023-02-16T14:33:47.651Z [INFO]  core.cluster-listener: serving cluster requests: cluster_listen_address=[::]>
Feb 16 14:33:47 vault[1915310]: 2023-02-16T14:33:47.651Z [INFO]  core.cluster-listener: serving cluster requests: cluster_listen_address=[::]>
Feb 16 14:33:47 vault[1915310]: 2023-02-16T14:33:47.651Z [INFO]  core: vault is unsealed
Feb 16 14:33:47 vault[1915310]: 2023-02-16T14:33:47.651Z [INFO]  core: entering standby mode

We did not migrate the etcd backend initially, since it stores the lock files for the old cluster. After realising that no leader was being elected, we migrated the etcd backend as well:

root@ip-10-102-36-152:~# vault operator migrate -config migrate-etcd3.hcl
2023-02-16T14:32:48.829Z [INFO]  copied key: path=core/lock/2b808287a2fcb5e1
2023-02-16T14:32:48.833Z [INFO]  copied key: path=core/lock/2b808287a2fcb5f9
2023-02-16T14:32:48.837Z [INFO]  copied key: path=core/lock/2b808287a2fcbde5
2023-02-16T14:32:48.842Z [INFO]  copied key: path=core/lock/2b808287a2fcbe18
2023-02-16T14:32:48.845Z [INFO]  copied key: path=core/lock/2f737e2ffe2959d2
2023-02-16T14:32:48.849Z [INFO]  copied key: path=core/lock/2f737e2ffe29631c
Success! All of the keys have been migrated.

This had no effect, and we’re still unable to elect a leader. Does anyone have any ideas?

It turns out this was caused by the new Vault cluster talking to the wrong etcd cluster.
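
For anyone hitting the same symptom, a quick sanity check is to print the etcd address each node is actually configured with and compare it against the cluster you meant to use. The helper below is only a rough sketch: it assumes etcd is configured in an ha_storage "etcd" stanza with one setting per line, the function name ha_etcd_address is made up for this post, and the config path in the usage comment is hypothetical. It is not a real HCL parser.

```shell
#!/bin/sh
# Print the "address" value from the ha_storage "etcd" stanza of a Vault
# server config file. Assumes one setting per line inside the stanza.
ha_etcd_address() {
    awk '
        # Entering the ha_storage "etcd" stanza.
        /^[ \t]*ha_storage[ \t]+"etcd"/ { in_block = 1; next }
        # A closing brace ends the stanza.
        in_block && /^[ \t]*}/ { in_block = 0 }
        # Inside the stanza, pick out the address setting.
        in_block && /^[ \t]*address[ \t]*=/ {
            sub(/^[^=]*=[ \t]*/, "")   # drop everything up to and including "="
            gsub(/"/, "")              # strip the surrounding quotes
            sub(/[ \t]+$/, "")         # strip trailing whitespace
            print
        }
    ' "$1"
}

# Usage (hypothetical path, adjust to wherever your server config lives):
#   ha_etcd_address /etc/vault.d/vault.hcl
```

Run it on every node of the new cluster and check that the printed address matches the new etcd cluster (the destination in the migration config), not the old one.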