Error while re-electing leader due to ALPN

Hi,

We recently faced an issue where our cluster, which had been running fine, suddenly went into a failed state and the nodes could not communicate with each other over Raft storage. We normally see these lines in a healthy cluster too, and the nodes reconnect automatically. In this case, however, the last error line kept repeating, indicating that no TLS config was found for the ALPN.

It got resolved as soon as we restarted the Vault service on all three nodes of our cluster.
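For anyone hitting this, a quick way to confirm each node's state after a restart is Vault's unauthenticated `sys/health` endpoint, which encodes the node's role in the HTTP status code. A minimal sketch (the node URLs in the `check_node` helper are placeholders, not part of this thread):

```python
from urllib.request import urlopen
from urllib.error import HTTPError

# Vault's GET /v1/sys/health encodes node state in the HTTP status code.
HEALTH_STATES = {
    200: "active",
    429: "standby",
    472: "disaster-recovery secondary",
    473: "performance standby",
    501: "not initialized",
    503: "sealed",
}

def describe_health(status_code: int) -> str:
    """Map a sys/health HTTP status code to a human-readable node state."""
    return HEALTH_STATES.get(status_code, f"unknown ({status_code})")

def check_node(base_url: str) -> str:
    """Query one node's health endpoint (needs a live node; URL is a placeholder)."""
    try:
        with urlopen(f"{base_url}/v1/sys/health") as resp:
            return describe_health(resp.status)
    except HTTPError as err:  # non-2xx health codes raise HTTPError
        return describe_health(err.code)

# Demo of the status-code mapping only (no network access):
for code in (200, 429, 503):
    print(code, "->", describe_health(code))
```

In a healthy three-node HA cluster you would expect exactly one node to report `active` and the other two `standby`.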

We saw that others are facing this as well. We are currently using Vault v1.7.1.

Please let us know if this is a bug or something else. Let me know if I should paste my configuration here.

Feb 28 02:05:55 vault: 2023-02-28T02:05:55.722+0530 [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = transport is closing"
Feb 28 02:05:55 vault: 2023-02-28T02:05:55.722+0530 [ERROR] core: forward request error: error="error during forwarding RPC request"
Feb 28 02:05:55 vault: 2023-02-28T02:05:55.722+0530 [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = transport is closing"
Feb 28 02:05:55 vault: 2023-02-28T02:05:55.722+0530 [ERROR] core: forward request error: error="error during forwarding RPC request"
Feb 28 02:05:55 vault: 2023-02-28T02:05:55.722+0530 [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = transport is closing"
Feb 28 02:05:55 vault: 2023-02-28T02:05:55.722+0530 [ERROR] core: forward request error: error="error during forwarding RPC request"
Feb 28 02:05:55 vault: 2023-02-28T02:05:55.725+0530 [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
Feb 28 02:05:55 vault: 2023-02-28T02:05:55.734+0530 [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
Feb 28 02:05:56 vault: 2023-02-28T02:05:56.728+0530 [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
Feb 28 02:05:56 vault: 2023-02-28T02:05:56.738+0530 [WARN] core.cluster-listener: no TLS config found fo

I think you have the cause and effect backwards - the ALPN warnings are simply a symptom of the nodes disagreeing with each other about leadership, not the cause.

Effectively, one node is trying to forward requests to another, and the destination node is saying “You think I’m the leader but no, I’m not!”.
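That disagreement is visible through the unauthenticated `sys/leader` endpoint: every node reports which address it believes is the active node. A small sketch that compares those answers; the sample responses below are fabricated for illustration, shaped like `GET /v1/sys/leader` output:

```python
def leaders_agree(responses: dict[str, dict]) -> bool:
    """Given {node_name: sys/leader JSON body}, check whether all nodes report
    the same leader_address. Disagreement matches the symptom in the logs:
    a node forwards requests to a peer that no longer thinks it is the leader."""
    addresses = {body.get("leader_address") for body in responses.values()}
    return len(addresses) == 1 and None not in addresses

# Fabricated sample responses (node names and addresses are hypothetical).
healthy = {
    "vault-0": {"ha_enabled": True, "is_self": False, "leader_address": "https://vault-2:8200"},
    "vault-1": {"ha_enabled": True, "is_self": False, "leader_address": "https://vault-2:8200"},
    "vault-2": {"ha_enabled": True, "is_self": True,  "leader_address": "https://vault-2:8200"},
}

# Same cluster, but vault-1 still believes a stale leader is active.
split = dict(healthy)
split["vault-1"] = {"ha_enabled": True, "is_self": False, "leader_address": "https://vault-0:8200"}

print(leaders_agree(healthy))  # True
print(leaders_agree(split))    # False
```

If the nodes disagree, the forwarding errors and ALPN warnings above are exactly what you would expect to see until they converge on one leader again.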

I think it is unlikely that a root cause can be identified from the limited logging available. I would, however, mention that Vault 1.7.x is an unreasonably old version to still be running these days, particularly with Integrated Storage, for which a great many bug fixes have landed in recent releases.

Thank you, we will plan to upgrade, but the issue was quite random and it may take us some time, as we would need to roll the upgrade all the way to production.

Also, it seems that users are still facing this in 1.12.0, as seen in the thread below:

We are facing the same issue with our Vault cluster. We are using Vault v1.15.2 with Raft storage in HA with three nodes. Do we need to open an issue with HashiCorp about this? Even after restarting the Vault cluster and allowing a standby node to elect a leader, the issue persists.
Leader node logs:
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.079Z [INFO] storage.raft: election won: term=32 tally=2
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.079Z [INFO] storage.raft: entering leader state: leader="Node at vault-secrets-server-2.vault-secrets-server-internal:8201 [Leader]"
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.079Z [INFO] storage.raft: added peer, starting replication: peer=eecd350a-f2a7-403f-9702-619ba8cebe40
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.079Z [INFO] storage.raft: added peer, starting replication: peer=9b285315-469d-d3c1-be5f-9ba1a41f6760
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.081Z [INFO] storage.raft: pipelining replication: peer="{Voter 9b285315-469d-d3c1-be5f-9ba1a41f6760 vault-secrets-server-1.vault-secrets-server-internal:8201}"
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.081Z [INFO] storage.raft: pipelining replication: peer="{Voter eecd350a-f2a7-403f-9702-619ba8cebe40 vault-secrets-server-0.vault-secrets-server-internal:8201}"
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.095Z [INFO] core: acquired lock, enabling active operation
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.116Z [INFO] core: post-unseal setup starting
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.138Z [INFO] core: loaded wrapping token key
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.138Z [INFO] core: successfully setup plugin runtime catalog
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.138Z [INFO] core: successfully setup plugin catalog: plugin-directory=""
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.171Z [INFO] core: successfully mounted: type=system version="v1.15.2+builtin.vault" path=sys/ namespace="ID: root. Path: "
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.172Z [INFO] core: successfully mounted: type=identity version="v1.15.2+builtin.vault" path=identity/ namespace="ID: root. Path: "
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.172Z [INFO] core: successfully mounted: type=kv version="v0.16.1+builtin" path=secret/ namespace="ID: root. Path: "
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.172Z [INFO] core: successfully mounted: type=cubbyhole version="v1.15.2+builtin.vault" path=cubbyhole/ namespace="ID: root. Path: "
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.294Z [INFO] core: successfully mounted: type=token version="v1.15.2+builtin.vault" path=token/ namespace="ID: root. Path: "
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.295Z [INFO] core: successfully mounted: type=kubernetes version="v0.17.1+builtin" path=kubernetes/ namespace="ID: root. Path: "
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.295Z [INFO] core: successfully mounted: type=userpass version="v1.15.2+builtin.vault" path=userpass/ namespace="ID: root. Path: "
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.311Z [INFO] rollback: Starting the rollback manager with 256 workers
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.311Z [INFO] rollback: starting rollback manager
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.312Z [INFO] core: restoring leases
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.394Z [INFO] expiration: lease restore complete
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.395Z [INFO] identity: entities restored
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.418Z [INFO] identity: groups restored
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.445Z [INFO] core: starting raft active node
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.445Z [INFO] storage.raft: starting autopilot: config="&{false 0 10s 24h0m0s 1000 0 10s false redundancy_zone upgrade_version}" reconcile_interval=0s
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.470Z [INFO] core: usage gauge collection is disabled
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.718Z [INFO] core: post-unseal setup complete
2024-01-19T22:04:23+05:00 2024-01-19T17:04:23.948Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=
2024-01-19T22:14:19+05:00 2024-01-19T17:14:19.065Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=