Hi,
We recently hit an issue where our cluster, which had been running fine, suddenly went into a failed state and the nodes were unable to communicate with each other for Raft storage. We normally see these lines in a healthy cluster as well, but the nodes reconnect automatically. In this case it kept repeating the last error line, which indicates that no TLS config was found for the ALPN.
It was resolved as soon as we restarted the Vault service on all three nodes of the cluster.
We saw that others are also facing this; we are currently using Vault v1.7.1.
Please let us know whether this is a bug or something else. Let me know if I need to paste my configuration here.
```
Feb 28 02:05:55 vault: 2023-02-28T02:05:55.722+0530 [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = transport is closing"
Feb 28 02:05:55 vault: 2023-02-28T02:05:55.722+0530 [ERROR] core: forward request error: error="error during forwarding RPC request"
Feb 28 02:05:55 vault: 2023-02-28T02:05:55.722+0530 [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = transport is closing"
Feb 28 02:05:55 vault: 2023-02-28T02:05:55.722+0530 [ERROR] core: forward request error: error="error during forwarding RPC request"
Feb 28 02:05:55 vault: 2023-02-28T02:05:55.722+0530 [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = transport is closing"
Feb 28 02:05:55 vault: 2023-02-28T02:05:55.722+0530 [ERROR] core: forward request error: error="error during forwarding RPC request"
Feb 28 02:05:55 vault: 2023-02-28T02:05:55.725+0530 [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
Feb 28 02:05:55 vault: 2023-02-28T02:05:55.734+0530 [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
Feb 28 02:05:56 vault: 2023-02-28T02:05:56.728+0530 [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
Feb 28 02:05:56 vault: 2023-02-28T02:05:56.738+0530 [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
```
maxb
February 28, 2023, 1:59pm
I think you have the cause and effect backwards - the ALPN warnings are simply a symptom of the nodes disagreeing with each other about leadership, not the cause.
Effectively, one node is trying to forward requests to another, and the destination node is saying "You think I'm the leader, but no, I'm not!".
I think it is unlikely that a root cause can be identified from the limited logging available. That said, Vault 1.7.x is an unreasonably old version to still be running these days, particularly when running Integrated Storage, for which a great many bug fixes have been made in recent releases.
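If you want to confirm that this is what is happening, something like the following sketch can help. It uses the Vault Go API client to ask each node, via the unauthenticated /sys/leader endpoint, who it believes the leader is; the node addresses here are placeholders you would replace with your own. If the nodes' answers disagree, you are in exactly the state described above.

```go
// Minimal diagnostic sketch (not official tooling): query each node's view
// of cluster leadership and print it so disagreements are easy to spot.
package main

import (
	"fmt"

	vault "github.com/hashicorp/vault/api"
)

func main() {
	// Hypothetical node API addresses; substitute your cluster's addresses.
	nodes := []string{
		"https://vault-0.example.internal:8200",
		"https://vault-1.example.internal:8200",
		"https://vault-2.example.internal:8200",
	}

	for _, addr := range nodes {
		cfg := vault.DefaultConfig()
		cfg.Address = addr

		client, err := vault.NewClient(cfg)
		if err != nil {
			fmt.Printf("%s: client error: %v\n", addr, err)
			continue
		}

		// Sys().Leader() wraps the unauthenticated /v1/sys/leader endpoint,
		// so this works even without a token.
		leader, err := client.Sys().Leader()
		if err != nil {
			fmt.Printf("%s: leader query failed: %v\n", addr, err)
			continue
		}
		fmt.Printf("%s: is_self=%t leader_cluster_addr=%s\n",
			addr, leader.IsSelf, leader.LeaderClusterAddress)
	}
}
```

In a healthy cluster, exactly one node reports is_self=true and all three report the same leader cluster address.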
Thank you, we will plan to upgrade, but the issue is quite random, and it might take us some time since we would need to plan this upgrade all the way through to production.
Also, it seems that users are still facing this in 1.12.0, as seen in the thread below:
GitHub issue (opened 13 Nov 2020, closed 20 Nov 2020):
**I'm posting this issue with resolution to help others that encounter the issue with Vault Enterprise.**
After adding a new plugin, startup started taking 5 minutes longer than usual. For us, this caused crashlooping since our health probes would fail before successful startup. While startup is hanging, Vault will print many messages as standbys unsuccessfully try to connect to the active node for request forwarding:
```
[WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
```
```
2020-11-12T22:25:24.968Z [INFO] core: restoring leases
2020-11-12T22:25:24.969Z [INFO] rollback: starting rollback manager
2020-11-12T22:25:25.185Z [INFO] identity: entities restored
2020-11-12T22:25:25.193Z [INFO] identity: groups restored
2020-11-12T22:25:25.196Z [INFO] mfa: configurations restored
2020-11-12T22:25:26.392Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-11-12T22:25:26.540Z [INFO] expiration: lease restore complete
2020-11-12T22:25:27.395Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-11-12T22:25:28.749Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-11-12T22:25:31.438Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-11-12T22:25:35.617Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-11-12T22:25:42.012Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-11-12T22:25:49.023Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-11-12T22:25:50.024Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-11-12T22:25:51.451Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-11-12T22:25:53.645Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-11-12T22:25:54.334Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-11-12T22:25:57.027Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-11-12T22:26:04.662Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-11-12T22:26:10.012Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-11-12T22:26:16.083Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-11-12T22:26:33.850Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-11-12T22:26:35.462Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-11-12T22:27:02.398Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-11-12T22:27:23.505Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-11-12T22:27:37.575Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-11-12T22:28:40.981Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-11-12T22:28:54.321Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-11-12T22:30:11.302Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-11-12T22:30:53.960Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-11-12T22:32:07.018Z [INFO] core: stopping replication
2020-11-12T22:32:07.018Z [INFO] core: closed sync connection
2020-11-12T22:32:07.018Z [INFO] core: replication stopped
2020-11-12T22:32:07.018Z [INFO] core: setting up replication
2020-11-12T22:32:07.022Z [INFO] core: replicated cluster information not found or disabled, not activating client
2020-11-12T22:32:07.022Z [INFO] core: replication setup finished
```
This is due to an enterprise bug fixed in v1.5.4:
https://github.com/hashicorp/vault/blob/master/CHANGELOG.md#154
>core (enterprise): Fix hang when cluster-wide plugin reload cleanup is slow on unseal
>core (enterprise): Fix an error in cluster-wide plugin reload cleanup following such a reload
Enterprise includes a function implementation that gets called post-unseal:
https://github.com/hashicorp/vault/blob/master/vault/core.go#L99
If that function hangs, it prevents the handler for the ALPN from being registered. When standby nodes attempt to connect for request forwarding, this results in the "no TLS config found for ALPN" warnings that got spammed in the logs.
**Upgrade to v1.5.4 or later to correct this issue.**
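For anyone wondering what the warning means mechanically, below is an illustrative sketch, not Vault's actual code, of the general pattern: a TLS listener that selects its config based on the client's negotiated ALPN protocol via Go's tls.Config.GetConfigForClient hook. Until some subsystem registers a config for req_fw_sb-act_v1, every forwarding connection fails the handshake with exactly this kind of message.

```go
// Illustrative sketch of ALPN-keyed TLS config dispatch; the registry,
// register, and getConfigForClient names are invented for this example.
package main

import (
	"crypto/tls"
	"fmt"
	"sync"
)

var (
	mu sync.RWMutex
	// registry maps ALPN protocol names to the TLS config that serves them.
	registry = map[string]*tls.Config{}
)

// register is called once a subsystem (here, request forwarding) finishes
// its setup; until then, lookups for its ALPN protocol fail.
func register(proto string, cfg *tls.Config) {
	mu.Lock()
	defer mu.Unlock()
	registry[proto] = cfg
}

// getConfigForClient is installed as tls.Config.GetConfigForClient and runs
// during the handshake, after the client advertises its ALPN protocols.
func getConfigForClient(hello *tls.ClientHelloInfo) (*tls.Config, error) {
	mu.RLock()
	defer mu.RUnlock()
	for _, proto := range hello.SupportedProtos {
		if cfg, ok := registry[proto]; ok {
			return cfg, nil
		}
	}
	return nil, fmt.Errorf("no TLS config found for ALPN: %v", hello.SupportedProtos)
}

func main() {
	base := &tls.Config{GetConfigForClient: getConfigForClient}
	_ = base // in a real server this would be passed to tls.NewListener

	// Simulate a standby dialing before registration has happened, as when
	// post-unseal setup is still hanging.
	hello := &tls.ClientHelloInfo{SupportedProtos: []string{"req_fw_sb-act_v1"}}
	if _, err := getConfigForClient(hello); err != nil {
		fmt.Println("handshake would fail:", err) // mirrors the log warning
	}
}
```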
We are facing the same issue with our Vault cluster. We are using Vault v1.15.2 with Raft storage in HA with three nodes. Do we need to open an issue with HashiCorp for this? Even after restarting the Vault cluster and allowing a standby node to be elected leader, the issue still persists.
Leader node logs:
```
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.079Z [INFO] storage.raft: election won: term=32 tally=2
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.079Z [INFO] storage.raft: entering leader state: leader="Node at vault-secrets-server-2.vault-secrets-server-internal:8201 [Leader]"
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.079Z [INFO] storage.raft: added peer, starting replication: peer=eecd350a-f2a7-403f-9702-619ba8cebe40
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.079Z [INFO] storage.raft: added peer, starting replication: peer=9b285315-469d-d3c1-be5f-9ba1a41f6760
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.081Z [INFO] storage.raft: pipelining replication: peer="{Voter 9b285315-469d-d3c1-be5f-9ba1a41f6760 vault-secrets-server-1.vault-secrets-server-internal:8201}"
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.081Z [INFO] storage.raft: pipelining replication: peer="{Voter eecd350a-f2a7-403f-9702-619ba8cebe40 vault-secrets-server-0.vault-secrets-server-internal:8201}"
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.095Z [INFO] core: acquired lock, enabling active operation
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.116Z [INFO] core: post-unseal setup starting
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.138Z [INFO] core: loaded wrapping token key
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.138Z [INFO] core: successfully setup plugin runtime catalog
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.138Z [INFO] core: successfully setup plugin catalog: plugin-directory=""
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.171Z [INFO] core: successfully mounted: type=system version="v1.15.2+builtin.vault" path=sys/ namespace="ID: root. Path: "
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.172Z [INFO] core: successfully mounted: type=identity version="v1.15.2+builtin.vault" path=identity/ namespace="ID: root. Path: "
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.172Z [INFO] core: successfully mounted: type=kv version="v0.16.1+builtin" path=secret/ namespace="ID: root. Path: "
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.172Z [INFO] core: successfully mounted: type=cubbyhole version="v1.15.2+builtin.vault" path=cubbyhole/ namespace="ID: root. Path: "
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.294Z [INFO] core: successfully mounted: type=token version="v1.15.2+builtin.vault" path=token/ namespace="ID: root. Path: "
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.295Z [INFO] core: successfully mounted: type=kubernetes version="v0.17.1+builtin" path=kubernetes/ namespace="ID: root. Path: "
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.295Z [INFO] core: successfully mounted: type=userpass version="v1.15.2+builtin.vault" path=userpass/ namespace="ID: root. Path: "
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.311Z [INFO] rollback: Starting the rollback manager with 256 workers
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.311Z [INFO] rollback: starting rollback manager
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.312Z [INFO] core: restoring leases
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.394Z [INFO] expiration: lease restore complete
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.395Z [INFO] identity: entities restored
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.418Z [INFO] identity: groups restored
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.445Z [INFO] core: starting raft active node
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.445Z [INFO] storage.raft: starting autopilot: config="&{false 0 10s 24h0m0s 1000 0 10s false redundancy_zone upgrade_version}" reconcile_interval=0s
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.470Z [INFO] core: usage gauge collection is disabled
2024-01-19T22:01:48+05:00 2024-01-19T17:01:48.718Z [INFO] core: post-unseal setup complete
2024-01-19T22:04:23+05:00 2024-01-19T17:04:23.948Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=
2024-01-19T22:14:19+05:00 2024-01-19T17:14:19.065Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=
```