Hello,
We are running into a strange problem with our Vault cluster: Vault does not go into active mode and throws TLS errors, and I'm at a bit of a loss as to what is going on. The cluster uses AWS DynamoDB as the storage backend, with Consul as HA storage. Our config is as follows:
==> Vault server configuration:
AWS KMS KeyID: <KMS_ID>
AWS KMS Region: us-east-1
HA Storage: consul
Seal Type: awskms
Api Address: https://<address>:8200
Cgo: disabled
Cluster Address: https://<address>:8201
Listener 1: tcp (addr: "172.21.32.10:8200", cluster address: "172.21.32.10:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "enabled")
Log Level: debug
Mlock: supported: true, enabled: false
Recovery Mode: false
Storage: dynamodb
Version: Vault v1.3.3
The TLS error we are getting is as follows:
2021-04-28T15:25:26.043-0400 [INFO] proxy environment: http_proxy= https_proxy= no_proxy=
2021-04-28T15:25:26.155-0400 [DEBUG] config path set: path=vault
2021-04-28T15:25:26.155-0400 [WARN] appending trailing forward slash to path
2021-04-28T15:25:26.155-0400 [DEBUG] config disable_registration set: disable_registration=false
2021-04-28T15:25:26.155-0400 [DEBUG] config service set: service=vault
2021-04-28T15:25:26.155-0400 [DEBUG] config service_tags set: service_tags=
2021-04-28T15:25:26.155-0400 [DEBUG] config service_address set: service_address=
2021-04-28T15:25:26.155-0400 [DEBUG] config address set: address=127.0.0.1:8500
2021-04-28T15:25:26.155-0400 [DEBUG] storage.cache: creating LRU cache: size=0
2021-04-28T15:25:26.156-0400 [DEBUG] cluster listener addresses synthesized: cluster_addresses=[172.21.32.10:8201]
2021-04-28T15:25:26.162-0400 [INFO] core: stored unseal keys supported, attempting fetch
2021-04-28T15:25:26.194-0400 [DEBUG] core: unseal key supplied
2021-04-28T15:25:26.204-0400 [DEBUG] core: starting cluster listeners
2021-04-28T15:25:26.204-0400 [INFO] core.cluster-listener: starting listener: listener_address=172.21.32.10:8201
2021-04-28T15:25:26.204-0400 [INFO] core.cluster-listener: serving cluster requests: cluster_listen_address=172.21.32.10:8201
2021-04-28T15:25:26.204-0400 [INFO] core: entering standby mode
2021-04-28T15:25:26.207-0400 [INFO] core: vault is unsealed
2021-04-28T15:25:26.207-0400 [INFO] core: unsealed with stored keys: stored_keys_used=1
2021-04-28T15:25:26.740-0400 [DEBUG] core: parsing information for new active node: active_cluster_addr=https://:8201 active_redirect_addr=https://:8200
2021-04-28T15:25:26.740-0400 [DEBUG] core: refreshing forwarding connection
2021-04-28T15:25:26.740-0400 [DEBUG] core: clearing forwarding clients
2021-04-28T15:25:26.740-0400 [DEBUG] core: done clearing forwarding clients
2021-04-28T15:25:26.740-0400 [DEBUG] core: done refreshing forwarding connection
2021-04-28T15:25:26.740-0400 [DEBUG] core: creating rpc dialer: host=fw-c9349236-9c5d-5c26-13c1-1a1cce4bd848
2021-04-28T15:25:26.745-0400 [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2021-04-28T15:25:26.745-0400 [DEBUG] core.cluster-listener: error handshaking cluster connection: error="unsupported protocol"
2021-04-28T15:25:26.745-0400 [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing remote error: tls: internal error""
2021-04-28T15:25:26.745-0400 [ERROR] core: forward request error: error="error during forwarding RPC request"
2021-04-28T15:25:26.746-0400 [DEBUG] core: forwarding: error sending echo request to active node: error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing remote error: tls: internal error""
2021-04-28T15:25:26.819-0400 [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing remote error: tls: internal error""
The Consul client log shows this:
2021/04/28 14:54:59 [INFO] agent: (LAN) joined: 5 Err:
2021/04/28 14:54:59 [INFO] agent: Join LAN completed. Synced with 5 initial agents
2021/04/28 14:55:01 [INFO] agent: Synced node info
2021/04/28 14:55:20 [INFO] agent: Synced service "vault:8200"
2021/04/28 14:55:20 [INFO] agent: Synced check "vault:8200:vault-sealed-check"
2021/04/28 14:55:20 [INFO] agent: Synced check "vault:8200:vault-sealed-check"
2021/04/28 14:56:33 [INFO] agent: Deregistered service "vault:8200"
2021/04/28 14:56:34 [INFO] agent: Deregistered check "vault:8200:vault-sealed-check"
2021/04/28 14:59:34 [ERR] http: Request PUT /v1/agent/check/pass/vault:8200:vault-sealed-check?note=Vault+Unsealed, error: CheckID "vault:8200:vault-sealed-check" does not have associated TTL from=127.0.0.1:57098
2021/04/28 14:59:34 [INFO] agent: Synced service "vault:8200"
2021/04/28 14:59:34 [INFO] agent: Synced check "vault:8200:vault-sealed-check"
2021/04/28 14:59:35 [INFO] agent: Synced check "vault:8200:vault-sealed-check"
The SSL certs we use appear to be okay (we used Vault to generate them).
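For anyone who wants to reproduce the certificate check, here is a minimal sketch of one way to do it with the Python 3 standard library. It only looks at the certificate served by the API listener on port 8200, not the cluster port 8201 where the handshake errors above occur. The function names and the CA bundle path in the usage comment are hypothetical, chosen for illustration.

```python
import socket
import ssl
import time


def fetch_peer_cert(host, port, cafile=None, timeout=5.0):
    """Open a TLS connection to host:port and return the validated peer certificate.

    cafile is the CA bundle used to verify the server; if it is None the
    system trust store is used. Verification failures raise ssl.SSLError.
    """
    context = ssl.create_default_context(cafile=cafile)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def days_until_expiry(cert, now=None):
    """Return the number of days until the certificate's notAfter date."""
    expires = ssl.cert_time_to_seconds(cert["notAfter"])
    now = time.time() if now is None else now
    return (expires - now) / 86400.0


def subject_alt_names(cert):
    """Return the subjectAltName values (DNS names and IP addresses) as a list."""
    return [value for _kind, value in cert.get("subjectAltName", ())]


# Example usage (assumption: the CA bundle path below is hypothetical):
#   cert = fetch_peer_cert("172.21.32.10", 8200, cafile="/etc/vault/ca.pem")
#   print(days_until_expiry(cert), subject_alt_names(cert))
```

If the handshake succeeds, the expiry is in the future, and the listener address appears among the subject alternative names, the API listener certificate is probably not the cause.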
I know our version is a bit old, being v1.3.3, but we use this version in other environments with no issues.
Has anybody come across this before?