Team, I have a Vault cluster running in GCP with 2 Vault instances. This morning I noticed that one of the Vault instances was down and had ended up in the sealed state. I am not sure what went wrong, as this is the first time I have seen something like this, and I need your expertise to help me troubleshoot. Below are some of the logs from that instance, which may point to the cause.
==> Vault server configuration:
Listener 1: tcp (addr: "[::]:8200", cluster address: "[::]:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "enabled")
Log Level: info
Mlock: supported: true, enabled: false
Recovery Mode: false
Storage: etcd (HA available)
Version: Vault v1.4.3
==> Vault server started! Log data will stream in below:
2021-01-12T23:03:51.721Z [INFO] proxy environment: http_proxy= https_proxy= no_proxy=
2021-01-12T23:03:52.452Z [INFO] core: stored unseal keys supported, attempting fetch
2021-01-12T23:03:52.508Z [INFO] core.cluster-listener.tcp: starting listener: listener_address=[::]:8201
2021-01-12T23:03:52.508Z [INFO] core.cluster-listener: serving cluster requests: cluster_listen_address=[::]:8201
2021-01-12T23:03:52.508Z [INFO] core: vault is unsealed
2021-01-12T23:03:52.508Z [INFO] core: entering standby mode
2021-01-12T23:03:52.524Z [INFO] core: unsealed with stored keys: stored_keys_used=1
{"level":"warn","ts":"2021-01-12T23:55:42.537Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-788e8e41-18b4-4554-b0be-8dfcda3cd540/vault-etcd.sedvip-dev.svc.cluster.local:2379","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: leader changed"}
{"level":"warn","ts":"2021-01-18T23:11:24.362Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-788e8e41-18b4-4554-b0be-8dfcda3cd540/vault-etcd.sedvip-dev.svc.cluster.local:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
{"level":"warn","ts":"2021-01-18T23:11:26.173Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-788e8e41-18b4-4554-b0be-8dfcda3cd540/vault-etcd.sedvip-dev.svc.cluster.local:2379","attempt":1,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
2021-01-18T23:11:26.173Z [ERROR] core: key rotation periodic upgrade check failed: error="context deadline exceeded"
{"level":"warn","ts":"2021-01-18T23:12:12.359Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-788e8e41-18b4-4554-b0be-8dfcda3cd540/vault-etcd.sedvip-dev.svc.cluster.local:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
{"level":"warn","ts":"2021-01-18T23:12:14.565Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-788e8e41-18b4-4554-b0be-8dfcda3cd540/vault-etcd.sedvip-dev.svc.cluster.local:2379","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: leader changed"}
{"level":"warn","ts":"2021-01-18T23:12:14.565Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-788e8e41-18b4-4554-b0be-8dfcda3cd540/vault-etcd.sedvip-dev.svc.cluster.local:2379","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: leader changed"}
{"level":"warn","ts":"2021-01-18T23:12:19.819Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-788e8e41-18b4-4554-b0be-8dfcda3cd540/vault-etcd.sedvip-dev.svc.cluster.local:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
{"level":"warn","ts":"2021-01-18T23:12:21.334Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-788e8e41-18b4-4554-b0be-8dfcda3cd540/vault-etcd.sedvip-dev.svc.cluster.local:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
{"level":"warn","ts":"2021-01-18T23:12:27.121Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-788e8e41-18b4-4554-b0be-8dfcda3cd540/vault-etcd.sedvip-dev.svc.cluster.local:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
{"level":"warn","ts":"2021-01-18T23:15:09.378Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-788e8e41-18b4-4554-b0be-8dfcda3cd540/vault-etcd.sedvip-dev.svc.cluster.local:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
2021-01-18T23:15:14.493Z [INFO] core: acquired lock, enabling active operation
{"level":"warn","ts":"2021-01-18T23:15:18.351Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-788e8e41-18b4-4554-b0be-8dfcda3cd540/vault-etcd.sedvip-dev.svc.cluster.local:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
{"level":"warn","ts":"2021-01-18T23:15:19.590Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-788e8e41-18b4-4554-b0be-8dfcda3cd540/vault-etcd.sedvip-dev.svc.cluster.local:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
2021-01-18T23:15:19.590Z [ERROR] core: error performing key upgrades: error="error reloading master key: error reloading master key: failed to read master key path: context deadline exceeded"
2021-01-18T23:15:19.590Z [INFO] core: marked as sealed
{"level":"warn","ts":"2021-01-18T23:15:20.701Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-788e8e41-18b4-4554-b0be-8dfcda3cd540/vault-etcd.sedvip-dev.svc.cluster.local:2379","attempt":0,"error":"rpc error: code = Canceled desc = context canceled"}
2021-01-18T23:15:20.701Z [INFO] core: stopping cluster listeners
2021-01-18T23:15:20.701Z [INFO] core.cluster-listener: forwarding rpc listeners stopped
2021-01-18T23:15:21.012Z [INFO] core.cluster-listener: rpc listeners successfully shut down
2021-01-18T23:15:21.012Z [INFO] core: cluster listeners successfully shut down
2021-01-18T23:15:23.541Z [INFO] core: vault is sealed
2021-01-19T03:39:14.260Z [INFO] http: TLS handshake error from 10.245.135.49:12662: remote error: tls: unknown certificate
{"level":"warn","ts":"2021-01-19T03:39:15.079Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-788e8e41-18b4-4554-b0be-8dfcda3cd540/vault-etcd.sedvip-dev.svc.cluster.local:2379","attempt":0,"error":"rpc error: code = Unauthenticated desc = etcdserver: invalid auth token"}
{"level":"warn","ts":"2021-01-19T03:44:25.520Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-788e8e41-18b4-4554-b0be-8dfcda3cd540/vault-etcd.sedvip-dev.svc.cluster.local:2379","attempt":0,"error":"rpc error: code = Unauthenticated desc = etcdserver: invalid auth token"}
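In case it helps anyone reading the Vault log above, here is a minimal sketch of how the etcd client warnings could be tallied by gRPC status code, to see which failure dominates. It assumes the raw log uses plain ASCII double quotes (so each warning is valid JSON); the function name `tally_etcd_errors` and the shortened sample lines are made up for illustration and are not part of Vault or etcd.

```python
import json
from collections import Counter


def tally_etcd_errors(lines):
    """Count etcd client warnings by gRPC status code (e.g. DeadlineExceeded)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip Vault's own plain-text log lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        error = entry.get("error", "")
        # Errors look like: "rpc error: code = Unavailable desc = ..."
        if "code = " in error:
            code = error.split("code = ", 1)[1].split(" ", 1)[0]
            counts[code] += 1
    return counts


# Shortened sample lines (hypothetical; the real entries carry more fields).
sample = [
    "2021-01-18T23:15:19.590Z [INFO] core: marked as sealed",
    '{"level":"warn","error":"rpc error: code = Unavailable desc = etcdserver: leader changed"}',
    '{"level":"warn","error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}',
    '{"level":"warn","error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}',
]

for code, count in tally_etcd_errors(sample).most_common():
    print(code, count)
```

To run it against a real file, the lines could be read with something like `open("vault.log")` (the file name is an assumption) and passed to the same function.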
Also, all of my etcd instances log the following message over and over:
2021-01-19 01:01:38.437900 I | auth: deleting token TQIeTjyjmSBnEkKy.1167591 for user root
Here is some more information from the Kubernetes events of one etcd instance:
Events:
Type Reason Age From Message
Warning Unhealthy 93s (x26251 over 11d) kubelet (combined from similar events): Readiness probe failed: ==> Bash debug is on
==> [DEBUG] Probing etcd cluster
==> [DEBUG] Probe command: "etcdctl --user root:<> --cert=/opt/bitnami/etcd/certs/client/cert.pem --key=/opt/bitnami/etcd/certs/client/key.pem --cacert=/opt/bitnami/etcd/certs/client/ca.crt endpoint health"
{"level":"warn","ts":"2021-01-20T06:44:01.695Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-dd5b9d5f-a762-40bf-8fe6-796d8d879199/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
127.0.0.1:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Error: unhealthy cluster
I look forward to your advice.