Error "context canceled" with no reason I can find

I have a setup where two Vault pods are running on EKS. Last week I upgraded Kubernetes from 1.21 to 1.22, and since then one of my Vault pods has been showing:

2022-10-21T15:10:03.724Z [ERROR] core: error during forwarded RPC request: error="rpc error: code = Canceled desc = context canceled"
2022-10-21T15:10:03.724Z [ERROR] core: forward request error: error="error during forwarding RPC request"

in the logs. It always happens at the same time: 10 minutes past the hour. I have a cronjob running every 10 minutes, so that’s the obvious place to check, but this cronjob’s pod reports:

time="2022-10-21T15:10:03Z" level=info msg="received new Vault token" addr= app=vault-env path=kubernetes role=my-role
time="2022-10-21T15:10:03Z" level=info msg="initial Vault token arrived" app=vault-env

so everything looks OK on that side.

Also, in the log of the second (‘working’) Vault pod, I see:

2022-10-21T15:10:03.475Z [INFO]  expiration: revoked lease: lease_id=<some lease id>

I’ve searched issues on GitHub, but I couldn’t find anything with an error message as generic as ‘context canceled’: every issue I found had a clear underlying reason.

Any ideas what I could or should check? Or maybe a general explanation of what ‘context canceled’ means?

TIA


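Generally, ‘context canceled’ means ‘an operation was abandoned before it finished’. Strictly speaking, in Go an expired timeout shows up as ‘context deadline exceeded’, while ‘context canceled’ means the operation’s cancel function was called, which often happens because a caller further upstream timed out or disconnected.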

Your post is quite interesting to me: we’re also seeing that pair of errors, a lot, on our very busy Vault cluster at work, and we’re also running Kubernetes 1.22. Sadly, we haven’t made any progress in understanding the cause either.

But otherwise everything works, and you don’t see any problems with your setup? Just this in the logs?

  • Did it work on 1.21, with the same config, without those errors?
  • Which version of Vault are you running? I have 1.10.5 (installed by banzaicloud/bank-vaults).

I’ve updated my test instance to 1.12.0 and these errors stopped appearing. Can you try it too and confirm?

We only get these errors in prod, which we’re unlikely to upgrade to 1.12 this year.

I’ve been debugging this for a long time, even digging into the Vault source code, and I’ve finally found someone with the same error. Did upgrading to 1.12 resolve the problem?

I see the same error with version 1.23. Did you find a way to fix it?

Disclaimer: my scenario is different, so this may not be related.
I was also receiving a “context canceled” error. In Go, this error is returned when the client side times out or cancels the request before it has completed. Can you check whether a timeout is configured for those requests, and whether it can be increased?

I’m using version 1.14.1 and I get the same error on standby nodes.