Intermittent auth.handler: error authenticating: error="context deadline exceeded"

I am sometimes getting this error during Vault agent init:

[INFO] auth.handler: authenticating

[1 minute later] [ERROR] auth.handler: error authenticating: error="context deadline exceeded"

when connecting to Vault. The agent can carry on retrying for over 10 minutes, but it always resolves itself eventually.

On the vault pod side I am getting:

[INFO] http: TLS handshake error from 10.242.3.100:33430: EOF

I am guessing that is because of the 1-minute timeout.

Any idea how to debug further? The Vault pods do not seem to be exhausting resources, and S3 seems fine…

I am also using EKS.

Is this the only “error” message you get on the Vault pods?

Around that time, yes!

But I do also see this error sometimes:

[ERROR] core: error during forwarded RPC request: error="rpc error: code = Canceled desc = context canceled"

Can you post the anonymized version of the Vault agent annotations in the Pod spec? (most of it should be safe to post here anyway)

vault.hashicorp.com/agent-init-first: 'true'
vault.hashicorp.com/agent-inject: 'true'
vault.hashicorp.com/agent-inject-default-template: json
vault.hashicorp.com/agent-inject-secret-**foo**: >-
  some/secret/**foo**
vault.hashicorp.com/agent-inject-status: injected
vault.hashicorp.com/agent-inject-token: 'true'
vault.hashicorp.com/agent-pre-populate-only: 'true'
vault.hashicorp.com/agent-requests-cpu: 10m
vault.hashicorp.com/agent-run-as-user: '1000'
vault.hashicorp.com/ca-cert: /vault/tls/ca.crt
vault.hashicorp.com/namespace: ''
vault.hashicorp.com/role: **some-role**
vault.hashicorp.com/tls-secret: **some-secret-ca-crt**

There are lots of these entries: vault.hashicorp.com/agent-inject-secret-**foo**: >- some/secret/**foo**

I put ** around everything that was redacted

10.242.3.100:33430

Did you confirm this was indeed the Pod’s IP? Just trying to eliminate issues that are not related.

Can you share your Vault Kubernetes auth backend config? Have you followed the docs?

Which of these methods are you using?
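For reference, the backend config can be dumped with the standard Vault CLI; the mount path auth/kubernetes and the role name are assumptions and may differ in your setup:

```shell
# Show the Kubernetes auth backend configuration.
# The token_reviewer_jwt itself is write-only and will not be printed.
vault read auth/kubernetes/config

# Also show the role the agent annotation refers to (role name is a placeholder).
vault read auth/kubernetes/role/some-role
```

Both commands need a valid VAULT_ADDR and VAULT_TOKEN in the environment.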

This sounds a lot like an issue we had recently. It had absolutely nothing to do with the authentication config. The backend becomes unavailable because of DynamoDB throttling.

You could check that.

Yes, I did; it is the same.

I use S3, not DynamoDB, and it looks fine.

FYI this only happens during startup of the pod!

Maybe check the AWS instance metadata timeouts (though EKS nodes should already have the max hop limit set to 2).
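You can verify the hop limit from the AWS CLI; the instance ID below is a placeholder:

```shell
# Check the IMDS hop limit on a worker node (instance ID is a placeholder).
# Pods need a hop limit of 2 to reach the instance metadata service.
aws ec2 describe-instances \
  --instance-ids i-0123456789abcdef0 \
  --query 'Reservations[].Instances[].MetadataOptions.HttpPutResponseHopLimit'

# Raise it if needed:
aws ec2 modify-instance-metadata-options \
  --instance-id i-0123456789abcdef0 \
  --http-put-response-hop-limit 2
```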

How did you deploy Vault? Using the official Helm chart?

Yes, we deployed it using the official Helm chart.

Yeah, unfortunately we have already optimised this too!

+----------------------------------------------------+----------------------------------------+
| Field                                              | Value                                  |
+----------------------------------------------------+----------------------------------------+
| Alias name                                         | serviceaccount_uid                     |
| Source                                             |                                        |
| Audience                                           |                                        |
| Bound service account names                        | REDACTED                               |
| Bound service account namespaces                   | *                                      |
| Tokens                                             |                                        |
| Generated Token's Bound CIDRs                      |                                        |
| Generated Token's Explicit Maximum TTL             | 0                                      |
| Generated Token's Maximum TTL                      | 0                                      |
| Do Not Attach 'default' Policy To Generated Tokens | false                                  |
| Maximum Uses of Generated Tokens                   | 0                                      |
| Generated Token's Period                           | 0                                      |
| Generated Token's Policies                         | default,tf-eks-dev-eu-west-2-schedule  |
| Generated Token's Initial TTL                      | 86400                                  |
| Generated Token's Type                             | default                                |
+----------------------------------------------------+----------------------------------------+

I think you need to use pre-formatted text, or just paste the redacted version of the vault read auth/kubernetes/config output.

Edited, sorry. Is that easier to understand?

So I was trying to establish which of the following you used for the Kubernetes auth:

+--------------------------------------+----------------------------+-------------------------+------------------------------------------------------------------+
| Option                               | All tokens are short-lived | Can revoke tokens early | Other considerations                                             |
+--------------------------------------+----------------------------+-------------------------+------------------------------------------------------------------+
| Use local token as reviewer JWT      | Yes                        | Yes                     | Requires Vault (1.9.3+) to be deployed on the Kubernetes cluster |
| Use client JWT as reviewer JWT       | Yes                        | Yes                     | Operational overhead                                             |
| Use long-lived token as reviewer JWT | No                         | Yes                     |                                                                  |
| Use JWT auth instead                 | Yes                        | No                      |                                                                  |
+--------------------------------------+----------------------------+-------------------------+------------------------------------------------------------------+

It is "Use long-lived token as reviewer JWT".
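For context, that variant wires a long-lived service-account token into the auth backend. A minimal sketch, assuming a Secret named vault-auth-secret and the default auth/kubernetes mount path (both placeholders):

```shell
# Extract the long-lived reviewer token from its Secret (names are placeholders).
TOKEN_REVIEW_JWT=$(kubectl get secret vault-auth-secret \
  --output 'go-template={{ .data.token }}' | base64 --decode)

# Configure the auth backend to use it for TokenReview calls.
vault write auth/kubernetes/config \
  token_reviewer_jwt="$TOKEN_REVIEW_JWT" \
  kubernetes_host="https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT" \
  kubernetes_ca_cert=@ca.crt
```

Since the reviewer token never expires, a stale or revoked one is worth ruling out as a cause of slow or failing logins.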