I am sometimes getting the following error on the Vault agent init container when connecting to Vault:
[INFO] auth.handler: authenticating
[1 minute later] [ERROR] auth.handler: error authenticating: error="context deadline exceeded"
This can carry on retrying for over 10 minutes, but it always resolves itself eventually.
On the Vault pod side I am getting:
[INFO] http: TLS handshake error from 10.242.3.100:33430: EOF
I am guessing that is because of the 1-minute timeout.
Any idea how to debug this further? The Vault pods do not seem to be exhausting resources, and S3 seems fine.
I am also using EKS.
Is this the only “error” message you get on the Vault pods?
But I do also see this error sometimes:
[ERROR] core: error during forwarded RPC request: error="rpc error: code = Canceled desc = context canceled"
Can you post the anonymized version of the Vault agent annotations in the Pod spec? (most of it should be safe to post here anyway)
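If it is easier, something like this should pull out just the injector annotations (the pod name and namespace below are placeholders):

```shell
# Show only the vault.hashicorp.com annotations on the application Pod
kubectl get pod <your-app-pod> -n <your-namespace> -o yaml | grep 'vault.hashicorp.com'
```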
vault.hashicorp.com/agent-init-first: 'true'
vault.hashicorp.com/agent-inject: 'true'
vault.hashicorp.com/agent-inject-default-template: json
vault.hashicorp.com/agent-inject-secret-**foo**: >-
some/secret/**foo**
vault.hashicorp.com/agent-inject-status: injected
vault.hashicorp.com/agent-inject-token: 'true'
vault.hashicorp.com/agent-pre-populate-only: 'true'
vault.hashicorp.com/agent-requests-cpu: 10m
vault.hashicorp.com/agent-run-as-user: '1000'
vault.hashicorp.com/ca-cert: /vault/tls/ca.crt
vault.hashicorp.com/namespace: ''
vault.hashicorp.com/role: **some-role**
vault.hashicorp.com/tls-secret: **some-secret-ca-crt**
There are lots of these vault.hashicorp.com/agent-inject-secret-**foo**: >- some/secret/**foo** annotations.
I put ** around everything that was redacted.
10.242.3.100:33430
Did you confirm this was indeed the Pod’s IP? Just trying to eliminate issues that are not related.
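A quick way to cross-check, with placeholder pod/namespace names:

```shell
# The -o wide output includes the Pod IP, which you can compare against the Vault log
kubectl get pod <your-app-pod> -n <your-namespace> -o wide
```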
Can you share your Vault Kubernetes auth backend config? Have you followed the docs?
Which of these methods are you using?
This sounds a lot like an issue we had recently. It had absolutely nothing to do with the authentication config. The backend becomes unavailable because of DynamoDB throttling.
You could check that.
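For anyone on DynamoDB, a rough sketch of that check with the AWS CLI (the table name and time window are placeholders; WriteThrottleEvents is worth checking as well):

```shell
# Count read throttle events against the Vault storage table over a recent window
aws cloudwatch get-metric-statistics \
  --namespace AWS/DynamoDB \
  --metric-name ReadThrottleEvents \
  --dimensions Name=TableName,Value=<vault-storage-table> \
  --start-time <window-start, e.g. 2023-01-01T09:00:00Z> \
  --end-time <window-end, e.g. 2023-01-01T10:00:00Z> \
  --period 300 \
  --statistics Sum
```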
Yes, I did, and it is the same.
I use S3, not DynamoDB, and it looks fine.
FYI this only happens during startup of the pod!
Maybe check AWS Instance Metadata Timeouts (though EKS nodes should already have max hop limit set to 2)
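If you want to double-check a node, something along these lines should do it (the instance ID is a placeholder):

```shell
# Inspect the node's current IMDS settings
aws ec2 describe-instances \
  --instance-ids <node-instance-id> \
  --query 'Reservations[].Instances[].MetadataOptions'

# Raise the PUT response hop limit so Pods can reach IMDSv2 through the node
aws ec2 modify-instance-metadata-options \
  --instance-id <node-instance-id> \
  --http-put-response-hop-limit 2 \
  --http-endpoint enabled
```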
How did you deploy Vault? Using the official Helm chart?
Yes, we deploy it using the official Helm chart.
Yeah, unfortunately we have already optimised this too!
+----------------------------------------------------+---------------------------------------+
| Field                                              | Value                                 |
+----------------------------------------------------+---------------------------------------+
| Alias name                                         | serviceaccount_uid                    |
| Source                                             |                                       |
| Audience                                           |                                       |
| Bound service account names                        | REDACTED                              |
| Bound service account namespaces                   | *                                     |
| Tokens                                             |                                       |
| Generated Token's Bound CIDRs                      |                                       |
| Generated Token's Explicit Maximum TTL             | 0                                     |
| Generated Token's Maximum TTL                      | 0                                     |
| Do Not Attach 'default' Policy To Generated Tokens | false                                 |
| Maximum Uses of Generated Tokens                   | 0                                     |
| Generated Token's Period                           | 0                                     |
| Generated Token's Policies                         | default,tf-eks-dev-eu-west-2-schedule |
| Generated Token's Initial TTL                      | 86400                                 |
| Generated Token's Type                             | default                               |
+----------------------------------------------------+---------------------------------------+
I think you need to use pre-formatted text, or just paste the redacted version of the vault read auth/kubernetes/config output.
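That is (assuming the auth method is mounted at the default kubernetes/ path):

```shell
# Backend-level config: reviewer JWT settings, Kubernetes host, CA, etc.
vault read auth/kubernetes/config

# The role the agent annotation refers to
vault read auth/kubernetes/role/<your-role>
```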
Edited, sorry. Is that easier to understand?
So I was trying to establish which of the following you used for the Kubernetes auth:
| Option | All tokens are short-lived | Can revoke tokens early | Other considerations |
| --- | --- | --- | --- |
| Use local token as reviewer JWT | Yes | Yes | Requires Vault (1.9.3+) to be deployed on the Kubernetes cluster |
| Use client JWT as reviewer JWT | Yes | Yes | Operational overhead |
| Use long-lived token as reviewer JWT | No | Yes | |
| Use JWT auth instead | Yes | No | |
It is the "Use long-lived token as reviewer JWT" option.
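For reference, that option is typically configured roughly like this (the Secret name here is a placeholder, not our real one):

```shell
# Reviewer JWT comes from a long-lived service account token Secret
TOKEN_REVIEWER_JWT=$(kubectl get secret <vault-auth-token-secret> \
  -o go-template='{{ .data.token }}' | base64 --decode)
KUBE_CA_CERT=$(kubectl config view --raw --minify --flatten \
  -o jsonpath='{.clusters[].cluster.certificate-authority-data}' | base64 --decode)
KUBE_HOST=$(kubectl config view --raw --minify --flatten \
  -o jsonpath='{.clusters[].cluster.server}')

vault write auth/kubernetes/config \
  token_reviewer_jwt="$TOKEN_REVIEWER_JWT" \
  kubernetes_host="$KUBE_HOST" \
  kubernetes_ca_cert="$KUBE_CA_CERT"
```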