I'm running the following:
vault v1.9.3
csi-secrets-store/driver:v1.2.4
vault-csi-provider:1.2.0
I can't use the agent injector because of scaling and resource costs (it isn't economical to roll out across a few thousand services).
Shortly before going live I noticed that a huge number of leases were being created and the TTLs weren't being respected, to the point where Vault curled up and sealed itself.
I manually deleted the 200k leases from the database and saw them quickly build up again.
In the CSIDriver object secrets-store.csi.k8s.io (viewed with -o yaml) I updated:
    tokenRequests:
    - audience: vault
      expirationSeconds: 3600
Then on the secrets-store-csi-driver I set rotationPollInterval: 30m.
I also set the TTL on my secrets engine to 14 days.
Things seem to be working for the time being, but I'm not very confident in the setup.
Here is what I understand is happening:
csi-driver to pod reconciliation:
The csi-driver rotates the secrets it stores locally every rotationPollInterval, meaning that pods restart with fresh secrets every 30 minutes.
csi-driver to vault reconciliation:
The CSI pod queries Vault and updates its secrets cache every expirationSeconds; it fetches fresh secrets if need be, otherwise it keeps the cached ones.
What's unclear is where the TTL for the login token (kubernetes/auth/lease) is set, as some leases are 60 minutes and some are 20 minutes (the auth-method default TTL is 60 min).
The secrets engine TTL:
The secrets engine sets the credentials' TTL to 14 days.
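As a rough sanity check on lease volume, here is a back-of-the-envelope sketch in Python. It assumes (and I'm not sure this is how the provider actually behaves) that every rotation makes each pod request a new lease, and that each lease then lives for its full TTL without being revoked early. The function name, the 3000-pod figure, and the one-secret-per-pod default are my own placeholders, not anything from Vault or the CSI driver.

```python
import math
from datetime import timedelta


def steady_state_leases(pods, lease_ttl, rotation_interval, secrets_per_pod=1):
    """Upper bound on outstanding leases under the assumptions stated above.

    Assumption: each rotation creates one new lease per pod per secret,
    and old leases are only cleaned up when their TTL expires.
    """
    # How many rotations happen during the lifetime of a single lease.
    rotations_per_ttl = math.ceil(lease_ttl / rotation_interval)
    return pods * secrets_per_pod * rotations_per_ttl


# Values from my setup; 3000 pods is a placeholder for "a few thousand".
secret_leases = steady_state_leases(
    pods=3000,
    lease_ttl=timedelta(days=14),           # secrets engine TTL
    rotation_interval=timedelta(minutes=30),  # rotationPollInterval
)
token_leases = steady_state_leases(
    pods=3000,
    lease_ttl=timedelta(minutes=60),        # auth-method default TTL
    rotation_interval=timedelta(minutes=30),
)

print(secret_leases)  # 2016000
print(token_leases)   # 6000
```

If that assumption holds, the 14-day TTL combined with a 30-minute rotation is what dominates the lease count, which would be consistent with the build-up I saw.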
Any idea if I will get caught without a valid secret inside my pod?
Also, will this setup scale to a few thousand pods?