We have a use case where we want dynamic DB secrets loaded into Kubernetes pods as environment variables. We are able to achieve that, but it isn't practical: once our app loads a secret from an env var, it keeps it until the pod is restarted.
So we end up with the app using an expired secret (DB username and password), because max_ttl will be reached sooner or later. Is there any workaround for this?
We could also live with dynamic DB secrets that never expire (an endless max_ttl), but what bothers me is that I don't see a way to delete the autogenerated usernames and passwords tied to these endless leases. Unless we do a proper cleanup, we will end up with an ever-growing number of usernames and passwords. The revocation_statements would delete those credentials if the secrets could expire.
Please help me!
I was recently working on a similar scenario with the Vault CSI Driver.
Please see this currently still open Pull Request to update the CSI Driver Learn Guide.
Notably, you should install the driver with these options:
helm install csi secrets-store-csi-driver/secrets-store-csi-driver --set "syncSecret.enabled=true" --set "enableSecretRotation=true"
I haven't tried it with a database dynamic secret yet, but it worked when changing a KVv2 value. It takes about 2 minutes to sync, which is how often the CSI Driver polls for changes when enableSecretRotation is set to true. The interval is configurable.
Something similar is also possible with the Agent Injector, but it requires more advanced configuration.
@jbayer The hashicorp/tutorials repository is just a 404 for me, so I guess it’s private?
Oops, I didn’t realize that repo was private. Try this gist instead while we wait for the PR to get merged and update the site.
Thank you for your response, I appreciate it.
I have exactly the same setup you described, and I think that method gives you the most compared with the other integrations. However, the Pull Request example will not work in a real-life scenario.
The secret that is loaded into the app from an environment variable will expire, since the secrets are dynamic. The app will end up with an invalid username/password (depending on the TTL value) and will crash.
Do you know you have a problem in the real world or is this theoretical? If the max TTL of the database credential is longer than the maximum lifetime of a pod, you avoid the problem. k8s does not seem to have an officially GA version of a max pod lifetime (TTL), but it's being worked on. If your pods are recycled more often than the credential max TTL, you should be able to avoid the issue of an expired credential in a running pod.
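The rule above boils down to a single comparison. Here is a minimal sketch, assuming the credential is issued when the pod starts (as it is when it's injected as an env var at startup) and is never renewed past its max TTL, and assuming something outside the app (for example the descheduler) enforces a maximum pod lifetime. The function names and the 5-minute safety margin are hypothetical illustrations, not part of Vault or Kubernetes.

```python
from datetime import timedelta
from typing import Optional


def pod_outlives_credential(credential_max_ttl: timedelta,
                            max_pod_lifetime: Optional[timedelta],
                            safety_margin: timedelta = timedelta(minutes=5)) -> bool:
    """Return True if a pod could still be running after its credential expired.

    Hypothetical helper. Assumes the credential is issued at pod start and
    is not renewed beyond credential_max_ttl.
    """
    # No enforced pod lifetime: the pod can always outlive the credential.
    if max_pod_lifetime is None:
        return True
    # The margin leaves headroom for the gap between the lease being issued
    # and the pod's age being counted, and for eviction taking some time.
    return max_pod_lifetime + safety_margin > credential_max_ttl


def max_safe_pod_lifetime(credential_max_ttl: timedelta,
                          safety_margin: timedelta = timedelta(minutes=5)) -> timedelta:
    """Longest pod lifetime that still retires the pod before the credential expires."""
    return credential_max_ttl - safety_margin
```

For example, with a 24h max_ttl, pods recycled every 12h never hold an expired credential, while pods allowed to live 24h (or forever) can. With the assumed margin, recycling pods at max_safe_pod_lifetime(timedelta(hours=24)), i.e. 23h55m, would keep you on the safe side.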
Can you describe the expected behavior of k8s and the application when a secret delivered as an environment variable is past its TTL? Environment variables are not easily dynamic: once the process starts, it's best to think of them as static. However, a secret in the CSI volume is updated if it changes. So if your application watches the volume file for changes and, when a change is detected, takes the appropriate action to update the secret in the app, the app will have the update and can proceed. This requires the app itself to contain that logic, which is potentially complex and prone to mistakes.
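As a rough illustration of the "watch the volume file" approach, here is a minimal polling sketch. The class name, the mount path, and the reload callback are hypothetical; the only behavior assumed from the CSI Driver is that the file content changes when the secret rotates. What the callback should actually do (typically rebuild the DB connection pool with the new username/password) is application-specific.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable, Optional


class SecretFileWatcher:
    """Polls a secret file mounted by the CSI driver and reports content changes.

    Hypothetical helper: the path and the on_change callback are supplied
    by the application.
    """

    def __init__(self, path: Path, on_change: Callable[[str], None]) -> None:
        self.path = path
        self.on_change = on_change
        self._last_digest: Optional[str] = None

    def check_once(self) -> bool:
        """Read the file and call on_change if its content differs from the last read.

        Returns True when a change (including the first successful read) was reported.
        """
        try:
            content = self.path.read_text()
        except FileNotFoundError:
            # Treat a missing file as "no change yet" and retry on the next poll.
            return False
        digest = hashlib.sha256(content.encode()).hexdigest()
        if digest == self._last_digest:
            return False
        self._last_digest = digest
        self.on_change(content.strip())
        return True

    def run_forever(self, interval_seconds: float = 30.0) -> None:
        """Poll indefinitely; intended to run in a background thread."""
        while True:
            self.check_once()
            time.sleep(interval_seconds)
```

Usage would look something like SecretFileWatcher(Path("/mnt/secrets-store/db-password"), on_change=reconnect_pool), where both the path and reconnect_pool are placeholders for your own mount path and reconnect logic. Polling keeps the sketch stdlib-only and independent of how the driver replaces the file on disk.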
A simpler solution in the future could be a k8s controller with more advanced application restart capabilities: it would deliver secrets to apps as env vars and, if the secret changed (because max TTL was reached), use the Deployment or StatefulSet update strategy to transparently replace the pods. This is not something supported out of the box today by the Vault Injector or the CSI Driver.
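One common way to get that rolling-replace behavior by hand is the "checksum annotation" pattern: hash the secret's contents and stamp the hash onto the pod template, so any change to the secret changes the template and the Deployment (or StatefulSet) rolls its pods using its configured strategy. This is a swapped-in technique, not something the Vault Injector or CSI Driver does; the annotation key and function names below are hypothetical, and the sketch only builds the patch body rather than talking to the cluster.

```python
import hashlib
import json
from typing import Dict

# Hypothetical annotation key. Any key on the pod template works, because it is
# the change to the pod template that triggers the rolling update.
CHECKSUM_ANNOTATION = "example.com/db-credentials-checksum"


def secret_checksum(secret_data: Dict[str, str]) -> str:
    """SHA-256 over the secret's key/value pairs, independent of key order."""
    canonical = json.dumps(secret_data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def rollout_patch(secret_data: Dict[str, str]) -> Dict:
    """Patch body that stamps the current checksum onto the pod template."""
    return {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {CHECKSUM_ANNOTATION: secret_checksum(secret_data)}
                }
            }
        }
    }


def needs_rollout(current_annotations: Dict[str, str],
                  secret_data: Dict[str, str]) -> bool:
    """True if the pod template's recorded checksum no longer matches the secret."""
    return current_annotations.get(CHECKSUM_ANNOTATION) != secret_checksum(secret_data)
```

A controller or CronJob could then apply the patch with the official Kubernetes Python client, e.g. AppsV1Api().patch_namespaced_deployment(name, namespace, body=rollout_patch(data)), whenever needs_rollout returns True. kubectl rollout restart works on the same principle, using the kubectl.kubernetes.io/restartedAt annotation.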
Do you know you have a problem in the real world or is this theoretical?
We are still in the PoC phase, so it's a bit of both.
Can you describe the expected behavior of k8s and the application when a secret delivered as an environment variable is past its TTL?
The application will no longer be able to read/write, since the loaded credentials have expired and are unusable.
Regarding the descheduler, it seems like a really nice tool. It would be a bit of a workaround in my case, and what I don't like is that it cannot perform a rolling upgrade of pods (which isn't the purpose of the descheduler, though), but as you say, it can avoid my problem.
I'll test a few cases over the weekend and let you know.
Thanks for the tip.