We are seeing a race condition with vault-agent-injector when we scale our EKS worker nodes down at night and back up in the morning (to save costs in non-prod environments).
What happens is that some of our application pods that require ENV secrets from the Vault sidecar boot faster than the vault-agent-injector and Vault pods. So the application pod comes online before the vault-agent-injector pod and never gets the vault-agent-init or vault-agent containers injected. The application pod then immediately goes into an unrecoverable CrashLoopBackOff with:
```
sh: 1: .: cannot open /vault/secrets/config: No such file
```
That file is sourced to load the ENV secrets before the pod runs its entrypoint script. If we kill the pod and let it restart, it then comes up healthy.
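For context, our containers follow the usual pattern of sourcing the rendered secrets file before exec'ing the app. Roughly like this (the image, paths, and entrypoint name here are illustrative, not our exact chart values):

```yaml
# Illustrative container spec from our application chart.
containers:
  - name: app
    image: example/app:latest   # placeholder image
    command: ["/bin/sh", "-c"]
    # Source the file the injected Vault agent renders, then start the app.
    # When the injector never mutated the pod, /vault/secrets/config does
    # not exist and the "." builtin fails with the error shown above.
    args: [". /vault/secrets/config && exec /app/entrypoint.sh"]
```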
So I know one solution would be to simply implement an init container in our application Helm charts that waits on / pings vault-agent-injector-svc.vault.svc until that service resolves (see the sketch below). However, I'm curious whether this is a known issue and/or whether there is a more "HashiCorp" expected way of solving this problem.
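For reference, the workaround I have in mind looks roughly like this (the image and sleep interval are just placeholders, and it only checks that the service name resolves in DNS, nothing more):

```yaml
# Sketch of the wait-for-injector init container idea; busybox's nslookup
# is used here only as one way of checking that the service resolves.
initContainers:
  - name: wait-for-vault-injector
    image: busybox:1.36
    command: ["/bin/sh", "-c"]
    args:
      - |
        until nslookup vault-agent-injector-svc.vault.svc; do
          echo "waiting for vault-agent-injector service..."
          sleep 2
        done
```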