Huge number of leases while using /auth/kubernetes

We have Vault deployed on a k8s cluster and integrated with several other k8s clusters via the CSI driver to provide secrets to workloads there. It all works OK, but:
the list of leases for every cluster is several thousand pages long… e.g.

which corresponds to the huge size of vault.db:

/vault/data $ ls -lh
total 8G
drwxrws---    2 root     vault      16.0K Nov 15 19:42 lost+found
-rw-rw----    1 vault    vault         36 Nov 15 19:42 node-id
drwxrwsr-x    3 vault    vault       4.0K Nov 15 19:42 raft
-rw-------    1 vault    vault       8.0G Mar 16 16:33 vault.db

and the pods' memory consumption:

(⎈ |gke-devops-prod:vault) ~  k top pods
NAME                                  CPU(cores)   MEMORY(bytes)
in-cluster-vault-0                    13m          79Mi
in-cluster-vault-1                    28m          369Mi
in-cluster-vault-2                    63m          9557Mi

Is this expected? We don't have a big number of workloads there (maybe 20-30 with Vault integration), and Vault is being hit with no more than 1.5-2 RPS (not 2k RPS - just 2 RPS).

I believe this also explains why Vault takes forever to restart / roll out a new version - around 90-120 minutes per pod - and while this is happening, the other pods' resource consumption climbs to ~11 GB…
What could we be misconfiguring on our side?

Hey!

What Kubernetes version are you using? And what authentication approach for Kubernetes auth? And what Vault version?

k8s 1.22 - “v1.22.6-gke.300” to be precise.
Vault v1.9.3
Kubernetes - Auth Methods | Vault by HashiCorp ← this auth method

Alright, and which of the available auth approaches (listed here) are you using?

I'm using Terraform to configure Vault - the current config is:

resource "vault_auth_backend" "k8s-apps-dev" {
  type = "kubernetes"
  path = "k8s-apps-dev"
  tune {
    default_lease_ttl = "1h"
    max_lease_ttl     = "1h"
  }
}

resource "vault_kubernetes_auth_backend_config" "k8s-apps-dev-config" {
  backend            = vault_auth_backend.k8s-apps-dev.path
  kubernetes_host    = "https://35.195.xxx.xxx"
  issuer             = "https://container.googleapis.com/v1/projects/gcp-project-name/locations/europe-xxxxx/clusters/cluster-name"
  kubernetes_ca_cert = file("auth_data/apps-dev-cluster/ca_cert.crt")
  token_reviewer_jwt = file("auth_data/apps-dev-cluster/token.jwt")
}

The tune section was added today, as a test to see whether it helps - previously it was 0, which I believe falls back to the system default of a 768h lease.
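
For the record, the TTLs actually in effect on the mount can be double-checked from the CLI - just a sketch using my mount path:

# check which TTLs are currently applied to the auth mount
vault read sys/auth/k8s-apps-dev/tune

# ad-hoc equivalent of the Terraform tune block above
vault auth tune -default-lease-ttl=1h -max-lease-ttl=1h k8s-apps-dev/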

Could you check the number of entities that are bound to the auth method? If it's more than expected, please have a look at the alias / service account UID used.
I will have a look at your lease settings, as this could well be the issue.
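
Something along these lines should give a rough count - a minimal sketch, adjust to your own setup:

# list all identity entities known to Vault
vault list identity/entity/id

# list entity aliases (roughly one per service account per auth mount)
vault list identity/entity-alias/id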

No more than 10-20 entities in total across all clusters for the auth/kubernetes method.

Any ideas @RemcoBuddelmeijer?
I've reconfigured the k8s auth method to have a max lease of 60s and I'm waiting to see whether the ~500,000 old leases will expire… Either way, something is not quite right with this native k8s integration - why were these leases never revoked?
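
In case it helps anyone else, this is roughly how I'm keeping an eye on those leases - the mount path is one of mine, and the revoke line is only there if you'd rather drop them immediately than wait:

# list outstanding leases created by logins against this auth mount
vault list sys/leases/lookup/auth/k8s-apps-dev/login

# force-revoke everything under that prefix (use with care)
vault lease revoke -prefix auth/k8s-apps-dev/login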

@lukpep Allow me some time to dive into it. Had it planned for today and will let you know more shortly!


@lukpep I have had a look at all the components being used, namely:

  1. Kubernetes Auth
  2. CSI provider

So far nothing odd has happened on my end. When I use the CSI provider with your exact Vault version, everything goes smoothly and only a single lease is created.
However, this was not the case when a secret could not be read: rather than one lease it would create several, one for each retry - but that was a finite number and the leases expired. Perhaps checking via audit logging and debug logging whether all the secrets are read on the first try (within the timeout) would bring more to light?
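
If you don't have an audit device enabled yet, a file audit device is the quickest way to see every login and lease creation - the path below is just an example:

# enable a file audit device; every request/response is logged here (with sensitive values HMAC'd)
vault audit enable file file_path=/vault/audit/vault-audit.log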

One thing I did want to ask: could you check and share your Secrets Store CSI Driver version?

Other than that, I don't know anything specific about your Vault setup, which makes it very hard to judge what is going wrong. 2 RPS could still hide a misconfiguration that isn't caught by that particular metric.
For that I would have to know more, and since it's sensitive information I understand if it's out of reach. It's either up to you to reach out to HashiCorp directly or to share it here. (If you do share it, please make sure it's cleared by whoever is in charge and disclose it securely. In general I recommend against sharing it, as it's your own Vault setup!)

Sorry if this wasn't what you wanted to hear. The CSI driver seems to function as expected on the latest (Helm) version, at least with no unusual test cases.

@RemcoBuddelmeijer thanks for your time :slight_smile:
Regarding the CSI driver - I'm using 1.0.0 from the Secrets Store CSI Driver Helm chart repository, and I can see that the newest one is 1.1.1.
Here is what I can see in the nginx ingress controller logs (my Vault sits behind it):

ingress-nginx-controller-6c9594575f-2hll2 controller 10.97.128.107 - - [20/Mar/2022:11:51:28 +0000] "POST /v1/auth/k8s-apps-prod/login HTTP/2.0" 200 710 "-" "Go-http-client/2.0" 1363 0.346 [vault-in-cluster-vault-active-8200] [] 192.168.128.49:8200 710 0.346 200 e0ba5daba19168ad062fd845d6f16e91
ingress-nginx-controller-6c9594575f-2hll2 controller 10.97.128.107 - - [20/Mar/2022:11:51:28 +0000] "GET /v1/app-secrets/data/some-random-app/prod HTTP/2.0" 200 2938 "-" "Go-http-client/2.0" 71 0.002 [vault-in-cluster-vault-active-8200] [] 192.168.128.49:8200 2950 0.001 200 d1d06f156dd92657f35b554a8bd4a675
--
ingress-nginx-controller-6c9594575f-2hll2 controller 192.168.130.1 - - [20/Mar/2022:11:51:33 +0000] "POST /v1/auth/k8s-apps-prod/login HTTP/2.0" 200 710 "-" "Go-http-client/2.0" 1363 0.494 [vault-in-cluster-vault-active-8200] [] 192.168.128.49:8200 710 0.494 200 b7efe65cb03bc79fc0218dac8a32d1be
ingress-nginx-controller-6c9594575f-2hll2 controller 192.168.130.1 - - [20/Mar/2022:11:51:33 +0000] "GET /v1/app-secrets/data/some-random-app/prod HTTP/2.0" 200 2938 "-" "Go-http-client/2.0" 71 0.002 [vault-in-cluster-vault-active-8200] [] 192.168.128.49:8200 2950 0.002 200 0a2383fd543179ef56ae37d335a70ceb
--
ingress-nginx-controller-6c9594575f-2hll2 controller 10.97.128.107 - - [20/Mar/2022:11:53:28 +0000] "POST /v1/auth/k8s-apps-prod/login HTTP/2.0" 200 710 "-" "Go-http-client/2.0" 1363 0.250 [vault-in-cluster-vault-active-8200] [] 192.168.128.49:8200 710 0.250 200 bd7b83ccaee69f75714bd781d9c5cf4d
ingress-nginx-controller-6c9594575f-2hll2 controller 10.97.128.107 - - [20/Mar/2022:11:53:28 +0000] "GET /v1/app-secrets/data/some-random-app/prod HTTP/2.0" 200 2938 "-" "Go-http-client/2.0" 72 0.002 [vault-in-cluster-vault-active-8200] [] 192.168.128.49:8200 2950 0.001 200 71a811fa7f1c7c6d3213fcfdada18fc9
--
ingress-nginx-controller-6c9594575f-2hll2 controller 192.168.130.1 - - [20/Mar/2022:11:53:33 +0000] "POST /v1/auth/k8s-apps-prod/login HTTP/2.0" 200 710 "-" "Go-http-client/2.0" 1363 0.447 [vault-in-cluster-vault-active-8200] [] 192.168.128.49:8200 710 0.446 200 f1b278a2f151da20ff84bc4d1451ded0
ingress-nginx-controller-6c9594575f-2hll2 controller 192.168.130.1 - - [20/Mar/2022:11:53:33 +0000] "GET /v1/app-secrets/data/some-random-app/prod HTTP/2.0" 200 2938 "-" "Go-http-client/2.0" 72 0.002 [vault-in-cluster-vault-active-8200] [] 192.168.128.49:8200 2950 0.001 200 15d15e99cfc6afb015550dfcc0089d09
 --
ingress-nginx-controller-6c9594575f-2hll2 controller 10.97.128.58 - - [20/Mar/2022:11:55:28 +0000] "POST /v1/auth/k8s-apps-prod/login HTTP/2.0" 200 710 "-" "Go-http-client/2.0" 1363 0.251 [vault-in-cluster-vault-active-8200] [] 192.168.128.49:8200 710 0.251 200 48fbb65745254e20ed1498e0d6bffbfc
ingress-nginx-controller-6c9594575f-2hll2 controller 10.97.128.58 - - [20/Mar/2022:11:55:28 +0000] "GET /v1/app-secrets/data/some-random-app/prod HTTP/2.0" 200 2938 "-" "Go-http-client/2.0" 71 0.002 [vault-in-cluster-vault-active-8200] [] 192.168.128.49:8200 2950 0.001 200 557a728936f334c514955b72dabb09a7
--
ingress-nginx-controller-6c9594575f-2hll2 controller 10.97.128.107 - - [20/Mar/2022:11:55:33 +0000] "POST /v1/auth/k8s-apps-prod/login HTTP/2.0" 200 710 "-" "Go-http-client/2.0" 1363 0.588 [vault-in-cluster-vault-active-8200] [] 192.168.128.49:8200 710 0.587 200 28e02dea3d629bf28c4ca522f5a82136
ingress-nginx-controller-6c9594575f-2hll2 controller 10.97.128.107 - - [20/Mar/2022:11:55:33 +0000] "GET /v1/app-secrets/data/some-random-app/prod HTTP/2.0" 200 2938 "-" "Go-http-client/2.0" 71 0.001 [vault-in-cluster-vault-active-8200] [] 192.168.128.49:8200 2950 0.002 200 0e734c6f5a1464a8d072fad83bfb1ddd
--
ingress-nginx-controller-6c9594575f-2hll2 controller 10.97.128.58 - - [20/Mar/2022:11:57:27 +0000] "POST /v1/auth/k8s-apps-prod/login HTTP/2.0" 200 710 "-" "Go-http-client/2.0" 1363 0.260 [vault-in-cluster-vault-active-8200] [] 192.168.128.49:8200 710 0.259 200 ad31a9aeef26a669b77086ee61a19328
ingress-nginx-controller-6c9594575f-2hll2 controller 10.97.128.58 - - [20/Mar/2022:11:57:27 +0000] "GET /v1/app-secrets/data/some-random-app/prod HTTP/2.0" 200 2938 "-" "Go-http-client/2.0" 71 0.002 [vault-in-cluster-vault-active-8200] [] 192.168.128.49:8200 2950 0.002 200 85ed2c01b37c850a92b2c1ba7279be91
--
ingress-nginx-controller-6c9594575f-2hll2 controller 192.168.130.1 - - [20/Mar/2022:11:57:33 +0000] "POST /v1/auth/k8s-apps-prod/login HTTP/2.0" 200 710 "-" "Go-http-client/2.0" 1363 0.521 [vault-in-cluster-vault-active-8200] [] 192.168.128.49:8200 710 0.521 200 1af4e2487e9fd2d9eceddd5b2f422b7b
ingress-nginx-controller-6c9594575f-2hll2 controller 192.168.130.1 - - [20/Mar/2022:11:57:33 +0000] "GET /v1/app-secrets/data/some-random-app/prod HTTP/2.0" 200 2938 "-" "Go-http-client/2.0" 70 0.002 [vault-in-cluster-vault-active-8200] [] 192.168.128.49:8200 2950 0.002 200 9b308624d5351ea9e28f24deb21d9b10

No secret data leaked here - I've checked :wink:
Every 2 minutes there is a login POST followed by a GET for the secrets… and for some reason the pattern repeats 5-6 seconds later. Every login creates a new token and a new lease, I assume?
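For reference, I assume each of those POST /v1/auth/k8s-apps-prod/login calls is equivalent to a login like the one below, which hands out a fresh client token (and therefore a fresh lease) every time - the role name and JWT path are placeholders, not my real ones:

# every successful login returns a brand-new client token with its own TTL and lease
vault write auth/k8s-apps-prod/login role=some-random-app jwt=@/path/to/serviceaccount-token.jwt
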
As for this specific app's config, it uses a SecretProviderClass object:

apiVersion: secrets-store.csi.x-k8s.io/v1alpha1
kind: SecretProviderClass

configured with 5 keys, all coming from a single secret path:

/v1/app-secrets/data/some-random-app/prod
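
For context, the SecretProviderClass looks roughly like this - the role name, object names and Vault address below are placeholders rather than my real values:

apiVersion: secrets-store.csi.x-k8s.io/v1alpha1
kind: SecretProviderClass
metadata:
  name: some-random-app-prod
spec:
  provider: vault
  parameters:
    vaultAddress: "https://vault.example.com"
    vaultKubernetesMountPath: "k8s-apps-prod"
    roleName: "some-random-app"
    objects: |
      - objectName: "db-password"
        secretPath: "app-secrets/data/some-random-app/prod"
        secretKey: "db-password"
      # ...plus four more keys, all from the same secretPath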

What troubles me is:

  • why is this pattern repeated twice every 2 minutes? It seems like a single login and secret GET should be enough, right?
  • while I understand why we need to check secrets every 2 minutes (secret rotation), it looks like this solution will scale poorly without some kind of token caching - we are talking about ~45k lease objects per month per application (per SecretProviderClass to be exact - I'm not sure how it behaves with multiple secret paths, not just keys, under the same SecretProviderClass object). With 20 apps per cluster x 4 clusters (nothing extraordinary, I believe) we end up with close to 4 million lease objects per month, which in our case (extrapolating from the nearly 2 million we already have) translates to a Vault instance using > 20 GB of memory and startup / restart times counted in hours - rough math below.
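
Roughly, assuming the default 2-minute rotation-poll-interval and 2 replicas per app:

30 logins/hour x 24 h x ~31 days ≈ 22,300 leases per pod per month
x 2 replicas                     ≈ ~45,000 leases per app (SecretProviderClass) per month
x 20 apps x 4 clusters           ≈ ~3.6 million leases per month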

So my question is: should we shorten the TTL on these leases, from 1 month down to, say, 1 minute, to keep their number under control? Or will constant token revocations every minute kill the CPU?

OK - I know why this login / GET pattern gets repeated after 5 seconds: it is done separately for every single pod in the ReplicaSet… and this particular service has 2 replicas. When I scaled it to 3, I got 3x the logins and secret GETs. Not so optimal, I must say :wink:

Looks to me like you might be better off using the Vault Agent rather than the CSI driver for the time being. I will have a look at the Vault CSI Provider and see what can be done to improve upon this.
The issue here really seems to be in the authentication part rather than any type of secret caching. Caching will improve the provider a lot, but the leases are a huge deal as they are being tracked in memory. A lease shouldn’t have to be created every 5s, not even every 2m.

Would having a look at the Vault Agent be something you’d be interested in?
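
Just to give an idea of what that looks like: with the agent injector the integration is driven by pod annotations roughly like the ones below - the role, auth mount and secret path are placeholders for your own values. As far as I know the agent sidecar authenticates once and keeps renewing its token, rather than performing a fresh login on every poll:

# added to the pod template of the Deployment
metadata:
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "some-random-app"
    vault.hashicorp.com/auth-path: "auth/k8s-apps-prod"
    vault.hashicorp.com/agent-inject-secret-app-config: "app-secrets/data/some-random-app/prod"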

I was not a big fan of the agent (last time I checked) since it requires an extra sidecar container per secret-aware workload. I will validate it once again.
What is “broken” in the current CSI driver implementation, in my opinion:

  • it does not make use of the lease TTL - instead, rotation-poll-interval from here is what defines the number of leases created per hour / month etc. (see the sketch after this list). A token created via a single login should be cached and reused for the TTL it was issued with
  • the CSI driver should not make requests (and logins) per pod in the ReplicaSet - it is counter-intuitive that a deployment of 100 pods triggers 100 logins and 100 GETs for the same secret once every rotation-poll-interval (2 minutes by default). It also creates inconsistency in the secret itself, since the sync interval is bound to the pod lifetime (the counter starts when the pod is created), so there can be a window when the secret has been refreshed in some pods but not yet in others - and in the worst case that window can last up to another full rotation-poll-interval - which is definitely not desired.
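
For completeness, the 2-minute figure comes from the driver's --rotation-poll-interval setting. If I read the secrets-store-csi-driver Helm chart correctly it is exposed roughly like this - the value names are my assumption, so double-check them against the chart:

# rotation is opt-in on the driver side; the interval drives how often the provider re-reads (and re-logs-in)
helm upgrade --install csi secrets-store-csi-driver/secrets-store-csi-driver \
  --namespace kube-system \
  --set enableSecretRotation=true \
  --set rotationPollInterval=2m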

If having a sidecar for every single one of your deployments isn't an option, then sadly that leaves the CSI driver.

I 100% agree, and I think this should be fixed in a way that survives updates of the Secrets Store CSI Driver. Right now I see a lack of actual API usage - I'd rather see some of that than just API objects.
From looking at the GitHub issues it does seem like they are aware of this and plan to work towards it. Either way, this should at least be fixed in the interim.

I think this is where it starts becoming a bit hard. v1.0.0 has only just been released, and with it the first stable release of the CSI driver itself. A lot of things couldn't be put into place yet, either because there wasn't enough time or because it wasn't clear what would end up being introduced and what wouldn't.
Time will fix these issues, as I am sure the Vault team is aware that making 100 requests for 100 pods isn't sustainable.

How about we start off by creating a (number of) issue(s) on the GitHub repository and link this thread? I can do this after having done some more research into the Vault CSI Provider itself.

Created Vault CSI provider not making use of received token TTL · Issue #150 · hashicorp/vault-csi-provider · GitHub
and Login and secret sync pattern per pod in replica set · Issue #151 · hashicorp/vault-csi-provider · GitHub
