Vault expire_num_leases very high on a specific timeframe

erlisb · May 30, 2022, 8:36am

Dear HashiCorp Team,

We are experiencing a strange situation with our Vault cluster deployed in a Red Hat OpenShift cluster: the number of leases (to be expired) climbs very high within a specific timeframe (see attachment).
This load appears every day in the same time window. I don’t know if this is some kind of internal process within the Vault cluster, but I didn’t find any clear explanation.

Vault cluster properties:

License: OSS
Version: 1.10.0
Mode: HA cluster (3 nodes)
OpenShift cluster: 4.9

Any kind of hint would be helpful.

Thanks

maxb · May 30, 2022, 5:24pm

You should look to identify which kind of leases these are.

But before you start, make sure you’ve understood exactly what that metric means - it’s not leases due for expiration, it’s the total number of leases being tracked by the expiration manager - i.e. all leases.

Given you have such an impressive step-change, perhaps the Vault server log or audit log has useful clues?

If not, some other interesting metrics to look at could be:

vault_token_creation - i.e. rate of lease creation by authentications, which is broken down by several useful labels
vault_secret_lease_creation - i.e. rate of lease creation by access to leased secrets - also with useful labels
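As a rough illustration of using those labels, here is a minimal Python sketch that sums a metric per label value from Prometheus text-format output. The sample text, the label names (`auth_method`, `mount_point`, `creation_ttl`, `token_type`, `secret_engine`) and all values are assumptions made up for the example; check them against what your own cluster actually exports.

```python
import re
from collections import defaultdict

# Hypothetical sample of Prometheus text exposition. Label names and values
# are assumptions for illustration, not output captured from a real cluster.
SAMPLE = '''\
vault_token_creation{auth_method="kubernetes",creation_ttl="1m",mount_point="auth/kubernetes/",token_type="service"} 5400
vault_token_creation{auth_method="approle",creation_ttl="1h",mount_point="auth/approle/",token_type="service"} 120
vault_token_creation{auth_method="kubernetes",creation_ttl="1h",mount_point="auth/kubernetes/",token_type="service"} 30
vault_secret_lease_creation{creation_ttl="1h",mount_point="database/",secret_engine="database"} 75
'''

# Simplified parser: assumes label values contain no "}" or escaped quotes.
LINE_RE = re.compile(r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def totals_by_label(text, metric, label):
    """Sum the samples of `metric`, grouped by the value of one label."""
    totals = defaultdict(float)
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        totals[labels.get(label, "<none>")] += float(match.group("value"))
    return dict(totals)


print(totals_by_label(SAMPLE, "vault_token_creation", "auth_method"))
print(totals_by_label(SAMPLE, "vault_secret_lease_creation", "secret_engine"))
```

Grouping by `mount_point` or `creation_ttl` instead of `auth_method` works the same way and can narrow down which mount or role is responsible.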

aram · May 30, 2022, 5:33pm

It wouldn’t be an internal process; there is a process or team that’s doing something they shouldn’t be. I’d suggest turning on your audit device and tracking the auth that is generating the high number of leases.

glisav · April 26, 2023, 12:14pm

Hello Team,

We are having the same situation in our Vault Cluster.
We are using the same stack, Vault OSS deployed on an OpenShift cluster, but with a newer Vault server version: v1.12.3.
The spike in the number of leases that are about to expire happens every day at a specific time.

We have been observing other metrics that could be related to this case, such as:

vault_token_creation - increased on the same timeframe
vault_expire_revoke - decreased on the same timeframe
vault_expire_revoke_by_token - decreased on the same timeframe
vault_expire_lease_expiration - increased on the same timeframe

I wasn’t able to find what is causing this.
Any idea would help.

Thanks!

maxb · April 26, 2023, 12:24pm

The increase in vault_expire_lease_expiration seems to imply you have lots of things whose leases all expire around the same time - which is quite possible, if they all created them around the same time and with the same TTLs.

The increase in vault_token_creation implies that these things are probably freshly logging in to Vault to get replacement tokens.

IIRC, vault_token_creation has useful labels in Prometheus identifying the kinds of tokens being created, so you should have a look at what those labels are on the time series that show a substantial increase.
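One way to find the time series with the substantial increase is to compare counter values from just before the spike with values from during it. The sketch below assumes you have already collected two snapshots keyed by label values; the key layout `(auth_method, mount_point, creation_ttl)` and the numbers are hypothetical, and counter resets (for example after a Vault restart) are not handled.

```python
def rank_increases(before, after, top=3):
    """Return the series with the largest counter increase, biggest first."""
    deltas = {key: after[key] - before.get(key, 0.0) for key in after}
    grown = [(key, delta) for key, delta in deltas.items() if delta > 0]
    return sorted(grown, key=lambda item: item[1], reverse=True)[:top]


# Hypothetical snapshots keyed by (auth_method, mount_point, creation_ttl).
before = {
    ("kubernetes", "auth/kubernetes/", "1m"): 1000.0,
    ("approle", "auth/approle/", "1h"): 200.0,
}
after = {
    ("kubernetes", "auth/kubernetes/", "1m"): 6400.0,
    ("approle", "auth/approle/", "1h"): 210.0,
    ("token", "auth/token/", "1h"): 5.0,
}

for key, delta in rank_increases(before, after):
    print(key, delta)
```

With these made-up numbers the Kubernetes series with the 1-minute TTL comes out on top, which is the kind of result that would point at a specific mount and role.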

glisav · April 26, 2023, 1:22pm

Hello @maxb

Thank you for your fast response.

From my analysis, I can see that the highest number of leases is generated by a K8s auth method.

But the K8s role attached to it is configured with a TTL of 1 minute.
As far as I understand, these leases should be deleted after they expire, right?
So I don’t know why this happens once a day, when it should depend on the expiration of each lease.

maxb · April 26, 2023, 2:12pm

This would seem to suggest you have a huge number of Vault logins coming from your Kubernetes pods at this time - for example, this could be the case if a lot of scheduled nightly cron jobs are all triggering and logging in to Vault.
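To see why a short TTL does not prevent a daily spike, here is a toy model in Python. It assumes every login creates exactly one lease, that each lease lives for exactly the TTL, and that it is removed promptly on expiry; real Vault expiration is asynchronous, and the login rates below are invented for illustration.

```python
def outstanding_leases(logins_per_minute, ttl_minutes):
    """Leases alive at each minute under the simplified model:
    one lease per login, each living exactly ttl_minutes."""
    alive = []
    for minute in range(len(logins_per_minute)):
        # A lease created at minute m is alive during minutes m .. m+ttl-1.
        window_start = max(0, minute - ttl_minutes + 1)
        alive.append(sum(logins_per_minute[window_start:minute + 1]))
    return alive


# Hypothetical load: 10 logins/minute baseline, 500/minute for a 3-minute burst.
logins = [10, 10, 500, 500, 500, 10, 10, 10]

print(outstanding_leases(logins, ttl_minutes=1))
print(outstanding_leases(logins, ttl_minutes=2))
```

With a 1-minute TTL the number of tracked leases simply mirrors the login rate, so a burst of logins at the same time every day shows up as a lease spike at the same time every day, and it drains again once the burst stops.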

Vault’s audit logging support can write logs of all requests and responses - you might use this to get detailed information on the contents of this burst of requests.
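As a sketch of what that analysis could look like, the Python below counts login responses per minute, path and role from a file audit device that writes one JSON object per line. The field layout used here (`type`, `time`, `request.path`, `response.auth.metadata.role`) is an assumption; verify it against a few entries from your own audit log, since some values may be HMAC-hashed or absent depending on the auth method and configuration. The sample entries are fabricated for the example.

```python
import json
from collections import Counter


def logins_per_minute(lines):
    """Count login responses, keyed by (minute, request path, role)."""
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict) or entry.get("type") != "response":
            continue
        path = (entry.get("request") or {}).get("path") or ""
        if not (path.startswith("auth/") and path.endswith("/login")):
            continue
        auth = (entry.get("response") or {}).get("auth") or {}
        role = (auth.get("metadata") or {}).get("role", "<unknown>")
        minute = (entry.get("time") or "")[:16]  # "YYYY-MM-DDTHH:MM"
        counts[(minute, path, role)] += 1
    return counts


# Fabricated entries in the assumed layout; real entries carry many more fields.
sample = [
    json.dumps({"time": "2023-04-26T02:00:03Z", "type": "response",
                "request": {"path": "auth/kubernetes/login"},
                "response": {"auth": {"metadata": {"role": "nightly-batch"}}}}),
    json.dumps({"time": "2023-04-26T02:00:41Z", "type": "response",
                "request": {"path": "auth/kubernetes/login"},
                "response": {"auth": {"metadata": {"role": "nightly-batch"}}}}),
    json.dumps({"time": "2023-04-26T02:00:41Z", "type": "request",
                "request": {"path": "auth/kubernetes/login"}}),
    json.dumps({"time": "2023-04-26T02:01:10Z", "type": "response",
                "request": {"path": "secret/data/app"}}),
    "not json",
]

for key, count in logins_per_minute(sample).most_common():
    print(key, count)
```

In practice you would pass an open file handle for the audit log instead of `sample`, then look at which role dominates the minutes where the lease count jumps.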
