"lease count exceeds warning lease threshold" keeps popping up

Good day,

I recently moved Vault's backend storage to internal (raft) over the past weekend. The migration took… longer than we anticipated. I noticed that there were MANY keys being transferred over from paths such as:

sys/expire/id/auth/[insert auth name here]/login

Migration took roughly 13 hours for only 2G of data to migrate over.

Anyways. That nightmare is behind me.

Today, I noticed these messages in the logs:

{"@level":"warn","@message":"lease count exceeds warning lease threshold","@module":"expiration","@timestamp":"2021-11-15T23:07:54.159140Z","have":461595,"threshold":256000}

{"@level":"warn","@message":"lease count exceeds warning lease threshold","@module":"expiration","@timestamp":"2021-11-15T23:08:55.158226Z","have":461595,"threshold":256000}

{"@level":"warn","@message":"lease count exceeds warning lease threshold","@module":"expiration","@timestamp":"2021-11-15T23:09:56.158666Z","have":461595,"threshold":256000}

{"@level":"warn","@message":"lease count exceeds warning lease threshold","@module":"expiration","@timestamp":"2021-11-15T23:10:57.158364Z","have":461595,"threshold":256000}

{"@level":"warn","@message":"lease count exceeds warning lease threshold","@module":"expiration","@timestamp":"2021-11-15T23:11:58.158158Z","have":461595,"threshold":256000}

{"@level":"warn","@message":"lease count exceeds warning lease threshold","@module":"expiration","@timestamp":"2021-11-15T23:12:59.158779Z","have":461595,"threshold":256000}

This goes on for quite a while.

At some points in the log, it DOES show that some of the leases are expiring:

{"@level":"info","@message":"revoked lease","@module":"expiration","@timestamp":"2021-11-15T22:28:13.641233Z","lease_id":"auth/aws-eu-west-1-qa/login/he482432a10c53a503abe6d5289354d8c8c6aad133dc45873efa12195d0b35b63"}
{"@level":"info","@message":"revoked lease","@module":"expiration","@timestamp":"2021-11-15T22:28:13.641389Z","lease_id":"auth/aws-eu-west-1-qa/login/hf0aa078ed3f74b50355a33154f7b5317beec44facc22533503a4c285431c04e3"}

So I guess it's kinda working on its own?

I am unsure how to proceed with this, nor do I know how to troubleshoot. Is there a way I can forcibly revoke all these expired tokens?

This is a huge amount of data for Vault.

How many leases/tokens are created?
How many KVs?

You should set up telemetry to capture and identify the offending lease-creator here. 250k leases would be very, very odd to have in a single Vault cluster. What is your use case/# of expected servers/clients?

I don’t think it has anything to do with the migration. It looks like something was going on with AWS and Vault generated a whole lot of leases, either by timeout or by request.

Yes, you could expire all of those AWS leases (using a prefix) and let the system start over, and keep an eye on the lease telemetry to see if it starts happening again. That'll be the easiest way and you'll create fewer log entries. The other option is to just keep an eye on the lease count and let it do its thing. Depends on how busy the cluster is.
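The two things suggested in this thread, working out which mount is producing the leases and then revoking them under a prefix, can be done from the expiration-manager log lines the original post quotes plus one Vault API call. The script below is an added example written for this page; it is not code from the thread or from HashiCorp. It assumes Vault's JSON log format as shown above (SAMPLE_LOG is a constructed, shortened copy of those lines), the documented sys/leases/revoke-prefix endpoint, and that VAULT_ADDR, VAULT_TOKEN and a hypothetical VAULT_REVOKE_PREFIX variable are set when you actually want to revoke.

```python
"""Triage Vault "lease count exceeds warning lease threshold" warnings.

Added example, not from the original thread. Assumes the Vault JSON log
shape quoted in the thread and the sys/leases/revoke-prefix API endpoint.
"""
import json
import os
import urllib.request
from collections import Counter

# Constructed sample (not real log output): two threshold warnings, three
# revocations across two auth mounts, and one non-JSON line to be skipped.
# Lease IDs are shortened for readability.
SAMPLE_LOG = """\
{"@level":"warn","@message":"lease count exceeds warning lease threshold","@module":"expiration","@timestamp":"2021-11-15T23:07:54Z","have":461595,"threshold":256000}
{"@level":"info","@message":"revoked lease","@module":"expiration","@timestamp":"2021-11-15T22:28:13Z","lease_id":"auth/aws-eu-west-1-qa/login/he482432a10c53a5"}
{"@level":"info","@message":"revoked lease","@module":"expiration","@timestamp":"2021-11-15T22:28:13Z","lease_id":"auth/aws-eu-west-1-qa/login/hf0aa078ed3f74b5"}
{"@level":"info","@message":"revoked lease","@module":"expiration","@timestamp":"2021-11-15T22:28:14Z","lease_id":"auth/kubernetes/login/h1234abcd"}
not json at all
{"@level":"warn","@message":"lease count exceeds warning lease threshold","@module":"expiration","@timestamp":"2021-11-15T23:08:55Z","have":461600,"threshold":256000}
"""


def parse_lease_log(text):
    """Return (latest_have, threshold, Counter of revoked-lease prefixes).

    The prefix is everything in the lease_id except the trailing random
    id, e.g. auth/aws-eu-west-1-qa/login/, which is exactly the path
    'vault lease revoke -prefix' and sys/leases/revoke-prefix operate on.
    """
    have = threshold = None
    prefixes = Counter()
    for line in text.splitlines():
        try:
            rec = json.loads(line)
        except ValueError:
            continue  # non-JSON noise (startup banner, stack traces)
        if rec.get("@module") != "expiration":
            continue
        msg = rec.get("@message")
        if msg == "lease count exceeds warning lease threshold":
            have, threshold = rec["have"], rec["threshold"]
        elif msg == "revoked lease":
            parts = rec["lease_id"].split("/")
            prefixes["/".join(parts[:-1]) + "/"] += 1
    return have, threshold, prefixes


def over_threshold(have, threshold):
    """True while the expiration manager is still above its warning line."""
    return have is not None and threshold is not None and have > threshold


def revoke_prefix(prefix, addr=None, token=None):
    """Revoke every lease under prefix via PUT /v1/sys/leases/revoke-prefix/<prefix>.

    The token used needs 'update' (sudo) on sys/leases/revoke-prefix/<prefix>.
    Returns the HTTP status code (Vault answers 204 on success).
    """
    addr = addr or os.environ["VAULT_ADDR"]
    token = token or os.environ["VAULT_TOKEN"]
    url = f"{addr.rstrip('/')}/v1/sys/leases/revoke-prefix/{prefix.strip('/')}"
    req = urllib.request.Request(url, method="PUT",
                                 headers={"X-Vault-Token": token})
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    have, threshold, prefixes = parse_lease_log(SAMPLE_LOG)
    print(f"leases: {have} / warning threshold {threshold}")
    for prefix, n in prefixes.most_common():
        print(f"{n:6d} revoked under {prefix}")
    # Only touch a live cluster when explicitly asked to (hypothetical var).
    target = os.environ.get("VAULT_REVOKE_PREFIX")
    if target and over_threshold(have, threshold):
        print("revoke-prefix status:", revoke_prefix(target))
```

The CLI equivalent of revoke_prefix is `vault lease revoke -prefix auth/aws-eu-west-1-qa/login/`. Two caveats before running it: revoking an auth mount's leases revokes the tokens themselves, so any workload still holding one has to log in again; and if the count climbs back afterwards (watch the vault.expire.num_leases gauge the telemetry reply refers to), the fix is on the client side or the mount's token TTL, not in repeated bulk revocation.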