Hundreds of thousands of auth tokens, why?

In my setup, more than a million Vault auth token files have accumulated over the course of a few weeks under sys/expire/id/auth/gcp/login/.

I understand this is because either (1) the tokens aren’t revoked once they are no longer needed, (2) the TTL isn’t set correctly, or (3) they’re root tokens that don’t expire.

These auth tokens pile up very fast and seem to be from automated activity in my setup.

How can I decrypt example auth tokens to determine what they are for?

You can discount root tokens, since they wouldn’t be produced from auth/gcp/login. The path within the Vault data store that you mentioned implies that these tokens come from logins to the auth method mounted at auth/gcp/.

Every time anything logs in to Vault, a token is created.

The default kind of token is a fairly short opaque identifier, which Vault needs to track in its storage (these are the files that you see).

They will stay there until their TTL expires, or something explicitly calls an API operation to revoke them.

The built-in way to list tokens in Vault is

vault list auth/token/accessors

However, there is a problem with this: the Vault API doesn’t support pagination, so it will try to return a single response containing an entry for each of those million-plus files, and that might fail.

If you are able to get hold of token accessors from the above command, then you can get more information on each one, using

vault write auth/token/lookup-accessor accessor=XXXXXXXXXXXX
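With this many tokens, looking up a small sample is usually enough to see what is creating them. Here is a rough sketch of that, assuming the accessor list does come back and that display_name is informative for your GCP logins (I haven't checked what it contains in your setup). The function name tally_token_sources is made up for illustration:

```shell
# Read token accessors on stdin (one per line), look each one up,
# and count how many tokens share each display_name.
tally_token_sources() {
  while IFS= read -r accessor; do
    # Skip blank lines in the input.
    [ -n "$accessor" ] || continue
    # -field prints only the named field of the response data.
    # stdin is redirected so the lookup cannot consume the accessor list.
    name=$(vault write -field=display_name auth/token/lookup-accessor \
      "accessor=$accessor" </dev/null) || name="(lookup failed)"
    printf '%s\n' "$name"
  done | sort | uniq -c | sort -rn
}

# Example use: skip the two header lines of the table output and
# sample the first 200 accessors.
#   vault list auth/token/accessors | tail -n +3 | head -n 200 | tally_token_sources
```

The full lookup output also contains creation_time, ttl and a meta map, which may identify the caller more precisely than display_name does.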

That’s tokens… but each stored token will also have an associated lease. Leases are Vault’s combined expiry tracking mechanism for everything in Vault that can expire, not just tokens.

Leases have some paths in the Vault API that you can use to investigate as well:

vault list sys/leases/lookup/auth/gcp/login
vault write sys/leases/lookup lease_id=auth/gcp/login/......

The lease lookup gives you the remaining TTL, but it doesn’t tell you anything more about what created the lease.
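To check whether an overly long TTL is part of the problem, you could print the remaining TTL for a handful of leases. This is only a sketch: the function name sample_lease_ttls is hypothetical, it assumes the list call succeeds despite the volume, and it assumes your token is allowed to list under sys/leases/lookup:

```shell
# Print "lease_id<TAB>remaining ttl" for the first few leases
# under a prefix such as auth/gcp/login.
sample_lease_ttls() {
  prefix=$1
  count=${2:-5}
  # The table output starts with two header lines ("Keys", "----").
  vault list "sys/leases/lookup/$prefix" | tail -n +3 | head -n "$count" |
  while IFS= read -r key; do
    [ -n "$key" ] || continue
    # The full lease ID is the prefix plus the listed key.
    ttl=$(vault write -field=ttl sys/leases/lookup \
      "lease_id=$prefix/$key" </dev/null)
    printf '%s\t%s\n' "$prefix/$key" "$ttl"
  done
}

# Example use:
#   sample_lease_ttls auth/gcp/login 10
```

If the sampled TTLs are days or weeks long while the clients only need a token for a few minutes, that points at the role's TTL settings.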


OK, that’s useful background, but what can you do to fix the issue? There are 3 choices I can think of:

  1. Change everything logging in via auth/gcp/login to explicitly revoke tokens when finished with them.

  2. Configure a much smaller TTL on your GCP auth role - see https://www.vaultproject.io/api-docs/auth/gcp#create-role, but bear in mind that this means applications logging in will lose access sooner.

  3. Configure your GCP auth role to return batch tokens. Unlike the default sort of token (service tokens), batch tokens are not persisted in Vault’s storage. Instead, each one is an encrypted blob that carries enough information for Vault to use it without tracking it individually. But switching to batch tokens isn’t entirely transparent to users: for example, batch tokens cannot be renewed and have no accessors. Read more about the technical details on https://www.vaultproject.io/docs/concepts/tokens
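The three options could look roughly like this on the command line. Treat it as a sketch under assumptions: the function names are made up, the role is assumed to exist already, and I am assuming a partial write to the role updates only the fields given (check the create-role docs linked above for your version before relying on that):

```shell
# Option 1: a client revokes its own token when it is finished.
# VAULT_TOKEN must hold the token obtained from auth/gcp/login.
finish_with_token() {
  vault token revoke -self
}

# Option 2: shorten the token lifetime on a GCP auth role.
# Arguments: role name, token TTL, token max TTL (e.g. 15m 1h).
set_role_ttl() {
  vault write "auth/gcp/role/$1" "token_ttl=$2" "token_max_ttl=$3"
}

# Option 3: make the role issue batch tokens instead of service tokens.
set_role_batch() {
  vault write "auth/gcp/role/$1" token_type=batch
}

# Example use (my-role is a placeholder):
#   set_role_ttl my-role 15m 1h
#   set_role_batch my-role
```

Options 2 and 3 only affect tokens issued after the change. Tokens that already exist keep their original TTL until they expire or are revoked.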

Great answer, so clear and helpful. I’ve tracked down the piece of code that was making a token, and fixed it.

One additional piece of information: to clean up the mess, I used vault lease revoke -prefix auth/gcp/login. This slowly started cleaning up, and after a few hundred tokens were removed, I got a “context deadline exceeded” error in the CLI. But the token cleanup continued in the background, and much later I saw storage get back to where it should be.

There were also hundreds of thousands of files under sys/token/accessor and sys/token/id, and those were cleaned up too.
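For anyone repeating this cleanup: the context deadline exceeded error is most likely the CLI's own request timeout rather than a failure of the revocation, which would match the cleanup carrying on afterwards. Here is a minimal sketch, assuming the VAULT_CLIENT_TIMEOUT environment variable is what governs that timeout in your version. The function names and the 10m default are made up for illustration:

```shell
# Revoke every lease under a prefix, giving the CLI longer to wait
# for a response before it gives up.
revoke_prefix_with_timeout() {
  prefix=$1
  timeout=${2:-10m}
  VAULT_CLIENT_TIMEOUT=$timeout vault lease revoke -prefix "$prefix"
}

# Count the leases still present under a prefix, to watch progress.
# Prints 0 once nothing is left (vault list then produces no keys).
count_leases() {
  vault list "sys/leases/lookup/$1" 2>/dev/null | tail -n +3 | wc -l | tr -d ' '
}

# Example use:
#   revoke_prefix_with_timeout auth/gcp/login 30m
#   count_leases auth/gcp/login
```

Note that count_leases runs into the same single-response listing limitation mentioned earlier, so it may only become usable once the backlog has shrunk.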