This just took a long time to work. Leaving Vault alone overnight resulted in the desired cleanup of (most of) the AWS tokens and the UI was able to display the remainder again.
There is at least one open issue on GitHub asking for pagination in the API and I think that definitely needs to be added wherever Vault could be returning a lot of values.
Can you share the pagination GitHub issue? I think some folks would pile on to give it traction… pagination has its own complications, such as sorting, filtering, and cursor-tracking of API results… but even a basic version could help in situations like this.
Curious: how many leases were there, and how long was their TTL?
The TTL was one month. I’m not sure how many leases we had but the Consul snapshot size went down from 494,614,448 to 1,700,848. It has started creeping up again (now at 5,207,067) so I’ve clearly still got some scripts that aren’t revoking the IAM auth token just before the script finishes.
As a further size reference point, I’ve just invalidated all leases and the Consul snapshot is down to 334,942. I hadn’t invalidated all of the leases over the Christmas break - I was going to let some of them expire naturally but, with the numbers climbing up again, I need to get Vault back to zero leases to make it easier to spot which scripts are not revoking their leases properly.
I’ve now got things more under control. I’m currently checking daily to make sure that none of the automation processes are still leaving AWS or AppRole tokens lying around. So far, tokens are being revoked once they are finished with, which is keeping things much cleaner than before.
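For anyone wanting to automate a daily check like this, here is a minimal sketch in Python. It assumes the calling token is allowed to list token accessors (as far as I know this needs the `sudo` capability on `auth/token/accessors`). The address, the token value, the function names and the threshold are all placeholders for illustration, not something taken from my actual setup.

```python
import json
import urllib.request


def build_list_accessors_request(vault_addr, token):
    """Build a request that lists token accessors.

    GET with ?list=true is used here in place of the LIST verb.
    """
    return urllib.request.Request(
        url=vault_addr.rstrip("/") + "/v1/auth/token/accessors?list=true",
        headers={"X-Vault-Token": token},
    )


def count_accessors(body):
    """Count the accessors in a JSON list response body."""
    payload = json.loads(body)
    keys = (payload.get("data") or {}).get("keys") or []
    return len(keys)


def too_many_tokens(count, expected_max):
    """Return True when more tokens exist than the automation should leave behind."""
    return count > expected_max


if __name__ == "__main__":
    # Hypothetical values: replace with a real address, token and threshold.
    request = build_list_accessors_request("http://127.0.0.1:8200", "s.example")
    with urllib.request.urlopen(request) as response:
        count = count_accessors(response.read())
    print(count, "token accessors; over threshold:", too_many_tokens(count, 50))
```

Tracking that count from day to day should show whether a script has started leaving tokens behind again, well before the snapshot size makes it obvious.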
I’ve gone through the documentation again and I cannot find anything that allows me to set a lifetime on the initial login token that is generated. Maybe there is something at a higher level than “AWS Auth” (i.e. a more global configuration) but, ultimately, I think I’ve solved my particular use-case by explicitly revoking the login tokens when I’m done with them.
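In case it helps someone else, the "revoke when done" pattern can be sketched roughly as below. This is not my actual script, just an illustration under assumptions: it calls Vault's `auth/token/revoke-self` endpoint over plain HTTP, and the names `revoked_on_exit`, `build_revoke_self_request` and the `send` parameter are made up for the example. The point is that the revoke sits in a `finally` block, so it still runs if the script fails part-way through.

```python
import contextlib
import urllib.request


def build_revoke_self_request(vault_addr, token):
    """Build the POST request that asks Vault to revoke the calling token."""
    return urllib.request.Request(
        url=vault_addr.rstrip("/") + "/v1/auth/token/revoke-self",
        method="POST",
        headers={"X-Vault-Token": token},
    )


@contextlib.contextmanager
def revoked_on_exit(vault_addr, token, send=urllib.request.urlopen):
    """Yield the token, then revoke it even if the wrapped code raises.

    `send` is injectable so the sketch can be exercised without a server;
    it must return a context manager, as urllib.request.urlopen does.
    """
    try:
        yield token
    finally:
        with send(build_revoke_self_request(vault_addr, token)):
            pass  # the response body is not needed
```

Usage would look something like `with revoked_on_exit(addr, login_token) as token: ...`, with all of the script's Vault work done inside the `with` block.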