What is kv_secrets_by_mountpoint? Seeing high values in Vault Debug

I noticed that one of our Vault clusters is reporting much higher vault.metrics.collection values than our other clusters.

Upon further examination via Vault Debug, this was mainly due to the gauge kv_secrets_by_mountpoint.

  • What is the kv_secrets_by_mountpoint gauge used for?
  • What does the value of the kv_secrets_by_mountpoint gauge mean?
  • Do its high values affect the service quality of Vault?
      {
        "Count": 1,
        "Labels": {
          "cluster": "apne2",
          "gauge": "leases_by_expiration"
        },
        "Max": 0.020255999639630318,
        "Mean": 0.020255999639630318,
        "Min": 0.020255999639630318,
        "Name": "vault.metrics.collection",
        "Rate": 0.0020255999639630317,
        "Stddev": 0,
        "Sum": 0.020255999639630318
      },

...
      {
        "Count": 1,
        "Labels": {
          "cluster": "apne2",
          "gauge": "kv_secrets_by_mountpoint"
        },
        "Max": 2738.61572265625,
        "Mean": 2738.61572265625,
        "Min": 2738.61572265625,
        "Name": "vault.metrics.collection",
        "Rate": 273.861572265625,
        "Stddev": 0,
        "Sum": 2738.61572265625
      },
...

      {
        "Count": 1,
        "Labels": {
          "cluster": "apne2",
          "gauge": "token_by_policy"
        },
        "Max": 0.014418999664485455,
        "Mean": 0.014418999664485455,
        "Min": 0.014418999664485455,
        "Name": "vault.metrics.collection",
        "Rate": 0.0014418999664485455,
        "Stddev": 0,
        "Sum": 0.014418999664485455
      },

I’m not confident and would need to dig deeper, but I believe this is an actual count of the number of secrets stored in KV across all namespaces. If so, 3000 secrets for a large organization isn’t out of the question; I think our production instance is well over 20k.
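
To compare the gauges side by side, a minimal Python sketch is shown here. It assumes you have already pulled the samples out of the debug output into a list of objects shaped like the excerpts in the question; the function name `slowest_gauges` is made up for illustration. One thing worth verifying before reading the number as a secret count: as far as I recall, the Vault telemetry documentation lists vault.metrics.collection as the time taken (in milliseconds) to collect each usage gauge, so these values may be collection durations rather than counts.

```python
import json


def slowest_gauges(samples, name="vault.metrics.collection"):
    """Return (gauge label, Mean) pairs for one metric name, largest Mean first."""
    rows = [
        (s.get("Labels", {}).get("gauge", "<unlabeled>"), s["Mean"])
        for s in samples
        if s.get("Name") == name
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Trimmed copies of the three samples from the question (only the fields used here).
samples = json.loads("""
[
  {"Name": "vault.metrics.collection",
   "Labels": {"cluster": "apne2", "gauge": "leases_by_expiration"},
   "Mean": 0.020255999639630318},
  {"Name": "vault.metrics.collection",
   "Labels": {"cluster": "apne2", "gauge": "kv_secrets_by_mountpoint"},
   "Mean": 2738.61572265625},
  {"Name": "vault.metrics.collection",
   "Labels": {"cluster": "apne2", "gauge": "token_by_policy"},
   "Mean": 0.014418999664485455}
]
""")

for gauge, mean in slowest_gauges(samples):
    print(f"{gauge:28s} {mean:14.6f}")
```

Running the same ranking against each cluster's debug output makes it easy to see whether kv_secrets_by_mountpoint stands out everywhere or only on one cluster.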

As for what this means for service quality, it depends on your storage backend and the number of requests. By itself the number doesn’t mean anything. The metric you really need to keep an eye on is the number of requests. Be careful about correlating the two: a higher number of secrets does not imply a higher number of requests, because a single misbehaving app requesting the exact same secret over and over can end up being the 90th percentile of your usage.

What I do is watch the number of requests and, if it starts to climb, look for the source IPs (or blocks of IPs) making the most requests, find the namespace and group responsible, and get them to either fix their code or make more use of the agent to cache these secrets. If you have a lot of requests coming from a whole area, a performance replication (PR) cluster may be in order to reduce the load and scale out horizontally.
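
As a rough illustration of the "find the noisy source" step, here is a small Python sketch that tallies requests per source IP and namespace from audit log lines. It assumes a JSON-lines audit log where each entry has a `type` field and a `request` object with `remote_address` and `namespace.path`; check those field names against your own audit device output before relying on it. The function name `top_requesters` and the sample entries are invented for the example.

```python
import json
from collections import Counter


def top_requesters(audit_lines, n=5):
    """Count request entries per (remote_address, namespace path), busiest first."""
    counts = Counter()
    for line in audit_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # Skip response entries so each call is counted once.
        if entry.get("type") != "request":
            continue
        req = entry.get("request", {})
        # Assumed field names; adjust to match your audit log.
        addr = req.get("remote_address", "unknown")
        namespace = req.get("namespace", {}).get("path", "")
        counts[(addr, namespace)] += 1
    return counts.most_common(n)


# Made-up sample entries, one JSON object per line.
sample_lines = [
    '{"type": "request", "request": {"remote_address": "10.0.0.7", "path": "kv/data/app", "namespace": {"path": "team-a/"}}}',
    '{"type": "response", "request": {"remote_address": "10.0.0.7", "path": "kv/data/app", "namespace": {"path": "team-a/"}}}',
    '{"type": "request", "request": {"remote_address": "10.0.0.7", "path": "kv/data/app", "namespace": {"path": "team-a/"}}}',
    '{"type": "request", "request": {"remote_address": "10.0.0.9", "path": "kv/data/db", "namespace": {"path": "team-b/"}}}',
    '{"type": "request", "request": {"remote_address": "10.0.0.7", "path": "kv/data/app", "namespace": {"path": "team-a/"}}}',
]

for (addr, namespace), count in top_requesters(sample_lines):
    print(f"{count:6d}  {addr:15s}  {namespace}")
```

In practice you would pass an open file handle for the audit log instead of `sample_lines`; the function only needs an iterable of lines.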
