Vault HA cluster monitoring with Prometheus

Hello!

We have an HA Vault cluster with raft storage, deployed in K8s with Helm. We configured telemetry (with prometheus_retention_time=24h) for Prometheus scraping, but the core.* telemetry metrics seem to be missing on standby nodes.

I found a similar issue that was opened back in 2020 (!).

Are there any workarounds for this now? How can we monitor Vault cluster health with Prometheus?

Vault’s implementation of Prometheus metrics has quite a few bugs and suboptimal design choices, unfortunately.

Expect to need to set up a separate custom exporter, or work around quite a lot of issues in PromQL.

I was interested in fixing this in the past, but I no longer work at an employer that uses Vault, and HashiCorp’s rate of accepting community contributions is too slow for it to be enjoyable as a hobby.
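
For what it's worth, a custom exporter does not have to be large. Here is a minimal sketch, assuming it is enough to poll each node's /v1/sys/health endpoint directly, so that standbys are covered regardless of what their own metrics output contains. The metric name vault_health_state, the node label, and all helper names are made up for illustration. The status codes are the sys/health defaults as I know them; they change if you pass query parameters such as standbyok, and anything not listed is reported as "unknown".

```python
import json
import urllib.error
import urllib.request

# Default status codes of Vault's /v1/sys/health endpoint.
HEALTH_STATES = {
    200: "active",
    429: "standby",
    472: "dr_secondary",
    473: "performance_standby",
    501: "uninitialized",
    503: "sealed",
}
ALL_STATES = sorted(set(HEALTH_STATES.values()) | {"unknown"})


def classify(status_code):
    """Map a sys/health HTTP status code to a state name."""
    return HEALTH_STATES.get(status_code, "unknown")


def fetch_health(base_url, timeout=5.0):
    """Return (status_code, body_dict) for one node."""
    url = base_url.rstrip("/") + "/v1/sys/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        # urllib raises on non-2xx codes, which is how standbys answer.
        try:
            return err.code, json.load(err)
        except ValueError:
            return err.code, {}


def render(samples):
    """Turn (node, status_code, body) tuples into Prometheus text format.

    Assumes node names contain no quotes or backslashes (no label escaping).
    """
    lines = ["# TYPE vault_health_state gauge"]
    for node, code, _body in samples:
        current = classify(code)
        for state in ALL_STATES:
            value = 1 if state == current else 0
            lines.append(
                f'vault_health_state{{node="{node}",state="{state}"}} {value}'
            )
    return "\n".join(lines) + "\n"


def collect(nodes):
    """nodes: mapping of node name -> base URL. Unreachable nodes get code 0."""
    samples = []
    for name, base_url in nodes.items():
        try:
            code, body = fetch_health(base_url)
        except OSError:  # connection refused, DNS failure, timeout
            code, body = 0, {}
        samples.append((name, code, body))
    return render(samples)
```

Calling collect() with a mapping of pod names to base URLs returns text you can serve on a /metrics endpoint or write out for the node exporter's textfile collector. In PromQL, something like `sum(vault_health_state{state="active"}) == 1` would then check that the cluster has exactly one active node.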