I am struggling with what should be a pretty simple task: monitoring Vault's seal/unseal status.
We are using Prometheus to scrape Vault metrics from the /sys/metrics endpoint, which returns the vault.core.unsealed boolean gauge. That looks like exactly what we need: it reports 1 when Vault is unsealed.
But whenever Vault is sealed, the metrics endpoint returns a 503 error, “Vault is sealed”, which I guess is expected since a sealed Vault can’t read its storage.
So what is the point of the vault.core.unsealed metric if it can never report 0 to indicate that Vault is sealed, given that the metrics API can't be called on a sealed Vault instance?
I am aware of the /sys/seal-status API, but it doesn't return Prometheus-format metrics, so to use it we would need to write some kind of exporter.
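If you do go the /sys/seal-status route, the exporter can be quite small. Here is a minimal sketch using only the Python standard library. It assumes Vault is reachable at the address in the VAULT_ADDR environment variable (defaulting to http://127.0.0.1:8200), with the endpoint under the usual /v1 prefix, and relies on /v1/sys/seal-status answering without a token even while Vault is sealed. The metric names (vault_seal_status_up, vault_seal_status_sealed, vault_seal_status_initialized), the serve() helper and port 9819 are hypothetical choices for illustration, not anything Vault defines.

```python
import json
import os
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

VAULT_ADDR = os.environ.get("VAULT_ADDR", "http://127.0.0.1:8200")


def fetch_seal_status(vault_addr: str, timeout: float = 5.0) -> dict:
    """GET /v1/sys/seal-status and return the decoded JSON body."""
    url = vault_addr.rstrip("/") + "/v1/sys/seal-status"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def render_seal_metrics(status: Optional[dict]) -> str:
    """Render seal-status JSON (or None if Vault was unreachable) as Prometheus text."""
    lines = [
        "# HELP vault_seal_status_up Whether /v1/sys/seal-status answered (1) or not (0).",
        "# TYPE vault_seal_status_up gauge",
        f"vault_seal_status_up {0 if status is None else 1}",
    ]
    if status is not None:
        lines += [
            "# HELP vault_seal_status_sealed Whether Vault reports itself sealed (1) or not (0).",
            "# TYPE vault_seal_status_sealed gauge",
            f"vault_seal_status_sealed {int(bool(status.get('sealed')))}",
            "# HELP vault_seal_status_initialized Whether Vault is initialized (1) or not (0).",
            "# TYPE vault_seal_status_initialized gauge",
            f"vault_seal_status_initialized {int(bool(status.get('initialized')))}",
        ]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        try:
            status: Optional[dict] = fetch_seal_status(VAULT_ADDR)
        except (OSError, ValueError):
            # Network failure, HTTP error, or a non-JSON body: report "down", not "sealed".
            status = None
        body = render_seal_metrics(status).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 9819) -> None:
    """Serve /metrics until interrupted (port 9819 is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling serve() and pointing a Prometheus scrape job at that port would let you alert on vault_seal_status_sealed == 1. Exposing a separate up gauge keeps "Vault unreachable" distinguishable from "Vault sealed".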
How do you guys monitor Vault seal status? Do you use /sys/seal-status, or some custom logic that kicks in when the metrics API returns specific errors?
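The custom-logic option can be as small as recognising the sealed error in the scrape response. A minimal sketch, assuming the 503 body is Vault's JSON error envelope ({"errors": [...]}) carrying the “Vault is sealed” message quoted above; the function name looks_sealed is hypothetical. A 200 alone doesn't prove Vault is unsealed, so this only detects the sealed case.

```python
import json


def looks_sealed(status_code: int, body: str) -> bool:
    """Return True if a /v1/sys/metrics response says Vault is sealed.

    False means "no sealed signal in this response", not "unsealed".
    """
    if status_code != 503:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        return False  # not a JSON error envelope
    if not isinstance(payload, dict):
        return False
    errors = payload.get("errors")
    if not isinstance(errors, list):
        return False
    # Match the error message Vault returns while sealed.
    return any(isinstance(e, str) and "Vault is sealed" in e for e in errors)
```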
Thanks! That makes the metrics endpoint respond when Vault is sealed. But now it only returns a handful of gauges, and vault.core.unsealed isn't among them. These are the only ones I get back in the response:
I managed to find out why it wasn’t returning the unseal metric.
It turns out that because Vault uses the go-metrics library, a metric that has seen no activity disappears after prometheus_retention_time.
My Vault instance had been sealed for longer than my configured prometheus_retention_time of 30 seconds, and the metric hadn't changed in that time, so it was no longer being returned.
I just had to increase the prometheus_retention_time setting so that Vault keeps the metric in memory for longer.
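To see why the gauge vanished, here is a toy Python model of a sink whose gauges expire when they are not updated within a retention window. It illustrates the behaviour described above and is not go-metrics' actual implementation; the class name ExpiringGaugeSink is hypothetical, and the scenario assumes the unseal gauge is written once when Vault seals and never touched again.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ExpiringGaugeSink:
    """Toy model: gauges not updated within the retention window are dropped."""

    retention_seconds: float
    # name -> (last value, time of last update)
    _gauges: Dict[str, Tuple[float, float]] = field(default_factory=dict, init=False, repr=False)

    def set_gauge(self, name: str, value: float, now: float) -> None:
        self._gauges[name] = (value, now)

    def collect(self, now: float) -> Dict[str, float]:
        """Return what a scrape at time `now` would see, evicting stale gauges."""
        live: Dict[str, float] = {}
        for name, (value, updated) in list(self._gauges.items()):
            if now - updated > self.retention_seconds:
                # Stale: removed from memory, so later scrapes never see it again.
                del self._gauges[name]
            else:
                live[name] = value
        return live


# Vault seals at t=0; the unseal gauge is written once (value 0) and never updated again.
short_sink = ExpiringGaugeSink(retention_seconds=30)
short_sink.set_gauge("vault.core.unsealed", 0, now=0)
print(short_sink.collect(now=10))  # {'vault.core.unsealed': 0}
print(short_sink.collect(now=60))  # {}

long_sink = ExpiringGaugeSink(retention_seconds=3600)
long_sink.set_gauge("vault.core.unsealed", 0, now=0)
print(long_sink.collect(now=60))  # {'vault.core.unsealed': 0}
```

With the 30-second retention, a scrape a minute after sealing finds nothing; with a longer retention the last-written value survives. In Vault, prometheus_retention_time is set in the telemetry stanza of the server configuration.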