Monitor Vault for general 500 errors

I had a Vault cluster in a bad state the other day. `vault status` reported that everything was fine, but whenever I made a request, I got 500 errors. Fortunately, the cluster is working again. Unfortunately, because it was a production system, I did not capture any of the errors or do more troubleshooting.

Main question: I am wondering if there is a status endpoint that I could monitor for general 500 errors?

I wonder if /sys/health would be the correct place…

The list of “default status codes” does not include the general 500 code: https://www.vaultproject.io/api/system/health.html
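To make the idea concrete, here is a minimal sketch of a probe that calls /v1/sys/health and flags any status code outside the documented defaults (a plain 500 included). The code table reflects my reading of the linked page, and the function names and base URL are hypothetical, so treat this as an illustration rather than a verified monitor.

```python
import urllib.error
import urllib.request

# Default codes as I understand them from the sys/health documentation.
# This table is an assumption; check it against your Vault version.
DEFAULT_HEALTH_CODES = {
    200: "initialized, unsealed, and active",
    429: "unsealed and standby",
    472: "disaster recovery secondary and active",
    473: "performance standby",
    501: "not initialized",
    503: "sealed",
}


def classify_health_status(code):
    """Map a status code to a description; anything else is unexpected."""
    if code in DEFAULT_HEALTH_CODES:
        return DEFAULT_HEALTH_CODES[code]
    return f"unexpected status {code}"


def probe_health(base_url, timeout=5.0):
    """Call sys/health once and return (code, description).

    Connection failures (urllib.error.URLError) are not caught here and
    propagate to the caller.
    """
    url = base_url.rstrip("/") + "/v1/sys/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            code = resp.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for non-2xx responses such as 429 or 500.
        code = err.code
    return code, classify_health_status(code)
```

Something like `probe_health("http://127.0.0.1:8200")` could then be run on a schedule, alerting whenever the description starts with "unexpected". Whether sys/health would actually have returned 500 in the situation I hit is exactly what I do not know.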

Side question: If there are “default status codes”, does that mean that custom codes can be enabled?
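From what I can tell on the linked page, the endpoint accepts query parameters such as `standbyok`, `activecode`, `standbycode`, `sealedcode`, and `uninitcode` that override the code returned for each state, which may be what "default" refers to. A small sketch of building such a URL follows; the helper name and address are hypothetical, and the parameter names should be confirmed against the documentation for your version.

```python
from urllib.parse import urlencode


def build_health_url(base_url, **overrides):
    """Build a sys/health URL with optional status-code overrides.

    The override names (e.g. standbycode, sealedcode) are taken from my
    reading of the sys/health documentation and are not validated here.
    """
    url = base_url.rstrip("/") + "/v1/sys/health"
    if overrides:
        url += "?" + urlencode(overrides)
    return url


# Example: ask for 200 from standby nodes instead of the default 429.
standby_ok_url = build_health_url("http://127.0.0.1:8200", standbycode=200)
```

With overrides like this, a load balancer or monitor that only understands 200 as healthy could still treat standby nodes as up, while a genuine 500 would continue to stand out.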

There are quite a few issues on the Vault GitHub repository about 500 errors. We encountered some 500 errors when we upgraded to 1.2. I take it you upgraded to 1.2 as well? I reverted to the previous version we were using (1.1.0) until those issues are worked out.

We were playing with 1.2 on this cluster and on a different cluster. But the cluster I had problems with seemed to start having them after I enabled auditing to a socket. The cluster servers run in Docker on nodes with minimal OS images, so this action may have kicked the cluster into a bad state. Perhaps there was a correlation with the 1.2 version as well; I do not recall at this point. I wish I had been able to debug it.