I noticed an abnormal performance degradation of hashicorp vault - pki engine, when certificate requests were made in parallel. I need to know,
i. Whether this is expected ? (No publicly available SLA stat for HashiCorp vault)
If not,
i. what is causing the sever slowdown of concurrent certificate requests
ii. what are the suggestions to improve performance of the vault
I used following script in 16 parallel sessions to request certificates from pki engine
To setup pki engine, i used the guideline mentioned in PKI - Secrets Engines - HTTP API | Vault by HashiCorp (I can provide the script i used to setup pki engine if you think this could be a configuration issue). I used raft as the storage.
Used C5 instances. Redhat 7.7. Yes, ALB with 3 nodes (HA setup with raft).
To narrow down the issue, tried in a physical (Intel Xeon E312xx, 24G memory, Redhat 7.4) machine (no ha setup, just a single node) as well. Performance degradation was the same.
Analyzed vault audit logs. Looks like response takes time. No indication of why it takes that much of time to respond when multiple requests receive at once.
This isn’t a valid comparison to your test. It appears they are doing KV reads? PKI is a much more CPU intensive and non-caching action.
A perf standby can service KV reads without going to the active node.
If you’re storing the certificates you’re issuing, having an infinite number of perf standbys won’t help as the request still has to forward to the active node for the write to occur.
Thanks for highlighting on KV reads. I missed it. Yes, based on my observations, 1000 concurrent certificate requests within 3 secs for a t2.small was a surprise as well. My load is far less than that. Issuing around 20 concurrent requests within 3 sec is enough.
I hope right sizing and tweaking on backend configurations can achieve it. As you’ve highlighted, i’ll check the possibility of not storing certificates in the backend as well.