I have 5 clusters with a bunch of microservices retrieving secrets from Vault using banzaicloud’s vault-secrets-webhook.
I am noticing that the Vault pods’ memory usage increases linearly over time until the pods are killed.
I am also seeing Vault token lookups that occasionally take more than 10 seconds.
I also noticed that raftCommitTime occasionally reaches 1 second per write, which I assume is much too slow; I suspect our spinning disks are the cause.
That is one thing I will try to fix by moving to SSDs (we are currently using Ceph for persistent volumes).
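For anyone who wants to check the same numbers: Vault exposes its telemetry via the sys/metrics endpoint, so something like the following works, assuming the Prometheus sink is enabled in your telemetry stanza and VAULT_ADDR/VAULT_TOKEN point at your cluster with a valid token:

```
# In Prometheus format the raft commit latency shows up as
# vault_raft_commitTime (requires prometheus_retention_time to be set
# in the telemetry stanza).
curl -s -H "X-Vault-Token: $VAULT_TOKEN" \
  "$VAULT_ADDR/v1/sys/metrics?format=prometheus" | grep raft_commitTime
```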
But why Vault’s RAM usage keeps increasing is unclear…
Running Vault 1.11.3.
This seems like a real problem. Does anybody have a suggestion as to why it could be happening?
It is very difficult to speculate on causes without intimate knowledge of your specific Vault workloads.
Some ideas come to mind though:
How big is your Raft database on disk? (The BoltDB file backing integrated storage is memory-mapped, so could the growth simply be more and more of the database ending up in RAM over time? A quick way to check is sketched below.)
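A minimal sketch, assuming a fairly standard Helm-style deployment (pod name vault-0, namespace vault, data path /vault/data; all of these may differ in your setup):

```
# Pod name, namespace and data path are assumptions (HashiCorp Helm chart
# defaults); adjust to match your deployment.
kubectl exec -n vault vault-0 -- du -sh /vault/data

# The BoltDB file that Raft keeps memory-mapped is usually vault.db:
kubectl exec -n vault vault-0 -- ls -lh /vault/data/vault.db
```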
HashiCorp have a tool called hcdiag for collecting information about a Vault instance. Even if you won’t be engaging HashiCorp commercial support, examining the output yourself might help.
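As a rough sketch of running it (flag names per recent hcdiag releases; check `hcdiag -h` for your version):

```
# hcdiag reads the usual VAULT_ADDR / VAULT_TOKEN environment variables,
# so run it somewhere that can reach the Vault API.
export VAULT_ADDR="https://vault.example.internal:8200"   # hypothetical address
export VAULT_TOKEN="<token with access to sys/ endpoints>"

# Collect Vault diagnostics covering roughly the last 72 hours:
hcdiag -vault -include-since 72h
```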
So the vault-config volume is stored in memory, whilst according to this definition the vault-auditlogs volume should not be.
Quote: "By default, emptyDir volumes are stored on whatever medium that backs the node such as disk, SSD, or network storage, depending on your environment. If you set the emptyDir.medium field to "Memory", Kubernetes mounts a tmpfs (RAM-backed filesystem) for you instead. While tmpfs is very fast, be aware that unlike disks, tmpfs is cleared on node reboot and any files you write count against your container’s memory limit."
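This is easy to verify against the running pod; a sketch, again assuming pod vault-0 in namespace vault:

```
# Show each emptyDir volume and its medium:
kubectl get pod vault-0 -n vault -o yaml | grep -B1 -A2 'emptyDir'

# Memory-backed emptyDirs also appear as tmpfs mounts inside the container:
kubectl exec -n vault vault-0 -- df -h | grep tmpfs
```

Worth noting: if the audit log were landing on a Memory-backed emptyDir, its growth would count against the container’s memory limit and would look exactly like a slow leak.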
That sounds like you’re not actually using the audit log at all, if you’re just writing it to a temporary directory that gets wiped on pod termination. In that case, could you simply turn it off?
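If you do decide to drop it, something like this should do it (the device path is assumed to be the default "file"; use whatever the list command shows for your setup):

```
# List the enabled audit devices and their options:
vault audit list -detailed

# Disable the file audit device at its mount path:
vault audit disable file
```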