Understanding memory usage of a cluster running integrated storage

Hello,

I’m running a 3-node cluster with integrated storage, and I’m seeing high memory usage (2Gi) on the active node, while the standby nodes use far less (around 400Mi).

I’m trying to understand where this is coming from, and whether it could cause problems later on. Is it related to the actual usage of the cluster? Is the active node loading things into memory? AFAIK the audit logs and the Raft database are stored on disk.

I looked into the documentation but didn’t find clear answers on Vault memory usage.
Does anyone have an idea, or can you point me to the relevant documentation?

Thanks in advance,
Frederic

It is expected that the active node uses much more memory than standby nodes.

The active node loads a lot of information into memory about the users, groups, secret engines, and auth methods that exist in your Vault, as part of performing active operations.

Meanwhile, the standby nodes have shut down large parts of the Vault code and are mostly sitting there quiescent, participating in Raft and forwarding any user requests they receive to the active node.
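
If you want to compare what the Vault processes themselves have allocated (rather than what the container or OS reports), one option is the telemetry endpoint. This is just a rough sketch, assuming the /v1/sys/metrics endpoint is reachable with your token and that the Go runtime gauge shows up as vault_runtime_alloc_bytes in the Prometheus output; the node addresses are placeholders for your own cluster.

```python
# Sketch: compare Go heap allocation across cluster members via Vault's
# Prometheus telemetry output (sys/metrics?format=prometheus).
# Assumptions: the endpoint is reachable with the token in VAULT_TOKEN,
# TLS verification succeeds, and the runtime gauge is named
# "vault_runtime_alloc_bytes" in your output. Adjust addresses as needed.
import os
import urllib.request

NODES = [
    "https://vault-0.example.internal:8200",
    "https://vault-1.example.internal:8200",
    "https://vault-2.example.internal:8200",
]
TOKEN = os.environ["VAULT_TOKEN"]

def alloc_bytes(addr: str) -> float:
    req = urllib.request.Request(
        f"{addr}/v1/sys/metrics?format=prometheus",
        headers={"X-Vault-Token": TOKEN},
    )
    with urllib.request.urlopen(req) as resp:
        for line in resp.read().decode().splitlines():
            # Skip "# HELP" / "# TYPE" comment lines; keep the sample line.
            if line.startswith("vault_runtime_alloc_bytes"):
                return float(line.split()[-1])
    raise RuntimeError(f"metric not found on {addr}")

for node in NODES:
    print(f"{node}: {alloc_bytes(node) / 1024**2:.0f} MiB allocated by the Go runtime")
```

On a healthy cluster you would expect the active node's figure to sit well above the standbys', which is the difference being described here.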

Hi @maxb, thanks for your reply.
Is there a way to estimate what the memory usage of a cluster would be?
Based on the database size, number of clients, …?
This would help with setting Kubernetes memory limits for the deployment.
BTW, I noticed that memory usage stays at 99% of the limit; am I just lucky, or is Vault able to see how much memory it can use?

No. Just have to try it and see.

I’m not sure about that - a lot depends on exactly what you’re measuring - it would be reasonable for OS caches to take up otherwise unused memory.

I looked a bit deeper into the memory metrics, and this is indeed due to the page cache. FYI, the breakdown can be found in /sys/fs/cgroup/memory/memory.stat.
I forced the active node to step down and its cache didn’t decrease; instead it started increasing on the new active node. So I guess this is OK and can be left as-is.
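
For anyone else checking this, here is a small sketch of the kind of breakdown I mean: it splits the cgroup v1 accounting into reclaimable page cache versus resident process memory, so you can tell whether "99% of the limit" is the Vault process itself or just cache. The paths and keys follow the memory.stat file mentioned above; cgroup v2 uses different file names, so adjust accordingly.

```python
# Sketch: split cgroup v1 memory accounting into page cache vs. RSS.
# Assumes the cgroup v1 memory controller is mounted at the usual path
# (the same /sys/fs/cgroup/memory/memory.stat mentioned above).
CGROUP = "/sys/fs/cgroup/memory"

def read_int(path: str) -> int:
    with open(path) as f:
        return int(f.read().strip())

# memory.stat is a list of "key value" pairs, including "cache" and "rss".
stat = {}
with open(f"{CGROUP}/memory.stat") as f:
    for line in f:
        key, value = line.split()
        stat[key] = int(value)

limit = read_int(f"{CGROUP}/memory.limit_in_bytes")
usage = read_int(f"{CGROUP}/memory.usage_in_bytes")

mib = 1024 ** 2
print(f"usage: {usage / mib:.0f} MiB of {limit / mib:.0f} MiB limit")
print(f"  page cache (reclaimable): {stat.get('cache', 0) / mib:.0f} MiB")
print(f"  rss (process memory):     {stat.get('rss', 0) / mib:.0f} MiB")
```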
