Why is Vault using so much memory?

I’m running 3 Vault instances backed by spinning disks.

I have 5 clusters with a bunch of microservices retrieving secrets from Vault using Banzai Cloud’s vault-secrets-webhook.

I am noticing that the Vault pods show a linear increase in memory over time until they are eventually killed.

I also have problems where Vault token lookups occasionally take more than 10 seconds.

I also noticed that, presumably because of the spinning disks, the Raft commit time (raftCommitTime) occasionally reaches around 1 second per write, which I assume is much too slow.

This is one thing I will try to fix by moving to SSDs instead (currently using Ceph for the persistent volumes).
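For reference, this is roughly how Prometheus telemetry could be enabled so that the Raft commit time can be graphed over time; a minimal sketch, assuming the Vault server config is passed through the bank-vaults Vault CR’s spec.config (the retention value is arbitrary):

apiVersion: vault.banzaicloud.com/v1alpha1
kind: Vault
metadata:
  name: vault
spec:
  config:
    # Standard Vault telemetry stanza, expressed here as YAML; this exposes
    # Prometheus-format metrics (including vault.raft.commitTime) via /v1/sys/metrics.
    telemetry:
      prometheus_retention_time: 30s
      disable_hostname: true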

But why Vault’s RAM usage keeps increasing is unclear…

Running Vault 1.11.3.

It seems like a problem. Does anybody have a suggestion on why this could be happening?

I highly doubt there is a memory leak in vault.

I read something about: prevent memory leak when using control group factors in a policy by hghaf099 · Pull Request #17532 · hashicorp/vault · GitHub

but I doubt whether I am affected.

Will try to update to 1.11.5 regardless…

We moved to SSDs everywhere to deal with the storage IOPS.

Raft + spinning disks didn’t seem quick enough.
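For anyone else in the same situation, this is roughly the shape of an SSD-backed StorageClass for the Vault PVCs, assuming the Ceph CSI RBD driver; the clusterID, pool, and secret names below are placeholders that depend on your own Ceph setup:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd-ssd
provisioner: rbd.csi.ceph.com
parameters:
  # Placeholders: point these at an SSD-backed RBD pool and the ceph-csi secrets in your cluster.
  clusterID: <ceph-cluster-id>
  pool: <ssd-backed-pool>
  imageFeatures: layering
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
  csi.storage.k8s.io/provisioner-secret-namespace: ceph-csi
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-namespace: ceph-csi
reclaimPolicy: Delete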

Why there is a linear increase in memory is still unclear though :(

It is very difficult to speculate on causes without intimate knowledge of your specific Vault workloads.

Some ideas come to mind though:

  • How big is your Raft database on disk? (Could the growth be more and more of the database ending up in RAM over time?)

  • HashiCorp have a tool called hcdiag for collecting information about a Vault instance. Even if you won’t be engaging HashiCorp commercial support, examining the output yourself might help.


Hi @maxb, thank you for your reply.

This is quite trivial, I think: the Raft storage on disk is only 160 MB.

The raftCommitTime dropped by a factor of ~1000 after moving to SSDs.

The linear increase in RAM is quite confusing, IMO.

We use Banzai Cloud’s bank-vaults operator and let it configure Vault to use an emptyDir volume to store the vault_audit.log.
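For context, this is roughly how that looks in the operator’s Vault custom resource; a simplified sketch, and the exact field names and file path are taken from memory of the bank-vaults examples, so they may differ from our actual CR:

apiVersion: vault.banzaicloud.com/v1alpha1
kind: Vault
metadata:
  name: vault
spec:
  externalConfig:
    # File audit device enabled through the operator; it writes to a path that
    # the operator backs with the vault-auditlogs emptyDir volume shown further down.
    audit:
      - type: file
        description: File based audit logging device
        options:
          file_path: /vault/logs/vault_audit.log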

As I understand it, by default an emptyDir volume should be temporarily stored on disk.

However, it seems to be counted as part of the pod’s memory, which is quite confusing to me.

I truncated the vault_audit.log and noticed the memory usage drop, so it seems the emptyDir contents end up in memory.

  - emptyDir: {}
    name: vault-auditlogs
  - emptyDir:
      medium: Memory
      sizeLimit: 1Mi
    name: vault-config

So the vault-config volume is stored in memory, whilst according to this definition the vault-auditlogs volume should not be.

Quote: " default emptyDir volumes are stored on whatever medium that backs the node such as disk, SSD, or network storage, depending on your environment. If you set the emptyDir.medium field to "Memory", Kubernetes mounts a tmpfs (RAM-backed filesystem) for you instead. While tmpfs is very fast, be aware that unlike disks, tmpfs is cleared on node reboot and any files you write count against your container’s memory limit."

That sounds like you’re not actually using the audit log at all, if you’re just writing it to a temporary directory that gets wiped on pod termination? So you could just turn it off?

There is a fluentd container in the pod that mounts this directory as well and tails the vault_audit.log file to ship it.

It’s based on: bank-vaults/vault_controller.go at dcdd0c5eb81648259e8f034024a84b61c5ecd7ab · banzaicloud/bank-vaults · GitHub