Consul memory use doesn't make sense. (1.9.10)

Before I open a ticket I thought I would try here and see if this is 'normal'. I have a Consul instance with about 100k 'nodes', each attached to one of 4 services. These nodes are not actual machines/containers, so health checks on them are disabled. Looking at my data dir, I only see around ~300MB of storage used at any point. I am not using the key/value store for anything, so it is empty.
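For context, these 'virtual' nodes are registered directly against the catalog API, so there is no agent and no health checks on them. A minimal sketch of what one registration looks like (node names, addresses, and the local-agent URL are made up for illustration):

```python
import json

def register_payload(i, service):
    # Body for PUT /v1/catalog/register: an external "node" carrying one
    # service instance, with no health checks attached.
    return {
        "Node": f"virtual-node-{i}",             # hypothetical naming scheme
        "Address": f"10.0.{i // 256}.{i % 256}", # fake address per node
        "Service": {"ID": f"{service}-{i}", "Service": service},
    }

print(json.dumps(register_payload(7, "web")))
# To actually register against a local agent, something like:
#   import requests
#   requests.put("http://127.0.0.1:8500/v1/catalog/register",
#                json=register_payload(7, "web"))
```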

Consul is using almost 30GB of memory without doing anything (watching `consul monitor` debug output). I'm failing to see how 100k nodes turns into 30GB of memory usage. Is this normal?

Adding some more info here. This is a quick test I whipped up to create a bunch of nodes attached to 3 different services:

https://gist.githubusercontent.com/jordant/84559dc2a252ebd7d31a10e79a719a35/raw/31ed58f40acba1531cb4557d62baa3b12a43c676/gistfile1.txt

After all the nodes/services have been created, Consul goes from about 20MB (baseline before running this script) to around 600MB of memory usage. Still a far cry from my production instance, but significant.

The strange thing I'm noticing is that even after the script removes all of the nodes, Consul's memory usage is still fairly high (~200MB), despite it no longer having any nodes or services. This is still the case an hour after all nodes/services were removed.

It seems like it's not purging everything related to the deleted nodes from memory. Is there some kind of garbage collection that might not be happening, or that is configurable?
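One way to narrow this down: the agent's metrics endpoint exposes Go runtime stats, so you can compare heap actually allocated against memory the process has reserved from the OS (the Go runtime returns freed memory to the OS lazily, which can make RSS stay high after deletions even when the heap has shrunk). A rough sketch, assuming a local agent on the default port and the `consul.runtime.*` gauge names from Consul's telemetry docs:

```python
import json
import urllib.request

def extract_runtime_gauges(metrics):
    # Keep only the Go runtime gauges (heap in use, bytes reserved from
    # the OS, GC runs) from a /v1/agent/metrics response body.
    return {g["Name"]: g["Value"] for g in metrics["Gauges"]
            if g["Name"].startswith("consul.runtime.")}

def fetch_metrics(base="http://127.0.0.1:8500"):
    with urllib.request.urlopen(f"{base}/v1/agent/metrics") as resp:
        return json.load(resp)

# With a live agent you could then compare, e.g.:
#   gauges = extract_runtime_gauges(fetch_metrics())
#   print(gauges["consul.runtime.alloc_bytes"],   # heap in use
#         gauges["consul.runtime.sys_bytes"])     # reserved from the OS
```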

Hey @jordant

Welcome to the HashiCorp community! That's definitely a concerning amount of memory... Could you provide me with the output of `consul debug`?

Specifically, I'm interested in seeing your Raft snapshot configuration. With the wrong configuration, snapshots could cause the levels of memory usage seen in your test (not sure about your production environment at the moment).
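For reference, if it does turn out to be snapshot-related, these are the relevant agent config options (the values shown here are just the documented defaults, for illustration):

```hcl
# Consul agent config (HCL) - Raft snapshot tuning, defaults shown.
raft_snapshot_threshold = 8192   # Raft commits between snapshots
raft_snapshot_interval  = "30s"  # how often to check if a snapshot is due
raft_trailing_logs      = 10240  # logs retained after a snapshot
```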