Consul cache OOM

I am experimenting with Consul's caching behaviour (use_cache: true).
If I send DNS queries for non-existent services (about 1 million of them), I can swamp the agent/server cache.

What can I do to protect Consul against this kind of DDoS, aside from disabling the cache altogether?

I see that an issue has already been created in Consul for the same … I hope I am not missing anything.

Hi @amit-handda

I’ll follow up on this and get back to you in the next few days. Since this is an older issue, I’d like to collect some data :slight_smile:
Can you please provide some information about what your Consul deployment looks like? How many servers/agents are you running?
How are you doing this testing?
What use case are you testing for?

Thanks again for a great question :slight_smile:

Hi (again),

Consul server cluster: 3 nodes (EC2 c5.2xlarge)
Single Consul agent on another node: c4.2xlarge
Note: use_cache is true for both the cluster and the agent.

benchmarking tool: an adaptation of https://github.com/rakyll/hey

testing methodology

Flood the agent’s DNS port with service resolution requests (at a concurrency of 1200).
Around 1 million service resolutions are issued in total. Most of these services are not registered in the cluster/agent.

I am testing the performance and resiliency of the Consul setup.

The issue (as detailed in the original post) is that the agent and servers consume all of the RAM on their nodes and run out of memory. Hence, it would be nice to fix the indicated Consul issue (#4968) so that the agent cache can be controlled.