I am experimenting with Consul's caching behaviour (use_cache: true).
If I issue DNS queries for non-existent services (1 million of them), I can swamp the agent/server cache.
What can I do to protect Consul from this DDoS-like behaviour, aside from disabling the cache altogether?
I see that there is an issue already created in consul for the same … hope I am not missing anything.
Hi @amit-handda
I’ll follow up on this and get back to you in the next few days. Since this is an older issue, I’d like to collect some data first.
Can you please provide some information around what your Consul deployment looks like? How many Servers/Agents are you running?
How are you doing this testing?
What use case are you testing for?
Thanks again for a great question.
Hi (again),
Consul server cluster: 3-node cluster (EC2 nodes, c5.2xlarge)
Single Consul agent on another node: c4.2XL
Note: use_cache is true for the cluster and the agent.
benchmarking tool: an adaptation of https://github.com/rakyll/hey
Testing methodology:
Flood the agent’s DNS port with service resolution requests (at a concurrency of 1200).
In total, around 1 million service resolutions are invoked. Most of these services are not registered in the cluster/agent.
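For anyone who wants to reproduce something similar, here is a minimal Go sketch of this kind of load. It is not the adapted hey tool mentioned above. The service name pattern (bench-svc-N), the helper names, the small request counts, and the agent address 127.0.0.1:8600 (Consul's default DNS port) are all assumptions for illustration.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"sync"
	"time"
)

// lookupFunc resolves one DNS name and returns an error if the lookup fails.
type lookupFunc func(name string) error

// serviceName builds the i-th (hypothetical, most likely unregistered)
// service name in Consul's <service>.service.consul DNS form.
func serviceName(i int) string {
	return fmt.Sprintf("bench-svc-%d.service.consul", i)
}

// runLoad issues `total` lookups spread over `concurrency` workers and
// returns how many of them failed.
func runLoad(total, concurrency int, lookup lookupFunc) int {
	jobs := make(chan int)
	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		failed int
	)
	for w := 0; w < concurrency; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				if err := lookup(serviceName(i)); err != nil {
					mu.Lock()
					failed++
					mu.Unlock()
				}
			}
		}()
	}
	for i := 0; i < total; i++ {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return failed
}

// consulLookup returns a lookupFunc that sends every query to agentAddr
// instead of the system resolver.
func consulLookup(agentAddr string) lookupFunc {
	r := &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: time.Second}
			return d.DialContext(ctx, network, agentAddr)
		},
	}
	return func(name string) error {
		ctx, cancel := context.WithTimeout(context.Background(), time.Second)
		defer cancel()
		_, err := r.LookupHost(ctx, name)
		return err
	}
}

func main() {
	// Assumed values: scale total and concurrency up to match the test above.
	const total, concurrency = 200, 50
	failed := runLoad(total, concurrency, consulLookup("127.0.0.1:8600"))
	fmt.Printf("lookups=%d failed=%d\n", total, failed)
}
```

Because every name is distinct, each lookup is a separate query for a separate (mostly non-existent) service, which is the request pattern described above. The failed count mixes not-found answers with timeouts, so it is only a rough signal.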
I am testing for the performance and resiliency of the consul setup.
The issue (as detailed in the original post) is that the agent and servers consume all available RAM and run out of memory. It would therefore be good to have the Consul issue mentioned above (#4968) fixed, so that the agent cache can be controlled.