DNS Performance

We’re currently testing Consul for a DNS-based load-balancing setup. For testing, the Consul nodes are running on decently sized bare-metal servers (128 GB of RAM, 32 CPU cores). The best performance I’ve been able to squeeze out of each node so far is 32,000 queries per second, which works out to 1,000 queries per second per CPU core. While I’m stress testing a box with DNS queries, overall CPU usage, load average, memory, and network throughput all remain quite low, so it’s definitely not using the hardware to its full potential. This is with caching enabled at 10 seconds within Consul.
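For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of a DNS query generator using only the Python standard library. It is not the tool used for the numbers above. The function names are my own, and the query name you pass in (for example `web.service.consul`) is a hypothetical placeholder.

```python
import socket
import struct
import time


def build_query(name: str, query_id: int = 0x1234, qtype: int = 1) -> bytes:
    """Build a minimal DNS query packet: one question, class IN, recursion desired."""
    # Header: ID, flags (RD bit set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: length-prefixed labels terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE (1 = A record by default) followed by QCLASS (1 = IN)
    return header + qname + struct.pack("!HH", qtype, 1)


def measure_qps(server: str, port: int, name: str, duration: float = 5.0) -> float:
    """Send queries one at a time over UDP; return answered queries per second."""
    packet = build_query(name)
    answered = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(1.0)
        start = time.monotonic()
        deadline = start + duration
        while time.monotonic() < deadline:
            try:
                sock.sendto(packet, (server, port))
                sock.recvfrom(4096)
                answered += 1
            except OSError:
                # Timeout or ICMP "port unreachable": count as unanswered
                pass
        elapsed = time.monotonic() - start
    return answered / elapsed


def per_core(qps: float, cores: int) -> float:
    """Queries per second per CPU core, e.g. 32000 qps on 32 cores."""
    return qps / cores
```

A call such as `measure_qps("127.0.0.1", 53, "web.service.consul")` returns the rate seen by one synchronous client. That is bounded by round-trip latency, so you would need many of these running in parallel (or a dedicated load tool) to actually saturate a server.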

I’m wondering if anyone has suggestions for config changes that could improve the DNS performance of each of these nodes, or have I hit some sort of limitation in the software itself? What’s the best DNS queries-per-second rate that people have achieved from a single Consul node? Is 1,000 queries per second per CPU core a hard limit that we should expect?

Thanks.

I ended up putting Unbound in front of Consul on each node as a DNS cache server. After making Consul listen for DNS on port 553, I used the following unbound.conf options, and I’m now getting up to 400k queries per second per node.

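server:
    # Listen on all IPv4 interfaces and accept queries from any client.
    interface: 0.0.0.0
    access-control: 0.0.0.0/0 allow
    access-control: 127.0.0.0/8 allow
    # Allow forwarding to Consul on localhost.
    do-not-query-localhost: no
    # Skip DNSSEC validation for our domain.
    domain-insecure: "ourdomain.com"
    # Cache answers for at most 1 second.
    cache-max-ttl: 1
# Forward all queries to Consul's DNS listener on port 553.
forward-zone:
    name: "."
    forward-addr: 127.0.0.1@553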
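
For the Consul side of this setup, the sketch below emits a JSON agent config fragment from Python. Port 553 comes from the description above. The two TTL settings are only my assumption about what "caching enabled at 10 seconds" might correspond to; the actual option used is not stated here. `render_config` is a hypothetical helper name.

```python
import json

# Port 553 matches the forward-addr in unbound.conf above.
# ASSUMPTION: the 10s TTLs are a guess at the "caching at 10 seconds" setting.
consul_dns_config = {
    "ports": {"dns": 553},
    "dns_config": {
        "service_ttl": {"*": "10s"},  # TTL for all service lookups
        "node_ttl": "10s",            # TTL for node lookups
    },
}


def render_config(config: dict) -> str:
    """Serialize the config dict as pretty-printed JSON for a Consul config file."""
    return json.dumps(config, indent=2, sort_keys=True)
```

Writing the output of `render_config(consul_dns_config)` to a file in the agent's config directory would be one way to apply it. Note that with `cache-max-ttl: 1` in Unbound, answers are cached there for at most one second regardless of the TTL Consul returns.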