Consul benchmarking: EOF errors

I am getting EOF errors while benchmarking Consul PUT/GET requests, for example:
[268] Get http://10.99.0.241:6555/v1/kv/bench: EOF
I am using hey to benchmark Consul.
Consul is deployed as a single-instance server on an 8-core, 16 GB AWS EC2 instance (c4.2xlarge).
The network/OS parameters are tuned (ulimit -n 100000, somaxconn -> 100000).
Benchmarking command line: ./heyl -n 10000 -c 100 -m PUT -d 1234 http://<ip>:6555/v1/kv/bench
-n -> total operations
-c -> concurrency

Error stats at various values of n and c are as follows:

Concurrency | Ops | Errors (approx.)
----------- | ---- | --------------------
100 | 10000 | no errors
200 | 10000 | PUT(250 EOF), GET(13 EOF)
1000 | 10000 | PUT(1300 EOF), GET(465 EOF)
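
To put the table in proportion, here is a minimal sketch that turns the approximate error counts above into error rates. The counts are copied from the table; the names `RUNS`, `error_rate_pct`, and `summarize` are only illustrative.

```python
# Approximate error counts copied from the table above.
RUNS = [
    # (concurrency, ops, put_errors, get_errors)
    (100, 10000, 0, 0),
    (200, 10000, 250, 13),
    (1000, 10000, 1300, 465),
]


def error_rate_pct(errors: int, ops: int) -> float:
    """Return failed requests as a percentage of all requests."""
    return 100.0 * errors / ops


def summarize(runs):
    """Return (concurrency, PUT error %, GET error %) for each run."""
    return [
        (c, error_rate_pct(put_err, ops), error_rate_pct(get_err, ops))
        for c, ops, put_err, get_err in runs
    ]


if __name__ == "__main__":
    for c, put_pct, get_pct in summarize(RUNS):
        print(f"c={c:<5} PUT {put_pct:.2f}%  GET {get_pct:.2f}%")
```

At concurrency 1000, roughly 13% of PUTs and under 5% of GETs fail, so most requests still succeed even in the worst case reported.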

Any ideas on the possible cause?


Bumping again … any pointers?

For anyone who lands here, do the following to tune the cluster.
Raise the connection limits in the agent configuration for benchmarking, so that you don't get connection resets at >100 concurrency:

    "limits": {
      "http_max_conns_per_client": 1000,
      "rpc_max_conns_per_client": 1000
    },
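
As a rough illustration of why these limits matter, the sketch below generates the `limits` fragment and checks a benchmark's concurrency against the per-client HTTP limit. It assumes the load generator opens at most one connection per concurrent worker and that all of them come from a single client IP; the function names and the lower limit of 200 in the usage lines are hypothetical.

```python
import json


def limits_fragment(http_max: int, rpc_max: int) -> str:
    """Render a Consul agent config fragment with per-client connection limits."""
    config = {
        "limits": {
            "http_max_conns_per_client": http_max,
            "rpc_max_conns_per_client": rpc_max,
        }
    }
    return json.dumps(config, indent=2)


def fits_within_limit(concurrency: int, http_max: int) -> bool:
    """Check whether a benchmark run stays within the per-client HTTP limit.

    Assumption: one connection per worker, all from the same client IP.
    """
    return concurrency <= http_max


if __name__ == "__main__":
    print(limits_fragment(1000, 1000))
    # 200 is a hypothetical lower limit, used only for comparison.
    for limit in (200, 1000):
        print(f"c=1000 within limit {limit}: {fits_within_limit(1000, limit)}")
```

If the benchmark client sits behind NAT or a proxy, all of its traffic counts as one client IP, so the limit needs to cover the total concurrency.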

Also, increase the default socket receive and send buffer sizes; otherwise the benchmark client will receive i/o timeouts:

net.core.rmem_default=851968
net.core.wmem_default=851968
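
Before changing these values, it can help to see what the host currently uses. The following is a small sketch, assuming a Linux host where sysctl keys map to files under `/proc/sys`; the helper names are illustrative and not part of any tool mentioned in this thread.

```python
from pathlib import Path

# Desired values copied from the post above.
DESIRED = """\
net.core.rmem_default=851968
net.core.wmem_default=851968
"""


def parse_sysctl(text: str) -> dict:
    """Parse 'key=value' lines (sysctl.conf style) into a dict of ints."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition("=")
        settings[key.strip()] = int(value.strip())
    return settings


def proc_path(key: str) -> Path:
    """Map a sysctl key such as net.core.rmem_default to its /proc/sys file."""
    return Path("/proc/sys") / key.replace(".", "/")


def pending_changes(desired: dict) -> dict:
    """Return {key: (current, wanted)} for keys whose live value differs."""
    changes = {}
    for key, want in desired.items():
        path = proc_path(key)
        if not path.exists():
            continue  # not Linux, or the key is unavailable
        have = int(path.read_text().split()[0])
        if have != want:
            changes[key] = (have, want)
    return changes


if __name__ == "__main__":
    for key, (have, want) in pending_changes(parse_sysctl(DESIRED)).items():
        print(f"{key}: current={have} wanted={want}")
```

The script only reads values; applying them still requires `sysctl -w` or an entry in the sysctl configuration, run with root privileges.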

Hi @amit-handda!

Thanks for posting here, and for the update. I apologize that no one got back to you.
I am curious about your use case here. What traffic patterns are you trying to emulate?
I noticed that you mentioned Consul was only running on a single c4.2xlarge instance. Can I ask why you’re running only a single node?

Thanks for the good question, and the follow up!

Thanks @jsosulska for replying. I was wondering whether people visit the forum or not.

As already noted in my original post, I am benchmarking the Consul agent.
I was running Consul on a single node (single server) because I wanted to observe the numbers on a simpler configuration before setting up a 3-node server cluster plus agents …

Thanks,

Hi @amit-handda

Thanks for the fast response! It helps a lot. :smiley: Two quick questions:

  1. What version of Consul are you benchmarking?
  2. Have you seen our server performance guide?

A word of caution about your benchmarking numbers: in our guides, we do post warnings that a single-instance deployment of Consul isn’t the same as running a full deployment. When a Consul agent is run in -dev mode, the binary handles both server and agent functionality. Here are two examples:

  1. You can’t replicate Raft behavior in -dev mode. Raft is often the bottleneck in multi-server setups, since every write has to be handled by a majority of servers and is subject to network latency etc.
  2. -dev mode is especially bad since it is in-memory only and doesn’t even write the Raft log to disk, which means disk I/O, the write bottleneck in a real install, isn’t reflected.
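
To make the "majority" point concrete, here is a minimal sketch of the quorum arithmetic for a Raft-style cluster. The function names are illustrative; the formula is the standard majority rule, not something taken from Consul's code.

```python
def quorum_size(servers: int) -> int:
    """Smallest number of servers that forms a majority."""
    if servers < 1:
        raise ValueError("a cluster needs at least one server")
    return servers // 2 + 1


def failure_tolerance(servers: int) -> int:
    """How many servers can fail while a majority is still reachable."""
    return servers - quorum_size(servers)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"servers={n} quorum={quorum_size(n)} tolerates={failure_tolerance(n)}")
```

With a single server the quorum is that one server, so a write never waits on a network round trip to a peer. With three servers each write needs acknowledgement from two, which is the latency a single-node benchmark cannot show.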

For some additional considerations, please see our Consul internals.

Looking forward to hearing back from you.