Hello! We have 200 hosts running about 3,000 applications.
We are having trouble with master node performance: its CPU load is high and getting worse.
On the metrics dashboard we see 30k RPC requests per second, and consul.raft.commitTime is also high: 500 ms on average, with peaks of up to 5 seconds.
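For anyone who wants to compare numbers, here is a rough sketch of how the same two metrics could be read straight from a server through the agent's `/v1/agent/metrics` HTTP endpoint instead of a dashboard. The address is an assumption (a local agent on the default port, no ACL token), the helper names `fetch_metrics` and `summarize` are made up for illustration, and the exact metric names may differ depending on the telemetry prefix in use.

```python
import json
import urllib.request

# Assumption: a local Consul agent on the default HTTP port, ACLs not enforced.
CONSUL_METRICS_URL = "http://127.0.0.1:8500/v1/agent/metrics"


def fetch_metrics(url=CONSUL_METRICS_URL, timeout=5.0):
    """Fetch the agent's current telemetry snapshot as a dict."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize(metrics, names=("consul.raft.commitTime", "consul.rpc.request")):
    """Pick the named timers (Samples) and counters out of a snapshot."""
    summary = {}
    for section in ("Samples", "Counters"):
        # A section may be missing or null, so fall back to an empty list.
        for entry in metrics.get(section) or []:
            if entry.get("Name") in names:
                summary[entry["Name"]] = {
                    "count": entry.get("Count"),
                    "mean": entry.get("Mean"),
                    "max": entry.get("Max"),
                }
    return summary


if __name__ == "__main__":
    for name, stats in summarize(fetch_metrics()).items():
        print(name, stats)
```

The snapshot only covers the agent's most recent aggregation interval, so it needs to be polled repeatedly to see the peaks.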
- Is this a normal load for a cluster of this size? Maybe we just need better hardware.
- Is there any way to find out what exactly our servers spend most of their resources on?
Maybe some logs or other metrics?
Consul: 1.8.5 on servers; a mix of 1.6.1 and 1.8.5 on slave nodes
OS: Ubuntu 18.04.5 LTS