Hi,
We have a Nomad cluster with 50+ clients and have noticed that the UI sometimes stops responding and throws ERR_SOCKET_NOT_CONNECTED or Connection reset by peer errors. This tends to happen when several administrators/developers are using the UI simultaneously and loading pages in quick succession.
We were able to reproduce this locally by spamming the UI until it stops responding. The logs don’t indicate anything useful. In that state, curl fails as well:
$ curl http://127.0.0.1:4646
curl: (56) Recv failure: Connection reset by peer
Reproduction steps:
- Run nomad agent -dev -bind 0.0.0.0 -log-level DEBUG
- Open a browser to http://127.0.0.1:4646
- Spam F5 or load the UI in A LOT of tabs
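For anyone who would rather script the last step than press F5 by hand, here is a rough sketch that fires many concurrent requests with curl and counts how many got no HTTP response. It assumes curl, seq, awk and an xargs that supports -P are available. ADDR, TOTAL, PARALLEL and tally are names made up for this example. We have not confirmed that this triggers the same behaviour as a browser, which keeps its connections open much longer than curl does.

```shell
#!/bin/sh
# Hypothetical load helper; ADDR, TOTAL and PARALLEL are illustrative defaults.
ADDR="${ADDR:-http://127.0.0.1:4646}"
TOTAL="${TOTAL:-2000}"
PARALLEL="${PARALLEL:-100}"

# tally reads one HTTP status code per line on stdin.
# curl reports 000 when it received no HTTP response at all
# (for example on "Connection reset by peer").
tally() {
  awk '{ if ($1 == "000") failed++; else ok++ }
       END { printf "ok=%d failed=%d\n", ok, failed }'
}

# Only generate load when invoked as: ./spam.sh run
if [ "${1:-}" = "run" ]; then
  # -I{} consumes one input line per curl invocation; the number itself is unused.
  seq "$TOTAL" \
    | xargs -P "$PARALLEL" -I{} \
        curl -s -o /dev/null --max-time 5 -w '%{http_code}\n' "$ADDR" \
    | tally
fi
```

Running it with the `run` argument prints a single summary line; a growing `failed` count would correspond to the curl error shown above.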
While monitoring the number of open Nomad sockets, we noticed that the count plateaus at the point where the UI stops responding:
ss -apn | grep nomad | wc -l
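To see the plateau over time rather than from a single sample, the same command can be wrapped in a loop. This is only a sketch: count_nomad_sockets and the `watch` argument are names invented for this example, and the one-second interval is arbitrary.

```shell
#!/bin/sh
# count_nomad_sockets reads `ss -apn`-style output on stdin and prints
# the number of lines mentioning "nomad" (same as `grep nomad | wc -l`).
count_nomad_sockets() {
  # grep -c exits non-zero when there are no matches; it still prints 0.
  grep -c nomad || true
}

# Only start sampling when invoked as: ./sockets.sh watch
if [ "${1:-}" = "watch" ]; then
  while true; do
    # One timestamped sample per second.
    printf '%s %s\n' "$(date +%T)" "$(ss -apn | count_nomad_sockets)"
    sleep 1
  done
fi
```

Each output line is a timestamp followed by the socket count, so the moment the count stops growing can be lined up with the moment the UI stops responding.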
Is there some sort of throttling mechanism that our team unintentionally trips when we’re performing too many calls to a server in quick succession?
Thanks,
VH