We recently updated Consul to version 1.20.5. After the update, we started getting the following error:
agent.client: RPC failed to server: method=Health.ServiceNodes server=<correct_ip_address>:8300 error="rpc error making call: rpc error getting client: failed to get conn: rpc error: lead thread didn't get connection"
and
agent.server.rpc: RPC failed to server in DC: server=<correct_ip_address>:8300 datacenter= method=Health.ServiceNodes error="rpc error getting client: failed to get conn: rpc error: lead thread didn't get connection"
We then updated to 1.20.6, since 1.20.5 had a loosely related bug, but the errors persist.
We are seeing this on all clients/servers across two datacenters, one with 75 members, the other with 22.
Basic connectivity checks succeed: the clients and servers making the requests can connect to that IP.
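A TCP-level check like the one described above can be sketched as follows. This is only an illustration, not necessarily the exact check that was run: the function name `tcp_reachable` is hypothetical, the port 8300 is taken from the log lines, and the server address is a placeholder that has to be filled in. It only confirms that a TCP connection can be opened; it does not speak Consul's RPC protocol.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration; it tests plain TCP reachability only.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run from a client or server that logs the error, a call such as `tcp_reachable("<correct_ip_address>", 8300)` returning `True` would match what we observed: the port is reachable even though the RPC calls fail.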
What else should we be looking at?