Hello,
I'm unsure whether this is a bug or a misconfiguration/misuse on our side, so I'm opening the thread here rather than on GitHub, since I couldn't find any other reference to this issue. If this should be opened on GitHub instead, let me know.
I just spun up a new cluster in our K8s environment (we already run Consul 1.8.3 there without issues), using consul-helm chart 0.32.1.
However, the logs intermittently show errors like these:
2021-07-01T10:22:28.622Z [WARN] agent: grpc: addrConn.createTransport failed to connect to {10.42.2.56:8300 0 consul-cluster-server-0.consul <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 10.42.2.56:8300: operation was canceled". Reconnecting...
2021-07-01T10:22:28.622Z [WARN] agent: grpc: addrConn.createTransport failed to connect to {10.42.1.149:8300 0 consul-cluster-server-2.consul <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 10.42.1.149:8300: operation was canceled". Reconnecting...
2021-07-01T10:22:28.622Z [WARN] agent: grpc: addrConn.createTransport failed to connect to {10.42.2.56:8300 0 consul-cluster-server-0.consul <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 10.42.2.56:8300: operation was canceled". Reconnecting...
2021-07-01T10:27:33.356Z [WARN] agent: grpc: addrConn.createTransport failed to connect to {10.42.1.149:8300 0 consul-cluster-server-2.consul <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 10.42.1.149:8300: operation was canceled". Reconnecting...
What confuses me is that even at the debug log level, I don't get any additional messages. I also see connection errors for the node itself: the log above was taken from the node “consul-cluster-server-2.consul”, yet it reports failures dialing consul-cluster-server-2.consul (10.42.1.149:8300).
Since I'm currently testing this new version ahead of the upgrade from 1.8.3 to 1.10, I'm wondering whether this comes from our setup or from something else we might have missed.
In the chart we:
- Enabled metrics
- Disabled DNS
- Disabled the clients
- Enabled ACL management
- Increased the resource requests/limits
Since we have no issues at all with our other applications or with the older Consul version, I'd like to understand where these errors are coming from.