gRPC Warning on Consul 1.10.0

Hello,

I'm unsure whether this is a bug or just a misconfiguration/misuse on our side, so I'm opening the thread here rather than on GitHub (I couldn’t find any other reference to this issue). If this needs to be opened on GitHub, let me know.

I just deployed a new cluster in our K8s environment (we already run Consul 1.8.3 there without issues), using the consul-helm chart 0.32.1.

However, when I check the logs, I see warnings like these at random intervals:

2021-07-01T10:22:28.622Z [WARN]  agent: grpc: addrConn.createTransport failed to connect to {10.42.2.56:8300 0 consul-cluster-server-0.consul <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 10.42.2.56:8300: operation was canceled". Reconnecting...
2021-07-01T10:22:28.622Z [WARN]  agent: grpc: addrConn.createTransport failed to connect to {10.42.1.149:8300 0 consul-cluster-server-2.consul <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 10.42.1.149:8300: operation was canceled". Reconnecting...
2021-07-01T10:22:28.622Z [WARN]  agent: grpc: addrConn.createTransport failed to connect to {10.42.2.56:8300 0 consul-cluster-server-0.consul <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 10.42.2.56:8300: operation was canceled". Reconnecting...
2021-07-01T10:27:33.356Z [WARN]  agent: grpc: addrConn.createTransport failed to connect to {10.42.1.149:8300 0 consul-cluster-server-2.consul <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 10.42.1.149:8300: operation was canceled". Reconnecting...

What confuses me is that even at debug log level I do not get any additional messages. I also see warnings reporting a connection error to the node itself: the log above was taken from the node “consul-cluster-server-2.consul”.
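
To see which peers the warnings point at, the log lines can be tallied per server. This is a minimal sketch in Python, assuming the line format shown in the log above; the function name `count_failed_dials` and the regular expression are illustrative and not part of Consul.

```python
import re
from collections import Counter

# Matches the "{<ip>:<port> <n> <server name> <nil>}" part of the warning.
# The pattern is an assumption based only on the log lines quoted above.
PATTERN = re.compile(
    r"failed to connect to \{(?P<addr>[\d.]+:\d+) \d+ (?P<name>\S+) "
)

def count_failed_dials(lines):
    """Count gRPC 'failed to connect' warnings per (server name, address)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(match.group("name"), match.group("addr"))] += 1
    return counts
```

Feeding it the agent log, for example via `count_failed_dials(open("consul.log"))` (the file name is hypothetical), shows whether the warnings concentrate on one server or are spread across all of them, including the local node.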

As I am currently testing this new version in order to make the jump from 1.8.3 to 1.10, I am wondering whether this comes from our setup or from something else that we might have missed.

In the chart we:

  • Enabled metrics
  • Disabled DNS
  • Disabled the client
  • Enabled ACL management
  • Increased the resource requests/limits

As we have no issues at all with our other applications or with the older version of Consul, I am wondering where these warnings are coming from…


@Lebvanih, thanks for posting this.

I’m having the exact same issue with v1.10.0, except that we’re running the servers outside of K8s, directly on EC2. Like you, we have no actual problems with the cluster, but these warnings make you think otherwise. Netcat tests show the ports are listening on all hosts.
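
For anyone who wants to repeat the port check without netcat, here is a minimal sketch in Python. The helper name `port_is_open` is made up for illustration; 8300 is the server RPC port that appears in the warnings.

```python
import socket

def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs a plain TCP connect, much like `nc -z`.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False

# Example (address is a placeholder): port_is_open("10.0.0.10", 8300)
```

A successful connect only shows that something is listening on the port. It does not rule out the short-lived, cancelled dials that the warnings describe.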

Would be great to get an answer on this.

Same here.
After updating to 1.10.0, the warnings started on all servers.
They run on VMs and bare metal, so the problem is not limited to K8s.
Thanks for sharing.

This is being tracked on GitHub in hashicorp/consul issue #10603, “New 1.10.0 on New K8s Cluster results in [WARN]:”.
