P.S. This works fine with the Consul Connect built-in proxy, but that doesn’t use gRPC and is not supported for production use. Any help very much appreciated!
Is the gRPC connection actually reachable over the network? No software firewalls (iptables/firewalld/ufw) or hardware firewalls in the way?
Make sure you have a clear connection on the required ports. I’ve seen this before when gRPC gets blocked along the way. Also make sure it’s not listening only on loopback/127.0.0.1.
@carlos-lehmann, would it be possible for you to post the configs you used for the servers and clients? Could you also post the beginning of the log from when you start Envoy?
Thank you!
I was just about to send you guys a lengthy reply but we just now finally solved it.
The problem was that we weren’t sure whether/how the gRPC port needs to be made available server to client, client to server, or client to client. Nevertheless, we opened it in all directions. Next we noticed that the port was up and running on the server but not on the client, even though we had configured the gRPC port on the client’s agent:
"ports": {
"grpc": 8502
},
But given the lack of information, we weren’t sure whether it was necessary anyway. My colleague just now added the following line to the client configuration:
He found it while looking into an unrelated issue with Consul and thought he could put it in just to test. Et voilà, the ports came up, which I only noticed today. So I went in and tested again, and the “upstream errors” were gone. This is what the whole agent config looks like:
And with these up and running, I configured the services and started the sidecars, and we finally have all green on the Consul dashboard for our services and can connect through the mesh!
I’m not entirely sure how he got to that line for the client configuration; I’ll research it. But generally speaking, it would be great to get a somewhat better overview of how consul connect envoy works compared to the built-in proxy than what the tutorial offers.
I assume the change here was to add 127.0.0.1 to addresses.grpc. It makes sense that this would resolve the issue you were seeing. By default, consul connect envoy attempts to connect to the gRPC/xDS server at 127.0.0.1:8502. You will receive an error if the proxy is unable to contact the server at this address.
The tutorial likely did not call out the need to configure gRPC to listen on localhost because this is the default behavior of Consul unless changed via the -client/client_addr configuration options.
We could add a callout to the tutorial stating that gRPC needs to be configured to listen on 127.0.0.1, or that the alternate gRPC listening address needs to be provided when starting the proxy, using either the -grpc-addr command line flag or the CONSUL_GRPC_ADDR environment variable.
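For illustration only, here is a rough sketch of those two options; the service name web and the address 192.168.1.10 are placeholders, not taken from your environment. Option one is to keep the agent’s gRPC listener on loopback in the client agent config:
"ports": {
  "grpc": 8502
},
"addresses": {
  "grpc": "127.0.0.1"
}
Option two is to leave the agent’s gRPC listener wherever it is bound and point the proxy at it explicitly when starting the sidecar:
# tell the proxy where the agent's xDS/gRPC listener lives
consul connect envoy -sidecar-for web -grpc-addr 192.168.1.10:8502
# or equivalently via the environment
CONSUL_GRPC_ADDR=192.168.1.10:8502 consul connect envoy -sidecar-for web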
Do you think that change would be helpful in clarifying the configuration requirements?
We didn’t have that part of the configuration in our Consul agent client configuration AT ALL. I have yet to find the part of the documentation stating the need to add it, but it’s definitely not in the tutorial, and I didn’t spot it anywhere in the Consul Connect documentation either. It’s also not necessary for the built-in proxy.
Could you elaborate on this, please? What do you refer to as “server” here? Is it just the localhost of either the Consul client or server, or do you mean the Consul server?
I’m asking because currently we have that port up & running not only on localhost:
Which still leaves me confused about whether the Consul clients and server need to speak to each other via TCP/8502 or not.
I’d say the confusion here stems from us not using -dev mode at all and wanting to create a mesh across two VMs in the same network plus a Consul server in a separate network. So to me this should be highlighted at the bottom, where the tutorial elaborates on a PoC environment, because my interpretation was “enable Connect and configure the gRPC port on your Consul server” and “on your client, all you have to do is configure the gRPC port”.
We found that part of the documentation and figured we should see the port running on 8502 by default on localhost, but it just wasn’t there. Now it does make sense, however.
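In case it helps anyone else, a quick way to check whether the agent’s gRPC listener is actually up (a sketch, assuming ss is available; netstat works just as well):
# expect a consul process listening on 127.0.0.1:8502 (or whatever address is configured)
ss -tlnp | grep 8502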
Highlighting the mentioned configuration would definitely help. Maybe also elaborate a bit on the gRPC configuration, as that’s a key difference between the two options.
Either way, thanks so much for helping here as well, @blake.
I’ve done some further investigation, and right now it seems to me that all I need to get the mesh working is:
Firewall
Consul Client → Consul Server (tcp/8502)
Consul Client Consumer → Consul Client Provider (tcp/21000-21255)
Iptables
Accept on Consul Client Provider (tcp/21000-21255)
Which begs the question: why do we need a gRPC listener on the Consul clients in the first place? Is that just a localhost thing for Envoy?
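To be concrete about the iptables item above, this is roughly the kind of rule I mean on the provider (a sketch; the exact chain and interface depend on your setup):
# allow inbound connections to the default sidecar proxy port range on the provider
iptables -A INPUT -p tcp --dport 21000:21255 -j ACCEPT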
Just to elaborate on the configuration:
Consul Client ↔ Consul Server (Firewall / Iptables)
root@consul-client-provider:~$ nc -vz consul-server 8502
Connection to consul-server 8502 port [tcp/*] succeeded!
root@consul-client-consumer:~$ nc -vz consul-server 8502
Connection to consul-server 8502 port [tcp/*] succeeded!
root@consul-server:~$ nc -vz -w 10 consul-client-provider 8502
nc: connect to consul-client-provider port 8502 (tcp) timed out: Operation now in progress
root@consul-server:~$ nc -vz -w 10 consul-client-consumer 8502
nc: connect to consul-client-consumer port 8502 (tcp) timed out: Operation now in progress
Consul Client Provider ↔ Consul Client Consumer (Firewall / Iptables)
root@consul-client-consumer:~$ nc -vz consul-client-provider 8502
nc: connect to consul-client-provider port 8502 (tcp) failed: Connection timed out
root@consul-client-consumer:~$ nc -vz consul-client-provider 21000
Connection to consul-client-provider 21000 port [tcp/*] succeeded!
root@consul-client-provider:~$ nc -vz -w 10 consul-client-consumer 8502
nc: connect to consul-client-consumer port 8502 (tcp) timed out: Operation now in progress
root@consul-client-provider:~$ nc -vz -w 10 consul-client-consumer 21000
nc: connect to consul-client-consumer port 21000 (tcp) timed out: Operation now in progress
Consul Client Provider has an nginx running on port 80. This is the service registered on it:
root@consul-client-consumer:~# curl localhost:9001
<!DOCTYPE html>
<html>
<head>
<title>Welcome to example.com on consul-client-provider</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to example.com on consul-client-provider</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
</body>
</html>
So yeah, I’m still confused and trying to understand which ports and what configuration are really needed here. Any help demystifying this is greatly appreciated!
Unless… this is somehow cached? Because we had 8502 and 21000 open from both Consul clients, but the iptables rules weren’t saved.
You don’t actually need to enable the gRPC port on the servers, unless they are going to be running proxies for local applications. See below for why this is the case.
Consul’s xDS control plane uses a distributed architecture. Client agents host the xDS server for sidecar proxies within the service mesh, instead of the xDS server being centralized on a smaller number of Consul servers. This is documented in the first paragraph on https://www.consul.io/docs/connect/proxies/envoy.
Consul configures Envoy by optionally exposing a gRPC service on the local agent that serves Envoy’s xDS configuration API.
You normally do not need to configure client agents to expose the gRPC port on a non-loopback interface.
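So on a client agent, something as minimal as this should be sufficient for local sidecar proxies (a sketch; with no addresses.grpc override, the gRPC listener follows client_addr, which defaults to 127.0.0.1):
"ports": {
  "grpc": 8502
}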
I recommend reading pages 19–35 of The Life of a Packet Through Consul Service Mesh. Those sections should help better explain the architecture of Connect, which ports proxies use to communicate with agents, and the different listener types used to send and receive traffic in the mesh.