Hi,
we are trying to set up a Consul service mesh for a mixed environment: VMs + k8s. Everything is located in our own infra, and we are trying to keep it all in one Consul datacenter, without federation, so we're using external servers for k8s.
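For reference, the k8s side points the Helm chart at the VM servers roughly like this (a minimal sketch of our values; the IPs and k8sAuthMethodHost are placeholders for our environment):

# values.yaml (sketch): Consul on k8s joined to external VM servers,
# all in a single datacenter. IPs below are placeholders.
global:
  name: consul
  datacenter: dc1
  enabled: false            # no servers in k8s
externalServers:
  enabled: true
  hosts: ["10.0.2.5"]       # the VM Consul server(s)
  k8sAuthMethodHost: https://10.0.2.8:16443
server:
  enabled: false
connectInject:
  enabled: true
meshGateway:
  enabled: true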
It generally works, but there is a problem with connections from services outside of k8s to services inside k8s: services inside k8s publish internal pod IPs, which are unreachable from outside the cluster.
We are trying to solve it with a mesh gateway, deployed in k8s and used in "remote" mode by services outside of the k8s cluster. But we can't get it to work.
Would really appreciate some ideas on how to implement it.
on VM:
service {
  name = "web"
  port = 32768

  connect {
    sidecar_service {
      proxy {
        upstreams = [
          {
            destination_name = "static-server2"
            local_bind_port = 1234
            mesh_gateway = {
              mode = "remote"
            }
          }
        ]
      }
    }
  }
}
Envoy is running:
consul connect envoy -sidecar-for web
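As a sanity check (assuming the default admin bind of localhost:19000 that consul connect envoy uses), the generated cluster for the upstream can be inspected through the Envoy admin API:

# Envoy admin API on the VM sidecar (default -admin-bind localhost:19000).
# The cluster for static-server2 should list the mesh gateway's IP as its endpoint.
curl -s localhost:19000/clusters | grep static-server2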
in k8s:
consul-consul-mesh-gateway-6f459676b4-spj2w 1/1 Running
consul-consul-mesh-gateway LoadBalancer 10.152.183.237 10.64.140.43,10.0.2.8 443:30988/TCP 174m
If I understand correctly, after Envoy on the VM is started, it should forward connections to localhost:1234 to the mesh gateway, which then forwards them on to the destination (the static server in this case).
tcp LISTEN 0 4096 127.0.0.1:1234 0.0.0.0:* users:(("envoy",pid=5438,fd=25))
It seems to listen, but doesn't send anything to the mesh gateway.
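One thing worth checking is which address the gateway advertises into the catalog, since that is what the VM-side Envoy dials (assuming the default service name mesh-gateway registered by the Helm chart):

# The gateway's tagged addresses (lan/wan) must be reachable from the VM.
curl -s http://127.0.0.1:8500/v1/catalog/service/mesh-gateway | jq '.[0].ServiceTaggedAddresses'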
What are we missing? (probably something dumb…)
Thanks!
curl 127.0.0.1:1234
curl: (56) Recv failure: Connection reset by peer
[2023-05-23 13:15:31.855][5455][debug][conn_handler] [source/server/active_tcp_listener.cc:147] [C73] **new connection from 127.0.0.1:48866**
[2023-05-23 13:15:35.133][5438][debug][main] [source/server/server.cc:251] flushing stats
[2023-05-23 13:15:36.857][5455][debug][pool] [source/common/conn_pool/conn_pool_base.cc:786] [C74] connect timeout
[2023-05-23 13:15:36.857][5455][debug][connection] [source/common/network/connection_impl.cc:139] [C74] closing data_to_write=0 type=1
[2023-05-23 13:15:36.857][5455][debug][connection] [source/common/network/connection_impl.cc:250] [C74] closing socket: 1
[2023-05-23 13:15:36.858][5455][debug][pool] [source/common/conn_pool/conn_pool_base.cc:483] [C74] client disconnected, failure reason:
[2023-05-23 13:15:36.858][5455][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:594] [C73] connect timeout
[2023-05-23 13:15:36.858][5455][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:371] [C73] **Creating connection to cluster dd412229~static-server2.default.dc1.internal.ad9c9e3b-7324-9a17-a952-58cefa707a77.consul**
Hi @antonof.k,
Welcome to HashiCorp Forums!
Have you explored options for giving your pods IP addresses that are routable from your VM subnet? I think that would be the easiest option. The requirement would be similar to what is documented here: Single Consul Datacenter in Multiple Kubernetes Clusters - Kubernetes | Consul | HashiCorp Developer
If you want the services in the Consul mesh to be accessed by external clients, you should use an Ingress Gateway or the Consul API Gateway. These are specifically geared towards clients coming from outside the mesh.
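A minimal sketch of enabling one through the Helm chart (the service type and port here are just illustrative):

# values.yaml fragment (sketch): an ingress gateway for clients outside the mesh.
ingressGateways:
  enabled: true
  defaults:
    service:
      type: LoadBalancer    # or NodePort if no load balancer is available
      ports:
        - port: 8080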
Hi, thanks.
Yup, I've seen this.
The problem with that topology is this requirement:
This deployment topology requires that the Kubernetes clusters have a flat network for both pods and nodes so that pods or nodes from one cluster can connect to pods or nodes in another.
But pods in k8s and VMs are in different networks, and VMs can’t reach the pods.
That's why we are trying to use a mesh gateway for traffic from VMs to pods.
Basically we can make the mesh gateway reachable by exposing it as a NodePort and proxying traffic inside, but at the moment we can't find the correct configuration to direct the traffic through it.
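For reference, exposing the gateway as a NodePort and pinning the advertised WAN address looks roughly like this (the node IP and port are placeholders for our environment):

# values.yaml fragment (sketch): make the mesh gateway reachable from the VMs.
meshGateway:
  enabled: true
  service:
    type: NodePort
    nodePort: 30443         # must be reachable from the VM network
  wanAddress:
    source: Static
    static: 10.0.2.8        # node IP routable from the VMs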
Also, we've tried to make WAN federation work:
global:
  name: consul
  datacenter: dc2
  tls:
    enabled: true
    caCert:
      secretName: consul-federation
      secretKey: caCert
    caKey:
      secretName: consul-federation
      secretKey: caKey
  # Delete this acls section if ACLs are disabled.
  acls:
    manageSystemACLs: true
    replicationToken:
      secretName: consul-federation
      secretKey: replicationToken
  federation:
    enabled: true
    k8sAuthMethodHost: https://10.0.2.8:16443
    primaryDatacenter: dc1
    primaryGateways: ["10.0.2.5:9100"]
  # Delete this gossipEncryption section if gossip encryption is disabled.
  gossipEncryption:
    secretName: consul-gossip
    secretKey: gossipEncryptionKey
connectInject:
  enabled: true
meshGateway:
  enabled: true
  wanAddress:
    source: Static
    static: 10.0.2.8
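On the VM side (dc1, the primary) we start the matching gateway roughly like this; gateway-primary is the service name that shows up in the logs further down, and 10.0.2.5:9100 matches primaryGateways above:

# Sketch: mesh gateway on the primary (VM) datacenter; addresses are ours.
consul connect envoy -gateway=mesh -register \
  -service gateway-primary \
  -address "10.0.2.5:9100" \
  -wan-address "10.0.2.5:9100"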
k8s@k8s:~/consul$ k get all
NAME READY STATUS RESTARTS AGE
pod/consul-webhook-cert-manager-5dd6848777-clwl8 1/1 Running 0 118s
pod/consul-server-0 1/1 Running 0 118s
pod/consul-connect-injector-7cd4c467c5-rz848 1/1 Running 0 118s
pod/consul-mesh-gateway-74cf79b554-jz8qf 1/1 Running 0 118s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/consul-server ClusterIP None <none> 8501/TCP,8502/TCP,8301/TCP,8301/UDP,8302/TCP,8302/UDP,8300/TCP,8600/TCP,8600/UDP 119s
service/consul-dns ClusterIP 10.152.183.53 <none> 53/TCP,53/UDP 119s
service/consul-ui ClusterIP 10.152.183.19 <none> 443/TCP 119s
service/consul-connect-injector ClusterIP 10.152.183.248 <none> 443/TCP 119s
service/consul-mesh-gateway LoadBalancer 10.152.183.87 10.64.140.43 443:31816/TCP 119s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/consul-webhook-cert-manager 1/1 1 1 118s
deployment.apps/consul-connect-injector 1/1 1 1 118s
deployment.apps/consul-mesh-gateway 1/1 1 1 118s
NAME DESIRED CURRENT READY AGE
replicaset.apps/consul-webhook-cert-manager-5dd6848777 1 1 1 118s
replicaset.apps/consul-connect-injector-7cd4c467c5 1 1 1 118s
replicaset.apps/consul-mesh-gateway-74cf79b554 1 1 1 118s
NAME READY AGE
statefulset.apps/consul-server 1/1 118s
But the problem is that the Consul server on the VM can't reach the server in k8s: it tries to reach it directly instead of going through the mesh gateway.
consul@consul-server-001:/etc/consul.d$ consul members -wan
Node Address Status Type Build Protocol DC Partition Segment
consul-server-0.dc2 10.1.77.22:8302 alive server 1.15.1 2 dc2 default <all>
consul-server-001.dc1 10.0.2.5:8302 alive server 1.14.5 2 dc1 default <all>
logs:
wan: memberlist: Suspect consul-server-0.dc2 has failed, no acks received
wan: memberlist: Failed to send gossip to 10.1.77.63:8302: read tcp 10.0.2.5:47772->10.0.2.5:9100: read>
wan: memberlist: Failed to send gossip to 10.1.77.63:8302: read tcp 10.0.2.5:47786->10.0.2.5:9100: read>
wan: memberlist: Failed to send gossip to 10.1.77.63:8302: read tcp 10.0.2.5:47802->10.0.2.5:9100: read>
wan: memberlist: Failed to send gossip to 10.1.77.63:8302: read tcp 10.0.2.5:47816->10.0.2.5:9100: read>
wan: memberlist: Failed to send UDP compound ping and suspect message to 10.1.77.63:8302: read tcp 10.0>
[ERROR] agent.proxycfg: Failed to handle update from watch: kind=mesh-gateway proxy=gateway-primary service_id=gateway-primary id=mesh-gateway:dc2 error="error filling agent cache: Remote DC has no server currently reachable"
Not sure how to solve it.
It seems like the mesh gateway on the VM can connect to mesh-gateway:dc2, but then there is no proxying to the k8s Consul server.
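For reference, the piece we understood to be required on the dc1 servers for gateway-routed federation is this setting (a sketch based on the docs; the file path is our choice):

# /etc/consul.d/connect.hcl (sketch): route server-to-server WAN traffic
# through the mesh gateways instead of dialing servers directly.
connect {
  enabled = true
  enable_mesh_gateway_wan_federation = true
}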
Just for the record.
After a lot of testing, two things made the difference: upgrading the Consul version on the primary DC (it was lower than on the secondary), and fixing the policy on the anonymous token, which for some reason wasn't applied there.
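The anonymous-token fix looked roughly like this (the policy name and rules are our choice; the token ID is Consul's well-known anonymous token):

# Sketch: give the anonymous token read access so cross-DC service discovery works.
cat > anonymous-policy.hcl <<'EOF'
node_prefix "" { policy = "read" }
service_prefix "" { policy = "read" }
EOF
consul acl policy create -name anonymous-read -rules @anonymous-policy.hcl
consul acl token update -id 00000000-0000-0000-0000-000000000002 \
  -policy-name anonymous-read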