Mesh gateway in mixed k8s + VM datacenter

Hi,
we are trying to set up a Consul service mesh for a mixed environment: VMs + k8s. Everything is located in our own infrastructure. We are trying to keep it all in one Consul datacenter, without federation, so the k8s cluster is configured to use external Consul servers.

It generally works, but there is a problem with connections from services outside of k8s to services inside it: services inside k8s publish internal pod IPs, which are unreachable from outside the cluster.
We are trying to solve this with a mesh gateway: deploy it in k8s and have services outside the cluster use it in remote mode. But we can't get it to work.

Would really appreciate some ideas on how to implement it.

on VM:

service {
  name = "web"
  port = 32768
  connect {
    sidecar_service {
      proxy {
        upstreams = [
          {
            destination_name = "static-server2"
            local_bind_port  = 1234
            mesh_gateway = {
              mode = "remote"
            }
          }
        ]
      }
    }
  }
}
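As an aside, if every upstream should use the gateway, the same mode can also be set mesh-wide with a proxy-defaults config entry instead of per upstream; a sketch, applied with consul config write:

# proxy-defaults.hcl - mesh-wide default gateway mode
Kind = "proxy-defaults"
Name = "global"
MeshGateway {
  Mode = "remote"
}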

Envoy is running:

consul connect envoy -sidecar-for web
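For what it's worth, consul connect envoy binds the Envoy admin endpoint to localhost:19000 by default, which shows what configuration the sidecar actually received:

# is there a cluster for the upstream, and does it have endpoints?
curl -s localhost:19000/clusters | grep static-server2

# full listener/cluster config as Envoy sees it
curl -s localhost:19000/config_dump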

in k8s:

consul-consul-mesh-gateway-6f459676b4-spj2w           1/1     Running

consul-consul-mesh-gateway       LoadBalancer   10.152.183.237   10.64.140.43,10.0.2.8   443:30988/TCP   174m





If I understand correctly, once Envoy is started on the VM it should forward connections to localhost:1234 to the mesh gateway, which will route them on to the destination, the static server in this case.

tcp   LISTEN 0      4096         127.0.0.1:1234       0.0.0.0:*     users:(("envoy",pid=5438,fd=25))

It seems to listen, but doesn't send anything to the mesh gateway.

What are we missing? (probably something dumb…)

Thanks!

curl 127.0.0.1:1234
curl: (56) Recv failure: Connection reset by peer


[2023-05-23 13:15:31.855][5455][debug][conn_handler] [source/server/active_tcp_listener.cc:147] [C73] new connection from 127.0.0.1:48866
[2023-05-23 13:15:35.133][5438][debug][main] [source/server/server.cc:251] flushing stats
[2023-05-23 13:15:36.857][5455][debug][pool] [source/common/conn_pool/conn_pool_base.cc:786] [C74] connect timeout
[2023-05-23 13:15:36.857][5455][debug][connection] [source/common/network/connection_impl.cc:139] [C74] closing data_to_write=0 type=1
[2023-05-23 13:15:36.857][5455][debug][connection] [source/common/network/connection_impl.cc:250] [C74] closing socket: 1
[2023-05-23 13:15:36.858][5455][debug][pool] [source/common/conn_pool/conn_pool_base.cc:483] [C74] client disconnected, failure reason: 
[2023-05-23 13:15:36.858][5455][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:594] [C73] connect timeout
[2023-05-23 13:15:36.858][5455][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:371] [C73] Creating connection to cluster dd412229~static-server2.default.dc1.internal.ad9c9e3b-7324-9a17-a952-58cefa707a77.consul
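The connect timeout suggests Envoy is dialing an address it can't reach. One thing worth checking (assuming the default Helm service name mesh-gateway and a local agent on port 8500) is what addresses the gateway registered in the catalog:

# the gateway's LAN and WAN tagged addresses as seen by the mesh
curl -s localhost:8500/v1/catalog/service/mesh-gateway | jq '.[].ServiceTaggedAddresses'

If the WAN tagged address here is a pod or cluster-internal IP, the VM sidecar has no way to reach it.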

Hi @antonof.k,

Welcome to HashiCorp Forums!

Did you explore options for giving your pods IP addresses routable from your VM subnet? I think that would be the easiest option. The requirements would be similar to what is documented here: Single Consul Datacenter in Multiple Kubernetes Clusters - Kubernetes | Consul | HashiCorp Developer

If you want services in the Consul mesh to be accessed by external clients, you should use an Ingress Gateway or the Consul API Gateway. These are specifically geared toward clients coming from outside the mesh.
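For illustration, a minimal ingress-gateway config entry could look like this (a sketch; the listener port and service name are placeholders, and the gateway pods themselves are enabled via ingressGateways in the Helm chart):

Kind = "ingress-gateway"
Name = "ingress-gateway"

Listeners = [
  {
    Port     = 8080
    Protocol = "tcp"
    Services = [
      {
        Name = "static-server2"
      }
    ]
  }
]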


Hi, thanks.
Yep, I've seen this.

The problem with that topology is this requirement:
This deployment topology requires that the Kubernetes clusters have a flat network for both pods and nodes so that pods or nodes from one cluster can connect to pods or nodes in another.

But pods in k8s and the VMs are in different networks, and the VMs can't reach the pods.
That's why we are trying to use a mesh gateway for traffic from VMs to pods.
Basically we can make the mesh gateway reachable by exposing it as a NodePort and have it proxy traffic inside, but at the moment we can't find the correct configuration to direct the traffic through it.
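For reference, exposing the gateway as a NodePort through the Helm chart is just a values change; a sketch, with the port number an arbitrary choice:

meshGateway:
  enabled: true
  service:
    type: NodePort
    nodePort: 30443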

Also, we've tried to make WAN federation work:

global:
  name: consul
  datacenter: dc2
  tls:
    enabled: true
    caCert:
      secretName: consul-federation
      secretKey: caCert
    caKey:
      secretName: consul-federation
      secretKey: caKey

  # Delete this acls section if ACLs are disabled.
  acls:
    manageSystemACLs: true
    replicationToken:
      secretName: consul-federation
      secretKey: replicationToken

  federation:
    enabled: true
    k8sAuthMethodHost: https://10.0.2.8:16443
    primaryDatacenter: dc1
    primaryGateways: ["10.0.2.5:9100"]
  # Delete this gossipEncryption section if gossip encryption is disabled.
  gossipEncryption:
    secretName: consul-gossip
    secretKey: gossipEncryptionKey

connectInject:
  enabled: true
meshGateway:
  enabled: true
  wanAddress:
    source: Static
    static: 10.0.2.8

k8s@k8s:~/consul$ k get all
NAME                                               READY   STATUS    RESTARTS   AGE
pod/consul-webhook-cert-manager-5dd6848777-clwl8   1/1     Running   0          118s
pod/consul-server-0                                1/1     Running   0          118s
pod/consul-connect-injector-7cd4c467c5-rz848       1/1     Running   0          118s
pod/consul-mesh-gateway-74cf79b554-jz8qf           1/1     Running   0          118s

NAME                              TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)                                                                            AGE
service/consul-server             ClusterIP      None             <none>         8501/TCP,8502/TCP,8301/TCP,8301/UDP,8302/TCP,8302/UDP,8300/TCP,8600/TCP,8600/UDP   119s
service/consul-dns                ClusterIP      10.152.183.53    <none>         53/TCP,53/UDP                                                                      119s
service/consul-ui                 ClusterIP      10.152.183.19    <none>         443/TCP                                                                            119s
service/consul-connect-injector   ClusterIP      10.152.183.248   <none>         443/TCP                                                                            119s
service/consul-mesh-gateway       LoadBalancer   10.152.183.87    10.64.140.43   443:31816/TCP                                                                      119s

NAME                                          READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/consul-webhook-cert-manager   1/1     1            1           118s
deployment.apps/consul-connect-injector       1/1     1            1           118s
deployment.apps/consul-mesh-gateway           1/1     1            1           118s

NAME                                                     DESIRED   CURRENT   READY   AGE
replicaset.apps/consul-webhook-cert-manager-5dd6848777   1         1         1       118s
replicaset.apps/consul-connect-injector-7cd4c467c5       1         1         1       118s
replicaset.apps/consul-mesh-gateway-74cf79b554           1         1         1       118s

NAME                             READY   AGE
statefulset.apps/consul-server   1/1     118s


But the problem is that the Consul server on the VM can't reach the server in k8s: it tries to reach it directly rather than through the mesh gateway.

consul@consul-server-001:/etc/consul.d$ consul members -wan
Node                   Address          Status  Type    Build   Protocol  DC   Partition  Segment
consul-server-0.dc2    10.1.77.22:8302  alive   server  1.15.1  2         dc2  default    <all>
consul-server-001.dc1  10.0.2.5:8302    alive   server  1.14.5  2         dc1  default    <all>

logs:
wan: memberlist: Suspect consul-server-0.dc2 has failed, no acks received
wan: memberlist: Failed to send gossip to 10.1.77.63:8302: read tcp 10.0.2.5:47772->10.0.2.5:9100: read>
wan: memberlist: Failed to send gossip to 10.1.77.63:8302: read tcp 10.0.2.5:47786->10.0.2.5:9100: read>
wan: memberlist: Failed to send gossip to 10.1.77.63:8302: read tcp 10.0.2.5:47802->10.0.2.5:9100: read>
wan: memberlist: Failed to send gossip to 10.1.77.63:8302: read tcp 10.0.2.5:47816->10.0.2.5:9100: read>
wan: memberlist: Failed to send UDP compound ping and suspect message to 10.1.77.63:8302: read tcp 10.0>
[ERROR] agent.proxycfg: Failed to handle update from watch: kind=mesh-gateway proxy=gateway-primary service_id=gateway-primary id=mesh-gateway:dc2 error="error filling agent cache: Remote DC has no server currently reachable"
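For reference, federation through mesh gateways also requires the servers themselves to opt in, so the dc1 (VM) server config includes something along these lines (a sketch, per the WAN federation docs):

# dc1 (primary) server agent config
connect {
  enabled = true
  enable_mesh_gateway_wan_federation = true
}

with a mesh gateway registered in each datacenter (the Helm chart handles the dc2 side above, and ours on the VM side listens on 10.0.2.5:9100).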

Not sure how to solve it.
It seems like the mesh gateway on the VM can connect to mesh-gateway:dc2, but then nothing is proxied on to the k8s Consul server 🤨

Just for the record.

After a lot of testing, two things made the difference: upgrading the Consul version on the primary DC, which was lower than on the secondary (1.14.5 vs 1.15.1 above), and fixing the policy on the anonymous token, which for some reason hadn't been applied there.
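For anyone landing here later, the anonymous token fix amounts to attaching a read policy to it; a sketch, with the policy name an arbitrary choice:

# rules granting read access to nodes and services mesh-wide
cat > anonymous-read.hcl <<'EOF'
node_prefix "" {
  policy = "read"
}
service_prefix "" {
  policy = "read"
}
EOF

consul acl policy create -name anonymous-read -rules @anonymous-read.hcl
# 00000000-0000-0000-0000-000000000002 is the anonymous token's well-known accessor ID
consul acl token update -id 00000000-0000-0000-0000-000000000002 -policy-name anonymous-read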