Connecting K8s and Nomad using a single Consul Server (DC1). Is this even possible or what is the next best way to do so?

Dear all,

Currently I have set up a K8s cluster, a Nomad cluster, and a Consul server outside of both of them. I am also assuming that these clusters are owned by different teams / stakeholders and hence should stay within their own administrative boundaries.

I am trying to use a single Consul server (DC) to connect a K8s and a Nomad cluster to achieve workload failover and load balancing. So far I have achieved the following:

  • Set up one Consul server externally
  • Connected the K8s and Nomad clusters as data planes to this external Consul server

(architecture diagram image)

However, this doesn’t seem right since everything (the nomad and k8s services) is mixed in a single server. While searching I found Admin Partitions, which let you define administrative and communication boundaries between services managed by separate teams or belonging to separate stakeholders. However, since this is an Enterprise feature, it is not an option for me.

I also came across WAN Federation, but that requires multiple Consul servers (DCs) to connect. In my case Consul servers would have to be installed on both the K8s and Nomad sides.

As per my understanding, there is no alternative way to use one single Consul server (DC) to connect multiple clusters.

I am confused about which way I should actually proceed to use a single Consul Server (DC1) to connect K8s and Nomad. I don’t know if that is even possible without Admin Partitions; if not, what is the next best way to get it working? Also, I think I need to use both service discovery and service mesh to enable communication between the services of the separate clusters.

I kindly seek your expert advice to resolve my issue.

Thank you so much in advance.

Hi @harsh.lif3,

Could you please explain what you meant by this:

However, this doesn’t seem right since everything (the nomad and k8s services) is mixed in a single server.

What you are trying to do is possible. However, if you want your Nomad workloads to talk to the K8S workloads, your Nomad nodes and Kubernetes pods should have routable IP addresses that allow them to talk to each other.

I am trying to understand where you are stuck.


Hi @Ranjandas, Thank you for the message.

I meant that when I directly add workloads from K8s and Nomad into the single Consul server, they all appear together without any isolation. As per some information I read, this can lead to resource collisions, IP overlapping issues, etc. if Admin Partitions are not used.

I am actually stuck on choosing the correct and most feasible way to achieve the following:

My intention is to have a central Consul Server to enable fail over and load balancing between k8s and nomad cluster workloads. For example, I plan to deploy the same services and pods in both clusters and test failover by scaling the service down in one cluster and vice versa, and similarly for load balancing (sketched below). I was thinking of having a central Consul server to avoid creating and managing multiple Consul servers in K8s and Nomad. Also, I can then add more K8s and Nomad clusters into the same Consul server (DC1).
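
Roughly, I imagine driving that test with commands along these lines (hypothetical deployment / job names, just to illustrate the idea):

kubectl scale deployment/web --replicas=0   # drain the K8s instances of the service
nomad job scale web 0                       # drain the Nomad instances of the service
# ...and scale either side back up afterwards to restore it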

So far what I have understood is that every pod which needs to be part of the service mesh should be annotated accordingly to enable the sidecar. To enable service discovery, the K8s services need to be annotated accordingly to sync with Consul. However, as I understand it, the Consul service block in Nomad plays a similar role to Services in K8s. So in this case I think I need to use both the service discovery and service mesh features, if I am not mistaken.

What I do not get is the possibility of using mesh gateways and cluster peering, since as I understood they require multiple Consul servers (DC1, DC2, etc.). Would it be possible in my case, since I am using a single (central) Consul server to connect multiple cluster workloads?

On the other hand, I read that Admin Partitions can be used to create cluster boundaries within a single Consul server. However, this is not feasible for me right now since it is an Enterprise feature.

Finally, what I would like to achieve is having workloads running on K8s and Nomad to be able to talk to each other. When I query a service inside k8s it should be routed to the back-end pods which are in either K8s or Nomad. Same should happen when I query a service inside Nomad. Also, when the back-end pods are not reachable in either cluster it should be routed to the other respective cluster where the pods are running.

I am overwhelmed by the documentation as I am still new to all this. I would sincerely appreciate your kind expert advice on how and what steps / technologies I should follow to achieve this using a single Consul server, while also achieving some logical separation without using Admin Partitions. Or, if that is not possible using a single Consul server (DC1), what would be the next best way to do so?

I apologize for making this message so long. I tried to explain everything that I am going through now.

Thank you!

Hi @harsh.lif3

I meant that when I directly add workloads from K8s and Nomad into the single Consul server, they all appear together without any isolation. As per some information I read, this can lead to resource collisions, IP overlapping issues, etc. if Admin Partitions are not used.

First of all, you shouldn’t be running a single Consul server. It should be a cluster with at least 3 or 5 servers, based on the level of availability you are aiming for.
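
For illustration only (hypothetical IPs), each server in a 3-node cluster would carry roughly this in its consul.hcl:

server           = true
bootstrap_expect = 3
retry_join       = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]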

While the workloads from K8S and Nomad are added into the Consul servers, they don’t appear together. Think of the Consul server as just a catalog: the workloads are registered against the respective nodes on which the services are running. In the case of K8S, these are virtual nodes in Consul, and for Nomad, these are the Consul client agents that run alongside the Nomad worker nodes.

My intention is to have a central Consul Server to enable fail over and load balancing between k8s and nomad cluster workloads.

As mentioned previously, this is achievable as long as your Nomad worker nodes and K8S pods have routable IP addresses so that they can talk to each other.

So far what I have understood is that every pod which needs to be part of the service mesh should be annotated accordingly to enable the sidecar. To enable service discovery, the K8s services need to be annotated accordingly to sync with Consul.

Your understanding is correct.

However, as I understand it, the Consul service block in Nomad plays a similar role to Services in K8s. So in this case I think I need to use both the service discovery and service mesh features, if I am not mistaken.

While I fully understand this, please note that the Consul Service Mesh feature can be used in both Nomad and K8S.
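
For reference, on the Nomad side joining the mesh is a matter of adding a connect block to the service; a minimal sketch (the service name here is just an example):

service {
  name = "nginx"
  port = "http"

  connect {
    sidecar_service {}   # runs an Envoy sidecar for this service instance
  }
}

On the K8S side the equivalent is the connect-inject annotation you mentioned.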

What I do not get is the possibility of using mesh gateways and cluster peering, since as I understood they require multiple Consul servers (DC1, DC2, etc.). Would it be possible in my case, since I am using a single (central) Consul server to connect multiple cluster workloads?

You are right, peering is only done between DCs or Admin Partitions and is not applicable in your use case.

Finally, what I would like to achieve is having workloads running on K8s and Nomad to be able to talk to each other. When I query a service inside k8s it should be routed to the back-end pods which are in either K8s or Nomad. Same should happen when I query a service inside Nomad. Also, when the back-end pods are not reachable in either cluster it should be routed to the other respective cluster where the pods are running.

This can be done using Consul Service Mesh.

In summary, here is what you have to do:

  1. Run a highly available Consul server cluster (3 or 5 nodes).
  2. Run a K8S cluster with Consul installed using the external servers option: Join Kubernetes Clusters to external Consul Servers | Consul | HashiCorp Developer
  3. Run a highly available Nomad server cluster (3 or 5 nodes).
  4. Run Nomad worker nodes and have Consul clients run alongside them (this is important; see the sketch below).
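
For step 4, a rough sketch of the wiring (placeholder server address, adjust to your environment): each Nomad worker runs a local Consul client agent that joins the external servers, and the Nomad agent is pointed at that local agent.

# consul.hcl on each Nomad worker node (client mode)
server     = false
retry_join = ["<consul-server-ip>"]

# nomad.hcl on the same node
client {
  enabled = true
}

consul {
  address = "127.0.0.1:8500"
}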

From a networking point of view, your Nomad worker nodes should be able to talk directly to K8S Pod IPs and vice versa. If this requirement can’t be met, you must run 2 DCs, one for Nomad workloads and the other for K8S workloads, and cluster-peer between them.

Based on your use case, you will have to decide which option to pick from the above.

I would recommend you build a mini setup with the above architecture and reach out if you run into issues. I hope this helps.


Dear @Ranjandas,

Thank you for the reply.

I started again from scratch and below is the current status. I’d be grateful for any advice.

While I completely understand that in production it is required to have a 3- or 5-node HA Consul cluster, since I am working in a testing environment I have a single Consul server.

/etc/consul.d/consul.hcl config in Consul Server

data_dir = "/opt/consul"
client_addr = "0.0.0.0"

ui_config{
  enabled = true
}

server = true
advertise_addr = "192.168.60.10"
bootstrap_expect=1
retry_join = ["192.168.60.10"]

ports {
 grpc = 8502 
}

connect {
 enabled = true
}

I then connected to the external Consul server via the external servers option:

global:
  enabled: false
  tls:
    enabled: false
externalServers:
  enabled: true
  hosts: ["192.168.60.10"]
  httpsPort: 8500
  k8sAuthMethodHost: "https://192.168.50.10:6443"
server:
  enabled: false
syncCatalog:
  enabled: true
  toConsul: false
  toK8S: false

I have the below pods and services in K8S:

k get svc,pods -o wide
NAME                            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE     SELECTOR
service/kubernetes              ClusterIP   10.96.0.1       <none>        443/TCP   41d     <none>
service/multitool-pod-service   ClusterIP   10.104.39.1     <none>        80/TCP    9h      app=multitool
service/nginx-service           ClusterIP   10.98.201.117   <none>        80/TCP    4h25m   app=k8s-nginx

NAME                             READY   STATUS    RESTARTS   AGE     IP           NODE                   NOMINATED NODE   READINESS GATES
pod/k8s-nginx-68d85bb657-fgltx   2/2     Running   0          4h28m   30.0.1.51    k8s-cluster3-worker1   <none>           <none>
pod/multitool-pod                2/2     Running   0          6h17m   30.0.1.69    k8s-cluster3-worker1   <none>           <none>

Here, pod/k8s-nginx-68d85bb657-fgltx and pod/multitool-pod have the Consul sidecar container injected (2/2) via the 'consul.hashicorp.com/connect-inject': 'true' annotation.

nginx-service has the 'consul.hashicorp.com/service-sync': 'true' annotation. multitool-pod-service doesn’t have any annotation.
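
For context, the annotations sit on the Pod and the Service respectively; abbreviated manifests, trimmed to the relevant parts:

apiVersion: v1
kind: Pod
metadata:
  name: multitool-pod
  annotations:
    consul.hashicorp.com/connect-inject: 'true'   # injects the sidecar / dataplane container
spec:
  containers:
    - name: network-multitool
      image: wbitt/network-multitool
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  annotations:
    consul.hashicorp.com/service-sync: 'true'     # syncs this Service into the Consul catalog
spec:
  selector:
    app: k8s-nginx
  ports:
    - port: 80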

Now, I am able to connect to the nginx pod from the multitool pod using the service name, as below:

kubectl exec -it pod/multitool-pod -c network-multitool -- curl nginx-service
Hello World! Response from Kubernetes!

However, it is not possible to access the nginx pod directly by its IP, as below. Although I assume this is due to the behaviour of the Consul sidecar, I am not sure why it happens like this.

kubectl exec -it pod/multitool-pod -c network-multitool -- curl 30.0.1.51
curl: (52) Empty reply from server
command terminated with exit code 52

network-multitool is the container name inside the multitool-pod. If I do not use that container name, it defaults to the consul-dataplane container and gives the error below

kubectl exec -it pod/multitool-pod -- curl nginx-service
Defaulted container "consul-dataplane" out of: consul-dataplane, network-multitool, consul-connect-inject-init (init)
OCI runtime exec failed: exec failed: unable to start container process: exec: "curl": executable file not found in $PATH: unknown
command terminated with exit code 126

So far I am only working with Consul and K8S and was able to access the nginx pod through service name.

I also have a single Nomad server and one Nomad worker. Can I use the same service name in both K8S and Nomad in Consul? (E.g., can we use nginx-service as the service name in both the K8S and Nomad services?)

I saw Consul can be run as a Nomad job, as below. Is it okay, or is there another way to run Consul?

job "consul" {
    datacenters = ["dc1"]

    group "consul" {
        count = 1

        task "consul" {
            driver = "exec"

            config {
                command = "consul"
                args = [
                    "agent", 
                    "-dev",
                    "-log-level=INFO"
                ]
            }

            artifact {
                source = "https://releases.hashicorp.com/consul/1.19.0/consul_1.19.0_linux_amd64.zip"
            }

        }
    }
}

Thank you so much for your advice. I am so grateful to you. Thank you!


Hi @harsh.lif3,

Glad to see the progress you have made. Here are the answers to your questions.

However, it is not possible to access the nginx pod directly by its IP, as below. Although I assume this is due to the behaviour of the Consul sidecar, I am not sure why it happens like this.

Yes, this is working as expected and is the default behaviour when Transparent Proxy is used. It ensures that all incoming and outgoing requests go through the sidecar thereby protecting the service from unauthorized access. Ref: Transparent proxy overview | Consul | HashiCorp Developer

If you want to change this behaviour (I wouldn’t recommend it), you will have to disable transparent proxy, which can be done either per Pod via an annotation or for the whole installation via the Helm values.
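
If you ever need it, the per-Pod opt-out is an annotation roughly like this (please double-check the exact names against the docs for your chart version):

  annotations:
    consul.hashicorp.com/connect-inject: 'true'
    consul.hashicorp.com/transparent-proxy: 'false'   # opt this Pod out of transparent proxy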

network-multitool is the container name inside the multitool-pod. If I do not use that container name, it defaults to the consul-dataplane container and gives the error below

This is also expected. ref: Net 1784 inject sidecar first by trevorLeonHC · Pull Request #2743 · hashicorp/consul-k8s · GitHub

I also have a single Nomad server and one Nomad worker. Can I use the same service name in both K8S and Nomad in Consul? (E.g., can we use nginx-service as the service name in both the K8S and Nomad services?)

Yes, you should use the same service name in both K8s and Nomad to treat them as part of the same service. They will get registered as “Service Instances” under the same service.

To access the services (with transparent proxy enabled), it is better to use the virtual tagged address, so in your case, it should be curl nginx-service.virtual.consul ref: Perform static DNS queries | Consul | HashiCorp Developer

This will allow the requests to switch between K8S and Nomad instances of nginx.

I saw Consul can be run as a Nomad job, as below. Is it okay, or is there another way to run Consul?

If you are running one Nomad instance (working as both server and node), and if that VM has Consul on it, that should be fine. The job you shared won’t work, as it runs Consul as a dev agent, which will cause issues.
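
The usual pattern is to run a regular (non-dev) Consul client agent directly on the Nomad node, e.g. under systemd, with a client-mode config along these lines (sketch only, reusing your server address):

# /etc/consul.d/consul.hcl on the Nomad node
data_dir   = "/opt/consul"
server     = false
retry_join = ["192.168.60.10"]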

I would say start deploying nginx on Nomad with your current setup and see what issues you run into and work your way up.

I hope this helps.


Dear @Ranjandas,

I am sincerely grateful for your responses. They have helped me a lot to understand things better than before.

Now, I have progressed with the Nomad part. Below are the steps I followed so far:
Installed the Consul client ONLY on the Nomad worker node (I didn’t install a Consul client on the Nomad server because workloads run on the Nomad worker node):

data_dir = "/opt/consul"
client_addr = "0.0.0.0"
bind_addr = "192.168.40.11"
server = false
advertise_addr = "192.168.40.11"
retry_join = ["192.168.60.10"]
log_level = "INFO

I set client_addr to 0.0.0.0 (to allow any client), and bind_addr and advertise_addr to 192.168.40.11 (the IP of the Consul client node, which is the same machine as the Nomad client node).

Similarly, the Consul server has the below config:

data_dir = "/opt/consul"
bind_addr = "192.168.60.10"
client_addr = "0.0.0.0" #Allow connections from any client

ui_config{
  enabled = true
}

server = true
advertise_addr = "192.168.60.10"
bootstrap_expect=1
retry_join = ["192.168.60.10"]

ports {
 grpc = 8502 
}

connect {
 enabled = true
}

#log_level = "DEBUG"

config_entries {
 bootstrap = [
  {
    Kind = "proxy-defaults"
    Name = "global"
    AccessLogs {
      Enabled = true
    }
    Config {
      protocol = "http"
    }
  }
 ]
}

Added the Nginx job configuration in Nomad with the below service block:

service {
        name = "nginx-service" ==> used the same service name as in K8S
        port = "http"  # Reference the network port defined above
        tags = ["nginx", "nomad"]

        connect {
          sidecar_service {
            proxy {
              transparent_proxy {}
            }
          }
        }

        check {
          type     = "http"
          path     = "/"
          interval = "10s"
          timeout  = "2s"
        }
      }

In the above service block I added the transparent_proxy block to avoid any DNS issues, because the documentation says:

When transparent proxy is enabled traffic will automatically flow through the Envoy proxy. If the local Consul agent is serving DNS, Nomad will also set up the task’s nameservers to use Consul. This lets your workload use the virtual IP DNS name from Consul, rather than configuring a template block that queries services.
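
As a side note, I assume the effect can be checked from the Nomad side with something like this (placeholder allocation ID):

nomad alloc exec <alloc-id> cat /etc/resolv.conf   # should point at the local Consul agent if it is serving DNS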

I can see that both the K8s and Nomad endpoints are available in the Consul UI under the same service name.
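
I assume the same can be cross-checked against the catalog API, e.g.:

curl -s http://192.168.60.10:8500/v1/catalog/service/nginx-service | jq '.[].ServiceAddress'

which should list one address per registered instance.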

Now, I am able to curl the service from K8S as below:

k exec -it multitool-pod -c multitool-container -- curl nginx-service.virtual.consul
Hello, I am running on Kubernetes!

However, it only works intermittently and fails at times with Could not resolve host:

k exec -it multitool-pod -c multitool-container -- curl nginx-service.virtual.consul
Hello, I am running on Kubernetes!
ubuntu@ubuntu-desktop:~$ k exec -it multitool-pod -c multitool-container -- curl nginx-service.virtual.consul
curl: (6) Could not resolve host: nginx-service.virtual.consul
command terminated with exit code 6
ubuntu@ubuntu-desktop:~$ k exec -it multitool-pod -c multitool-container -- curl nginx-service.virtual.consul
curl: (6) Could not resolve host: nginx-service.virtual.consul
command terminated with exit code 6
ubuntu@ubuntu-desktop:~$ k exec -it multitool-pod -c multitool-container -- curl nginx-service.virtual.consul
Hello, I am running on Kubernetes!

I think it stops working when it is trying to connect to the Nomad side, but I am not entirely sure. Below are some logs I got from the multitool-pod’s consul-dataplane container.

when nginx-service.virtual.consul succeeds

2024-09-28T22:54:33.069Z [DEBUG] consul-dataplane.dns-proxy.udp: dns messaged received from consul: length=62
2024-09-28T22:54:33.069Z [DEBUG] consul-dataplane.dns-proxy.udp: dns messaged received from consul: length=60
2024-09-28T22:54:33.069Z [DEBUG] consul-dataplane.dns-proxy.udp: dns messaged received from consul: length=60
2024-09-28T22:54:33.069Z [DEBUG] consul-dataplane.dns-proxy.udp: dns messaged received from consul: length=62
2024-09-28T22:54:33.071Z+00:00 [debug] envoy.filter(24) original_dst: set destination to 240.0.0.3:80
2024-09-28T22:54:33.071Z+00:00 [debug] envoy.conn_handler(24) [Tags: "ConnectionId":"91"] new connection from 30.0.1.17:50132
2024-09-28T22:54:33.071Z+00:00 [debug] envoy.http(24) [Tags: "ConnectionId":"91"] new stream
2024-09-28T22:54:33.071Z+00:00 [debug] envoy.http(24) [Tags: "ConnectionId":"91","StreamId":"16189153334393723254"] request headers complete (end_stream=true):
':authority', 'nginx-service.virtual.consul'
':path', '/'
':method', 'GET'
'user-agent', 'curl/7.79.1'
'accept', '*/*'

2024-09-28T22:54:33.071Z+00:00 [debug] envoy.http(24) [Tags: "ConnectionId":"91","StreamId":"16189153334393723254"] request end stream
2024-09-28T22:54:33.071Z+00:00 [debug] envoy.connection(24) [Tags: "ConnectionId":"91"] current connecting state: false
2024-09-28T22:54:33.071Z+00:00 [debug] envoy.router(24) [Tags: "ConnectionId":"91","StreamId":"16189153334393723254"] cluster 'nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul' match for URL '/'
2024-09-28T22:54:33.072Z+00:00 [debug] envoy.router(24) [Tags: "ConnectionId":"91","StreamId":"16189153334393723254"] router decoding headers:
':authority', 'nginx-service.virtual.consul'
':path', '/'
':method', 'GET'
':scheme', 'http'
'user-agent', 'curl/7.79.1'
'accept', '*/*'
'x-forwarded-proto', 'http'
'x-request-id', '81480dd3-358f-4252-9455-0617620c1666'
'x-envoy-expected-rq-timeout-ms', '15000'

2024-09-28T22:54:33.072Z+00:00 [debug] envoy.pool(24) [Tags: "ConnectionId":"14"] using existing fully connected connection
2024-09-28T22:54:33.072Z+00:00 [debug] envoy.pool(24) [Tags: "ConnectionId":"14"] creating stream
2024-09-28T22:54:33.072Z+00:00 [debug] envoy.router(24) [Tags: "ConnectionId":"91","StreamId":"16189153334393723254"] pool ready
2024-09-28T22:54:33.072Z+00:00 [debug] envoy.client(24) [Tags: "ConnectionId":"14"] encode complete
2024-09-28T22:54:33.076Z+00:00 [debug] envoy.router(24) [Tags: "ConnectionId":"91","StreamId":"16189153334393723254"] upstream headers complete: end_stream=false
2024-09-28T22:54:33.076Z+00:00 [debug] envoy.http(24) [Tags: "ConnectionId":"91","StreamId":"16189153334393723254"] encoding headers via codec (end_stream=false):
':status', '200'
'server', 'envoy'
'date', 'Sat, 28 Sep 2024 22:54:33 GMT'
'content-type', 'text/html'
'content-length', '35'
'last-modified', 'Sat, 28 Sep 2024 17:20:50 GMT'
'etag', '"66f83af2-23"'
'accept-ranges', 'bytes'
'x-envoy-upstream-service-time', '3'

2024-09-28T22:54:33.076Z+00:00 [debug] envoy.client(24) [Tags: "ConnectionId":"14"] response complete
2024-09-28T22:54:33.076Z+00:00 [debug] envoy.http(24) [Tags: "ConnectionId":"91","StreamId":"16189153334393723254"] Codec completed encoding stream.
2024-09-28T22:54:33.076Z+00:00 [debug] envoy.pool(24) [Tags: "ConnectionId":"14"] response complete
2024-09-28T22:54:33.076Z+00:00 [debug] envoy.pool(24) [Tags: "ConnectionId":"14"] destroying stream: 0 remaining
2024-09-28T22:54:33.080Z+00:00 [debug] envoy.connection(24) [Tags: "ConnectionId":"91"] remote close
2024-09-28T22:54:33.080Z+00:00 [debug] envoy.connection(24) [Tags: "ConnectionId":"91"] closing socket: 0
2024-09-28T22:54:33.080Z+00:00 [debug] envoy.conn_handler(24) [Tags: "ConnectionId":"91"] adding to cleanup list
2024-09-28T22:54:35.641Z+00:00 [debug] envoy.main(14) flushing stats

when nginx-service.virtual.consul fails: only these entries are generated.

2024-09-28T22:57:46.104Z [DEBUG] consul-dataplane.dns-proxy.udp: dns messaged received from consul: length=72
2024-09-28T22:57:46.104Z [DEBUG] consul-dataplane.dns-proxy.udp: dns messaged received from consul: length=72
2024-09-28T22:57:46.104Z [DEBUG] consul-dataplane.dns-proxy.udp: dns messaged received from consul: length=64
2024-09-28T22:57:46.104Z [DEBUG] consul-dataplane.dns-proxy.udp: dns messaged received from consul: length=64
2024-09-28T22:57:46.104Z [DEBUG] consul-dataplane.dns-proxy.udp: dns messaged received from consul: length=60
2024-09-28T22:57:46.104Z [DEBUG] consul-dataplane.dns-proxy.udp: dns messaged received from consul: length=60
2024-09-28T22:57:46.105Z [DEBUG] consul-dataplane.dns-proxy.udp: dns messaged received from consul: length=62
2024-09-28T22:57:46.105Z [DEBUG] consul-dataplane.dns-proxy.udp: dns messaged received from consul: length=62

Logs for nginx-service.service.consul

2024-09-28T23:00:16.867Z [DEBUG] consul-dataplane.dns-proxy.udp: dns messaged received from consul: length=60
2024-09-28T23:00:16.867Z [DEBUG] consul-dataplane.dns-proxy.udp: dns messaged received from consul: length=60
2024-09-28T23:00:16.870Z [DEBUG] consul-dataplane.dns-proxy.udp: dns messaged received from consul: length=78
2024-09-28T23:00:16.870Z [DEBUG] consul-dataplane.dns-proxy.udp: dns messaged received from consul: length=96
2024-09-28T23:00:16.871Z+00:00 [debug] envoy.filter(23) original_dst: set destination to 30.0.1.225:80
2024-09-28T23:00:16.871Z+00:00 [debug] envoy.filter(23) [Tags: "ConnectionId":"127"] new tcp proxy session
2024-09-28T23:00:16.872Z+00:00 [debug] envoy.filter(23) [Tags: "ConnectionId":"127"] Creating connection to cluster original-destination
2024-09-28T23:00:16.872Z+00:00 [debug] envoy.upstream(23) transport socket match, socket default selected for host with address 30.0.1.225:80
2024-09-28T23:00:16.872Z+00:00 [debug] envoy.upstream(23) Created host original-destination30.0.1.225:80 30.0.1.225:80.
2024-09-28T23:00:16.872Z+00:00 [debug] envoy.misc(23) Allocating TCP conn pool
2024-09-28T23:00:16.872Z+00:00 [debug] envoy.pool(23) trying to create new connection
2024-09-28T23:00:16.872Z+00:00 [debug] envoy.upstream(14) addHost() adding original-destination30.0.1.225:80 30.0.1.225:80.
2024-09-28T23:00:16.872Z+00:00 [debug] envoy.pool(23) creating a new connection (connecting=0)
2024-09-28T23:00:16.872Z+00:00 [debug] envoy.connection(23) [Tags: "ConnectionId":"128"] connecting to 30.0.1.225:80
2024-09-28T23:00:16.872Z+00:00 [debug] envoy.upstream(14) membership update for TLS cluster original-destination added 1 removed 0
2024-09-28T23:00:16.872Z+00:00 [debug] envoy.upstream(14) re-creating local LB for TLS cluster original-destination
2024-09-28T23:00:16.872Z+00:00 [debug] envoy.connection(23) [Tags: "ConnectionId":"128"] connection in progress
2024-09-28T23:00:16.872Z+00:00 [debug] envoy.conn_handler(23) [Tags: "ConnectionId":"127"] new connection from 30.0.1.17:47258
2024-09-28T23:00:16.872Z+00:00 [debug] envoy.upstream(23) membership update for TLS cluster original-destination added 1 removed 0
2024-09-28T23:00:16.872Z+00:00 [debug] envoy.upstream(23) re-creating local LB for TLS cluster original-destination
2024-09-28T23:00:16.872Z+00:00 [debug] envoy.connection(23) [Tags: "ConnectionId":"128"] connected
2024-09-28T23:00:16.872Z+00:00 [debug] envoy.pool(23) [Tags: "ConnectionId":"128"] attaching to next stream
2024-09-28T23:00:16.872Z+00:00 [debug] envoy.pool(23) [Tags: "ConnectionId":"128"] creating stream
2024-09-28T23:00:16.872Z+00:00 [debug] envoy.router(23) Attached upstream connection [C128] to downstream connection [C127]
2024-09-28T23:00:16.872Z+00:00 [debug] envoy.filter(23) [Tags: "ConnectionId":"127"] TCP:onUpstreamEvent(), requestedServerName: 
2024-09-28T23:00:16.872Z+00:00 [debug] envoy.upstream(24) membership update for TLS cluster original-destination added 1 removed 0
2024-09-28T23:00:16.872Z+00:00 [debug] envoy.upstream(24) re-creating local LB for TLS cluster original-destination
2024-09-28T23:00:16.875Z+00:00 [debug] envoy.connection(23) [Tags: "ConnectionId":"127"] remote close
2024-09-28T23:00:16.875Z+00:00 [debug] envoy.connection(23) [Tags: "ConnectionId":"127"] closing socket: 0
2024-09-28T23:00:16.875Z+00:00 [debug] envoy.connection(23) [Tags: "ConnectionId":"128"] closing data_to_write=0 type=0
2024-09-28T23:00:16.875Z+00:00 [debug] envoy.connection(23) [Tags: "ConnectionId":"128"] closing socket: 1
2024-09-28T23:00:16.875Z+00:00 [debug] envoy.pool(23) [Tags: "ConnectionId":"128"] client disconnected, failure reason: 
2024-09-28T23:00:16.875Z+00:00 [debug] envoy.pool(23) invoking 1 idle callback(s) - is_draining_for_deletion_=false
2024-09-28T23:00:16.875Z+00:00 [debug] envoy.pool(23) [Tags: "ConnectionId":"128"] destroying stream: 0 remaining
2024-09-28T23:00:16.875Z+00:00 [debug] envoy.pool(23) invoking 0 idle callback(s) - is_draining_for_deletion_=false
2024-09-28T23:00:16.875Z+00:00 [debug] envoy.conn_handler(23) [Tags: "ConnectionId":"127"] adding to cleanup list
2024-09-28T23:00:17.731Z+00:00 [debug] envoy.conn_handler(23) [Tags: "ConnectionId":"129"] new connection from 30.0.1.82:42944
2024-09-28T23:00:17.731Z+00:00 [debug] envoy.connection(23) [Tags: "ConnectionId":"129"] closing socket: 0
2024-09-28T23:00:17.731Z+00:00 [debug] envoy.conn_handler(23) [Tags: "ConnectionId":"129"] adding to cleanup list

2024-09-28T23:00:20.720Z+00:00 [debug] envoy.main(14) flushing stats
2024-09-28T23:00:25.722Z+00:00 [debug] envoy.main(14) flushing stats
2024-09-28T23:00:25.895Z+00:00 [debug] envoy.upstream(24) membership update for TLS cluster original-destination added 0 removed 1
2024-09-28T23:00:25.895Z+00:00 [debug] envoy.upstream(24) re-creating local LB for TLS cluster original-destination
2024-09-28T23:00:25.895Z+00:00 [debug] envoy.upstream(14) membership update for TLS cluster original-destination added 0 removed 1
2024-09-28T23:00:25.895Z+00:00 [debug] envoy.upstream(14) re-creating local LB for TLS cluster original-destination
2024-09-28T23:00:25.895Z+00:00 [debug] envoy.upstream(23) membership update for TLS cluster original-destination added 0 removed 1
2024-09-28T23:00:25.896Z+00:00 [debug] envoy.upstream(23) re-creating local LB for TLS cluster original-destination
2024-09-28T23:00:25.896Z+00:00 [debug] envoy.upstream(23) removing hosts for TLS cluster original-destination removed 1
2024-09-28T23:00:25.896Z+00:00 [debug] envoy.upstream(24) removing hosts for TLS cluster original-destination removed 1
2024-09-28T23:00:25.896Z+00:00 [debug] envoy.upstream(14) removing hosts for TLS cluster original-destination removed 1

Logs For nginx-service.connect.consul


2024-09-28T23:03:21.819Z [DEBUG] consul-dataplane.dns-proxy.udp: dns messaged received from consul: length=60
2024-09-28T23:03:21.819Z [DEBUG] consul-dataplane.dns-proxy.udp: dns messaged received from consul: length=60
2024-09-28T23:03:21.820Z [DEBUG] consul-dataplane.dns-proxy.udp: dns messaged received from consul: length=62
2024-09-28T23:03:21.820Z [DEBUG] consul-dataplane.dns-proxy.udp: dns messaged received from consul: length=96
2024-09-28T23:03:21.820Z+00:00 [debug] envoy.filter(24) original_dst: set destination to 30.0.1.225:80
2024-09-28T23:03:21.820Z+00:00 [debug] envoy.filter(24) [Tags: "ConnectionId":"148"] new tcp proxy session
2024-09-28T23:03:21.820Z+00:00 [debug] envoy.filter(24) [Tags: "ConnectionId":"148"] Creating connection to cluster original-destination
2024-09-28T23:03:21.820Z+00:00 [debug] envoy.upstream(24) transport socket match, socket default selected for host with address 30.0.1.225:80
2024-09-28T23:03:21.820Z+00:00 [debug] envoy.upstream(24) Created host original-destination30.0.1.225:80 30.0.1.225:80.
2024-09-28T23:03:21.820Z+00:00 [debug] envoy.misc(24) Allocating TCP conn pool
2024-09-28T23:03:21.820Z+00:00 [debug] envoy.pool(24) trying to create new connection
2024-09-28T23:03:21.820Z+00:00 [debug] envoy.pool(24) creating a new connection (connecting=0)
2024-09-28T23:03:21.820Z+00:00 [debug] envoy.connection(24) [Tags: "ConnectionId":"149"] connecting to 30.0.1.225:80
2024-09-28T23:03:21.821Z+00:00 [debug] envoy.upstream(14) addHost() adding original-destination30.0.1.225:80 30.0.1.225:80.
2024-09-28T23:03:21.821Z+00:00 [debug] envoy.connection(24) [Tags: "ConnectionId":"149"] connection in progress
2024-09-28T23:03:21.821Z+00:00 [debug] envoy.conn_handler(24) [Tags: "ConnectionId":"148"] new connection from 30.0.1.17:58714
2024-09-28T23:03:21.821Z+00:00 [debug] envoy.connection(24) [Tags: "ConnectionId":"149"] connected
2024-09-28T23:03:21.821Z+00:00 [debug] envoy.pool(24) [Tags: "ConnectionId":"149"] attaching to next stream
2024-09-28T23:03:21.821Z+00:00 [debug] envoy.pool(24) [Tags: "ConnectionId":"149"] creating stream
2024-09-28T23:03:21.821Z+00:00 [debug] envoy.upstream(14) membership update for TLS cluster original-destination added 1 removed 0
2024-09-28T23:03:21.821Z+00:00 [debug] envoy.upstream(14) re-creating local LB for TLS cluster original-destination
2024-09-28T23:03:21.821Z+00:00 [debug] envoy.router(24) Attached upstream connection [C149] to downstream connection [C148]
2024-09-28T23:03:21.821Z+00:00 [debug] envoy.upstream(23) membership update for TLS cluster original-destination added 1 removed 0
2024-09-28T23:03:21.821Z+00:00 [debug] envoy.upstream(23) re-creating local LB for TLS cluster original-destination
2024-09-28T23:03:21.821Z+00:00 [debug] envoy.filter(24) [Tags: "ConnectionId":"148"] TCP:onUpstreamEvent(), requestedServerName: 
2024-09-28T23:03:21.821Z+00:00 [debug] envoy.upstream(24) membership update for TLS cluster original-destination added 1 removed 0
2024-09-28T23:03:21.821Z+00:00 [debug] envoy.upstream(24) re-creating local LB for TLS cluster original-destination
2024-09-28T23:03:21.821Z+00:00 [debug] envoy.connection(24) [Tags: "ConnectionId":"148"] remote close
2024-09-28T23:03:21.821Z+00:00 [debug] envoy.connection(24) [Tags: "ConnectionId":"148"] closing socket: 0
2024-09-28T23:03:21.821Z+00:00 [debug] envoy.connection(24) [Tags: "ConnectionId":"149"] closing data_to_write=0 type=0
2024-09-28T23:03:21.821Z+00:00 [debug] envoy.connection(24) [Tags: "ConnectionId":"149"] closing socket: 1
2024-09-28T23:03:21.821Z+00:00 [debug] envoy.pool(24) [Tags: "ConnectionId":"149"] client disconnected, failure reason: 
2024-09-28T23:03:21.821Z+00:00 [debug] envoy.pool(24) invoking 1 idle callback(s) - is_draining_for_deletion_=false
2024-09-28T23:03:21.821Z+00:00 [debug] envoy.pool(24) [Tags: "ConnectionId":"149"] destroying stream: 0 remaining
2024-09-28T23:03:21.821Z+00:00 [debug] envoy.pool(24) invoking 0 idle callback(s) - is_draining_for_deletion_=false
2024-09-28T23:03:21.821Z+00:00 [debug] envoy.conn_handler(24) [Tags: "ConnectionId":"148"] adding to cleanup list
2024-09-28T23:03:25.772Z+00:00 [debug] envoy.main(14) flushing stats

However, I am unable to curl using nginx-service.service.consul(1) or nginx-service.connect.consul(2) at all.

(1) gives curl: (56) Recv failure: Connection reset by peer. command terminated with exit code 56
(2) gives curl: (52) Empty reply from server. command terminated with exit code 52 OR curl: (6) Could not resolve host: nginx-service.connect.consul

Additionally, I see the logs continuously output the following messages:

2024-09-28T22:58:57.731Z+00:00 [debug] envoy.conn_handler(23) [Tags: "ConnectionId":"119"] new connection from 30.0.1.82:56396
2024-09-28T22:58:57.731Z+00:00 [debug] envoy.connection(23) [Tags: "ConnectionId":"119"] closing socket: 0
2024-09-28T22:58:57.731Z+00:00 [debug] envoy.conn_handler(23) [Tags: "ConnectionId":"119"] adding to cleanup list
2024-09-28T22:59:00.698Z+00:00 [debug] envoy.main(14) flushing stats
2024-09-28T22:59:05.701Z+00:00 [debug] envoy.main(14) flushing stats
2024-09-28T22:59:06.110Z [DEBUG] consul-dataplane.dns-proxy.udp: timeout waiting for read: error="read udp 127.0.0.1:8600: i/o timeout"
2024-09-28T22:59:07.731Z+00:00 [debug] envoy.conn_handler(24) [Tags: "ConnectionId":"120"] new connection from 30.0.1.82:34324
2024-09-28T22:59:07.731Z+00:00 [debug] envoy.connection(24) [Tags: "ConnectionId":"120"] closing socket: 0
2024-09-28T22:59:07.731Z+00:00 [debug] envoy.conn_handler(24) [Tags: "ConnectionId":"120"] adding to cleanup list
2024-09-28T22:59:10.704Z+00:00 [debug] envoy.main(14) flushing stats
2024-09-28T22:59:15.708Z+00:00 [debug] envoy.main(14) flushing stats
2024-09-28T22:59:16.111Z [DEBUG] consul-dataplane.dns-proxy.udp: timeout waiting for read: error="read udp 127.0.0.1:8600: i/o timeout"

I do not have any pod/service with IP 30.0.1.82.

Also, I see the following results when using the dig command on both the Nomad and K8S nodes, in the K8s pods, and in the Nomad task:

dig @192.168.60.10 -p 8600 nginx-service.virtual.consul

; <<>> DiG 9.18.28-0ubuntu0.24.04.1-Ubuntu <<>> @192.168.60.10 -p 8600 nginx-service.virtual.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33461
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;nginx-service.virtual.consul.  IN      A

;; ANSWER SECTION:
nginx-service.virtual.consul. 0 IN      A       240.0.0.3

;; Query time: 0 msec
;; SERVER: 192.168.60.10#8600(192.168.60.10) (UDP)

dig @192.168.60.10 -p 8600 nginx-service.service.consul

; <<>> DiG 9.18.28-0ubuntu0.24.04.1-Ubuntu <<>> @192.168.60.10 -p 8600 nginx-service.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27777
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;nginx-service.service.consul.  IN      A

;; ANSWER SECTION:
nginx-service.service.consul. 0 IN      A       30.0.1.225
nginx-service.service.consul. 0 IN      A       192.168.40.11

;; Query time: 1 msec
;; SERVER: 192.168.60.10#8600(192.168.60.10) (UDP)
dig @192.168.60.10 -p 8600 nginx-service.connect.consul

; <<>> DiG 9.18.28-0ubuntu0.22.04.1-Ubuntu <<>> @192.168.60.10 -p 8600 nginx-service.connect.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 56585
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;nginx-service.connect.consul.  IN      A

;; ANSWER SECTION:
nginx-service.connect.consul. 0 IN      A       30.0.1.225

;; Query time: 0 msec
;; SERVER: 192.168.60.10#8600(192.168.60.10) (UDP)

nginx-service.virtual.consul returns the Consul-assigned virtual IP, nginx-service.service.consul returns both the K8S Pod IP and the Nomad worker node’s IP, and nginx-service.connect.consul returns the K8S Pod IP.

However, if I remove the DNS port 8600 from the dig command, it falls back to port 53 and stops working.

So I believe this could be a DNS issue. Do we have to add extra configuration to change DNS when Consul is used? I thought DNS was handled automatically when transparent proxy is used.

I also came across DNS usage overview | Consul | HashiCorp Developer, which says:

"If you are using Consul for service mesh on VMs, you can use upstreams or DNS. We recommend using upstreams because you can query services and nodes without modifying the application code or environment variables. "

From the Nomad side I can only access the service directly via the node IP and port. I am unable to access it using any Consul service name at all.

I am sorry for making this so long, but I wanted to include all the details I have found and done so far to get your kind advice.

Thank you!

Hi @harsh.lif3,

When running with transparent proxy enabled, every connect-injected pod gets a custom dnsConfig that uses 127.0.0.1 (a DNS proxy inside the dataplane process) as the first nameserver, and the secondary will be the Kubernetes DNS.

Example:

$ k get pods static-client-c46596589-bztps -o jsonpath="{.spec.dnsConfig}" | jq
{
  "nameservers": [
    "127.0.0.1",
    "10.43.0.10"
  ],
  "options": [
    {
      "name": "ndots",
      "value": "5"
    }
  ],
  "searches": [
    "default.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local"
  ]
}

I think your intermittent failure is due to the DNS client switching between the two nameservers in the list, and from the data plane logs you shared, it looks like your data plane DNS proxy is not working properly.

Can you share the values.yaml file you used to deploy the cluster? Try to figure out why the DNS proxy is not working; fixing that should fix your issue.
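
One way to isolate the DNS proxy (a sketch using the pod and container names from earlier in this thread) is to query the first nameserver directly from inside the pod:

k exec -it multitool-pod -c network-multitool -- dig @127.0.0.1 nginx-service.virtual.consul

If that query fails intermittently or times out, the problem is in the dataplane DNS proxy itself.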

Dear @Ranjandas,

Thank you for your response. Below is the values.yaml file I used to set up Consul in Kubernetes (helm upgrade --install consul hashicorp/consul -n consul -f consul-values.yaml).

global:
  enabled: false
  logLevel: "debug"
  tls:
    enabled: false
externalServers:
  enabled: true
  hosts: ["192.168.60.10"]
  httpsPort: 8500
  k8sAuthMethodHost: "https://192.168.50.10:6443"
server:
  enabled: false
syncCatalog:
  enabled: true
  #toConsul: false
  #toK8S: false
  default: false

I also got the following outputs while testing:

 k exec -it multitool-pod -c multitool-container -- nslookup kubernetes.default
;; Got recursion not available from 127.0.0.1, trying next server
;; Got recursion not available from 127.0.0.1, trying next server
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.96.0.1
;; Got recursion not available from 127.0.0.1, trying next server
ubuntu@ubuntu-desktop:~$ k exec -it multitool-pod -c multitool-container -- cat /etc/resolv.conf
nameserver 127.0.0.1
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

Moreover, as per Debugging DNS Resolution | Kubernetes:

If you are using Alpine version 3.17 or earlier as your base image, DNS may not work properly due to a design issue with Alpine. Until musl version 1.24 didn’t include TCP fallback to the DNS stub resolver meaning any DNS call above 512 bytes would fail. Please upgrade your images to Alpine version 3.18 or above.

When I checked the OS version it showed Alpine 3.13, so I changed the Docker image. Now I am using the wbitt/network-multitool:latest image, which has Alpine 3.18.

k exec -it test-pod -c test-pod-container -- cat /etc/os-release 
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.18.3
PRETTY_NAME="Alpine Linux v3.18"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://gitlab.alpinelinux.org/alpine/aports/-/issues"

UPDATE:
I updated the values.yaml file as below in K8S (added the dns section):

global:
  enabled: false
  logLevel: "debug"
  tls:
    enabled: false
externalServers:
  enabled: true
  hosts: ["192.168.60.10"]
  httpsPort: 8500
  k8sAuthMethodHost: "https://192.168.50.10:6443"
server:
  enabled: false
syncCatalog:
  enabled: true
  #toConsul: false
  #toK8S: false
  default: false
dns:
  enabled: true
  enableRedirection: true

Below are my pods and services in the consul namespace:

k get all -n consul
NAME                                                      READY   STATUS    RESTARTS   AGE
pod/consul-consul-connect-injector-7fb8dc89b-cmmzj        1/1     Running   0          13h
pod/consul-consul-sync-catalog-7cd6c65d76-mcfrz           1/1     Running   0          13h
pod/consul-consul-webhook-cert-manager-5d48957cc8-xjjmt   1/1     Running   0          13h


NAME                                     TYPE           CLUSTER-IP      EXTERNAL-IP                                     PORT(S)         AGE
service/consul                           ExternalName   <none>          consul.service.consul                           <none>          19h
service/consul-consul-connect-injector   ClusterIP      10.99.191.129   <none>                                          443/TCP         14d
service/consul-consul-dns                ClusterIP      10.97.111.170   <none>                                          53/TCP,53/UDP   14d
service/nginx-service                    ExternalName   <none>          nginx-service.service.consul                    <none>          19h
service/nginx-service-sidecar-proxy      ExternalName   <none>          nginx-service-sidecar-proxy.service.consul      <none>          19h
service/nomad-client                     ExternalName   <none>          nomad-client.service.consul                     <none>          16h
service/test-pod-service                 ExternalName   <none>          test-pod-service.service.consul                 <none>          150m
service/test-pod-service-sidecar-proxy   ExternalName   <none>          test-pod-service-sidecar-proxy.service.consul   <none>          150m

NAME                                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/consul-consul-connect-injector       1/1     1            1           14d
deployment.apps/consul-consul-sync-catalog           1/1     1            1           14d
deployment.apps/consul-consul-webhook-cert-manager   1/1     1            1           14d

Then, I added the below block into the CoreDNS ConfigMap. 10.97.111.170 is the ClusterIP of the Consul DNS service (service/consul-consul-dns ClusterIP 10.97.111.170 <none> 53/TCP,53/UDP).

consul:53 {
    errors
    cache 30
    forward . 10.97.111.170
}

Full coredns configmap

k edit configmap coredns -n kube-system

apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
    consul:53 {
        errors
        cache 30
        forward . 10.97.111.170
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2024-08-16T17:21:10Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "8386443"
  uid: f07df6bc-63ff-46e4-8314-05f2cf578770

Then I restarted the coredns deployment. (k rollout restart deployment coredns -n kube-system)

I deleted and redeployed the test-pod.

I noticed a slight change after making these changes (although I am not exactly sure if it was the same in the previous logs). When I run

k exec -it test-pod -c test-pod-container -- curl nginx-service.service.consul

I get two error messages.
For the error curl: (56) Recv failure: Connection reset by peer I see the below log entries.
Here it shows the IP 192.168.40.11:80 as the destination => this is the IP of the Nomad worker node which runs the Nginx task.


2024-09-29T11:56:41.367Z [DEBUG] consul-dataplane.dns-proxy.udp: dns messaged received from consul: length=60
2024-09-29T11:56:41.368Z [DEBUG] consul-dataplane.dns-proxy.udp: dns messaged received from consul: length=78
2024-09-29T11:56:41.368Z [DEBUG] consul-dataplane.dns-proxy.udp: dns messaged received from consul: length=96
2024-09-29T11:56:41.370Z+00:00 [debug] envoy.filter(23) original_dst: set destination to 192.168.40.11:80
2024-09-29T11:56:41.370Z+00:00 [debug] envoy.filter(23) [Tags: "ConnectionId":"45"] new tcp proxy session
2024-09-29T11:56:41.370Z+00:00 [debug] envoy.filter(23) [Tags: "ConnectionId":"45"] Creating connection to cluster original-destination
2024-09-29T11:56:41.371Z+00:00 [debug] envoy.upstream(23) transport socket match, socket default selected for host with address 192.168.40.11:80
2024-09-29T11:56:41.371Z+00:00 [debug] envoy.upstream(23) Created host original-destination192.168.40.11:80 192.168.40.11:80.
2024-09-29T11:56:41.371Z+00:00 [debug] envoy.misc(23) Allocating TCP conn pool
2024-09-29T11:56:41.371Z+00:00 [debug] envoy.pool(23) trying to create new connection
2024-09-29T11:56:41.371Z+00:00 [debug] envoy.upstream(13) addHost() adding original-destination192.168.40.11:80 192.168.40.11:80.
2024-09-29T11:56:41.371Z+00:00 [debug] envoy.upstream(13) membership update for TLS cluster original-destination added 1 removed 0
2024-09-29T11:56:41.371Z+00:00 [debug] envoy.upstream(13) re-creating local LB for TLS cluster original-destination
2024-09-29T11:56:41.371Z+00:00 [debug] envoy.pool(23) creating a new connection (connecting=0)
2024-09-29T11:56:41.371Z+00:00 [debug] envoy.connection(23) [Tags: "ConnectionId":"46"] connecting to 192.168.40.11:80
2024-09-29T11:56:41.371Z+00:00 [debug] envoy.connection(23) [Tags: "ConnectionId":"46"] connection in progress
2024-09-29T11:56:41.371Z+00:00 [debug] envoy.conn_handler(23) [Tags: "ConnectionId":"45"] new connection from 30.0.1.220:34354
2024-09-29T11:56:41.372Z+00:00 [debug] envoy.upstream(23) membership update for TLS cluster original-destination added 1 removed 0
2024-09-29T11:56:41.372Z+00:00 [debug] envoy.upstream(23) re-creating local LB for TLS cluster original-destination
2024-09-29T11:56:41.372Z+00:00 [debug] envoy.connection(23) [Tags: "ConnectionId":"46"] delayed connect error: 111
2024-09-29T11:56:41.372Z+00:00 [debug] envoy.connection(23) [Tags: "ConnectionId":"46"] closing socket: 0
2024-09-29T11:56:41.372Z+00:00 [debug] envoy.pool(23) [Tags: "ConnectionId":"46"] client disconnected, failure reason: delayed connect error: 111
2024-09-29T11:56:41.372Z+00:00 [debug] envoy.filter(23) [Tags: "ConnectionId":"45"] Creating connection to cluster original-destination
2024-09-29T11:56:41.372Z+00:00 [debug] envoy.connection(23) [Tags: "ConnectionId":"45"] closing data_to_write=0 type=1
2024-09-29T11:56:41.372Z+00:00 [debug] envoy.connection(23) [Tags: "ConnectionId":"45"] closing socket: 1
2024-09-29T11:56:41.372Z+00:00 [debug] envoy.conn_handler(23) [Tags: "ConnectionId":"45"] adding to cleanup list
2024-09-29T11:56:41.372Z+00:00 [debug] envoy.pool(23) invoking 1 idle callback(s) - is_draining_for_deletion_=false
2024-09-29T11:56:41.373Z+00:00 [debug] envoy.upstream(24) membership update for TLS cluster original-destination added 1 removed 0
2024-09-29T11:56:41.373Z+00:00 [debug] envoy.upstream(24) re-creating local LB for TLS cluster original-destination

For the error curl: (52) Empty reply from server I see the below log entries.
Here it shows the IP 30.0.1.113:80 as the destination => this is the IP of the nginx pod in K8S.

2024-09-29T11:58:33.700Z [DEBUG] consul-dataplane.dns-proxy.udp: dns messaged received from consul: length=72
2024-09-29T11:58:33.700Z [DEBUG] consul-dataplane.dns-proxy.udp: dns messaged received from consul: length=96
2024-09-29T11:58:33.700Z+00:00 [debug] envoy.filter(24) original_dst: set destination to 30.0.1.113:80
2024-09-29T11:58:33.701Z+00:00 [debug] envoy.filter(24) [Tags: "ConnectionId":"62"] new tcp proxy session
2024-09-29T11:58:33.701Z+00:00 [debug] envoy.filter(24) [Tags: "ConnectionId":"62"] Creating connection to cluster original-destination
2024-09-29T11:58:33.701Z+00:00 [debug] envoy.upstream(24) transport socket match, socket default selected for host with address 30.0.1.113:80
2024-09-29T11:58:33.701Z+00:00 [debug] envoy.upstream(24) Created host original-destination30.0.1.113:80 30.0.1.113:80.
2024-09-29T11:58:33.701Z+00:00 [debug] envoy.misc(24) Allocating TCP conn pool
2024-09-29T11:58:33.701Z+00:00 [debug] envoy.pool(24) trying to create new connection
2024-09-29T11:58:33.701Z+00:00 [debug] envoy.pool(24) creating a new connection (connecting=0)
2024-09-29T11:58:33.701Z+00:00 [debug] envoy.upstream(13) addHost() adding original-destination30.0.1.113:80 30.0.1.113:80.
2024-09-29T11:58:33.701Z+00:00 [debug] envoy.connection(24) [Tags: "ConnectionId":"63"] connecting to 30.0.1.113:80
2024-09-29T11:58:33.701Z+00:00 [debug] envoy.upstream(13) membership update for TLS cluster original-destination added 1 removed 0
2024-09-29T11:58:33.703Z+00:00 [debug] envoy.upstream(13) re-creating local LB for TLS cluster original-destination
2024-09-29T11:58:33.701Z+00:00 [debug] envoy.upstream(23) membership update for TLS cluster original-destination added 1 removed 0
2024-09-29T11:58:33.703Z+00:00 [debug] envoy.upstream(23) re-creating local LB for TLS cluster original-destination
2024-09-29T11:58:33.704Z+00:00 [debug] envoy.connection(24) [Tags: "ConnectionId":"63"] connection in progress
2024-09-29T11:58:33.704Z+00:00 [debug] envoy.conn_handler(24) [Tags: "ConnectionId":"62"] new connection from 30.0.1.220:51388
2024-09-29T11:58:33.704Z+00:00 [debug] envoy.upstream(24) membership update for TLS cluster original-destination added 1 removed 0
2024-09-29T11:58:33.704Z+00:00 [debug] envoy.upstream(24) re-creating local LB for TLS cluster original-destination
2024-09-29T11:58:33.704Z+00:00 [debug] envoy.connection(24) [Tags: "ConnectionId":"63"] connected
2024-09-29T11:58:33.704Z+00:00 [debug] envoy.pool(24) [Tags: "ConnectionId":"63"] attaching to next stream
2024-09-29T11:58:33.704Z+00:00 [debug] envoy.pool(24) [Tags: "ConnectionId":"63"] creating stream
2024-09-29T11:58:33.704Z+00:00 [debug] envoy.router(24) Attached upstream connection [C63] to downstream connection [C62]
2024-09-29T11:58:33.704Z+00:00 [debug] envoy.filter(24) [Tags: "ConnectionId":"62"] TCP:onUpstreamEvent(), requestedServerName: 
2024-09-29T11:58:33.707Z+00:00 [debug] envoy.connection(24) [Tags: "ConnectionId":"62"] remote close
2024-09-29T11:58:33.707Z+00:00 [debug] envoy.connection(24) [Tags: "ConnectionId":"62"] closing socket: 0
2024-09-29T11:58:33.707Z+00:00 [debug] envoy.connection(24) [Tags: "ConnectionId":"63"] closing data_to_write=0 type=0
2024-09-29T11:58:33.707Z+00:00 [debug] envoy.connection(24) [Tags: "ConnectionId":"63"] closing socket: 1
2024-09-29T11:58:33.707Z+00:00 [debug] envoy.pool(24) [Tags: "ConnectionId":"63"] client disconnected, failure reason: 
2024-09-29T11:58:33.707Z+00:00 [debug] envoy.pool(24) invoking 1 idle callback(s) - is_draining_for_deletion_=false
2024-09-29T11:58:33.707Z+00:00 [debug] envoy.pool(24) [Tags: "ConnectionId":"63"] destroying stream: 0 remaining
2024-09-29T11:58:33.707Z+00:00 [debug] envoy.pool(24) invoking 0 idle callback(s) - is_draining_for_deletion_=false
2024-09-29T11:58:33.707Z+00:00 [debug] envoy.conn_handler(24) [Tags: "ConnectionId":"62"] adding to cleanup list
2024-09-29T11:58:35.574Z+00:00 [debug] envoy.main(13) flushing stats
2024-09-29T11:58:37.492Z+00:00 [debug] envoy.conn_handler(23) [Tags: "ConnectionId":"64"] new connection from 30.0.1.82:51924
2024-09-29T11:58:37.492Z+00:00 [debug] envoy.connection(23) [Tags: "ConnectionId":"64"] closing socket: 0
2024-09-29T11:58:37.492Z+00:00 [debug] envoy.conn_handler(23) [Tags: "ConnectionId":"64"] adding to cleanup list
2024-09-29T11:58:40.577Z+00:00 [debug] envoy.main(13) flushing stats
2024-09-29T11:58:40.734Z+00:00 [debug] envoy.upstream(13) membership update for TLS cluster original-destination added 0 removed 1
2024-09-29T11:58:40.734Z+00:00 [debug] envoy.upstream(13) re-creating local LB for TLS cluster original-destination
2024-09-29T11:58:40.734Z+00:00 [debug] envoy.upstream(13) removing hosts for TLS cluster original-destination removed 1
2024-09-29T11:58:40.734Z+00:00 [debug] envoy.upstream(24) membership update for TLS cluster original-destination added 0 removed 1
2024-09-29T11:58:40.735Z+00:00 [debug] envoy.upstream(24) re-creating local LB for TLS cluster original-destination
2024-09-29T11:58:40.735Z+00:00 [debug] envoy.upstream(24) removing hosts for TLS cluster original-destination removed 1
2024-09-29T11:58:40.734Z+00:00 [debug] envoy.upstream(23) membership update for TLS cluster original-destination added 0 removed 1
2024-09-29T11:58:40.735Z+00:00 [debug] envoy.upstream(23) re-creating local LB for TLS cluster original-destination
2024-09-29T11:58:40.735Z+00:00 [debug] envoy.upstream(23) removing hosts for TLS cluster original-destination removed

Also, this time when I continuously run k exec -it test-pod -c test-pod-container -- curl nginx-service.virtual.consul, every time it gives the output ONLY from K8s (Hello, I am running on Kubernetes!) without any intermittent failures like before. I think this is because the dig results only show one IP (240.0.0.3).

k exec -it test-pod -c test-pod-container -- dig nginx-service.virtual.consul

; <<>> DiG 9.18.16 <<>> nginx-service.virtual.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 22677
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;nginx-service.virtual.consul.  IN      A

;; ANSWER SECTION:
nginx-service.virtual.consul. 0 IN      A       240.0.0.3

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)

But it shows both the K8s Pod IP and the Nomad worker node IP in dig for nginx-service.service.consul:

k exec -it test-pod -c test-pod-container -- dig nginx-service.service.consul

; <<>> DiG 9.18.16 <<>> nginx-service.service.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17396
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;nginx-service.service.consul.  IN      A

;; ANSWER SECTION:
nginx-service.service.consul. 0 IN      A       30.0.1.113
nginx-service.service.consul. 0 IN      A       192.168.40.11

;; Query time: 4 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)

I am still observing the below logs in k logs -f test-pod -c consul-dataplane:
consul-dataplane.dns-proxy.udp: timeout waiting for read: error="read udp 127.0.0.1:8600: i/o timeout"
Also, I do not have any IP 30.0.1.82:37592 in K8S, although this says new connection from:

2024-09-29T12:13:27.492Z+00:00 [debug] envoy.conn_handler(24) [Tags: "ConnectionId":"192"] new connection from 30.0.1.82:37592
2024-09-29T12:13:27.493Z+00:00 [debug] envoy.connection(24) [Tags: "ConnectionId":"192"] closing socket: 0
2024-09-29T12:13:27.493Z+00:00 [debug] envoy.conn_handler(24) [Tags: "ConnectionId":"192"] adding to cleanup list
2024-09-29T12:13:28.151Z [DEBUG] consul-dataplane.dns-proxy.udp: timeout waiting for read: error="read udp 127.0.0.1:8600: i/o timeout"
2024-09-29T12:13:30.741Z+00:00 [debug] envoy.main(13) flushing stats
2024-09-29T12:13:35.742Z+00:00 [debug] envoy.main(13) flushing stats
2024-09-29T12:13:37.492Z+00:00 [debug] envoy.conn_handler(23) [Tags: "ConnectionId":"193"] new connection from 30.0.1.82:50364
2024-09-29T12:13:37.493Z+00:00 [debug] envoy.connection(23) [Tags: "ConnectionId":"193"] closing socket: 0
2024-09-29T12:13:37.493Z+00:00 [debug] envoy.conn_handler(23) [Tags: "ConnectionId":"193"] adding to cleanup list
2024-09-29T12:13:38.157Z [DEBUG] consul-dataplane.dns-proxy.udp: timeout waiting for read: error="read udp 127.0.0.1:8600: i/o timeout"

My current k8s pods and services in all namespaces

k get pod,svc -o wide -A
NAMESPACE     NAME                                                      READY   STATUS    RESTARTS   AGE   IP              NODE                   NOMINATED NODE   READINESS GATES
consul        pod/consul-consul-connect-injector-7fb8dc89b-cmmzj        1/1     Running   0          13h   30.0.1.119      k8s-cluster3-worker1   <none>           <none>
consul        pod/consul-consul-sync-catalog-7cd6c65d76-mcfrz           1/1     Running   0          13h   30.0.1.236      k8s-cluster3-worker1   <none>           <none>
consul        pod/consul-consul-webhook-cert-manager-5d48957cc8-xjjmt   1/1     Running   0          13h   30.0.1.228      k8s-cluster3-worker1   <none>           <none>
default       pod/k8s-nginx-c85c587cd-cwgkw                             2/2     Running   0          57m   30.0.1.113      k8s-cluster3-worker1   <none>           <none>
default       pod/test-pod                                              2/2     Running   0          36m   30.0.1.220      k8s-cluster3-worker1   <none>           <none>
default       pod/x-wing-6bb767fcb8-tbqft                               1/1     Running   0          15d   30.0.1.183      k8s-cluster3-worker1   <none>           <none>
kube-system   pod/cilium-4bn68                                          1/1     Running   0          43d   192.168.50.11   k8s-cluster3-worker1   <none>           <none>
kube-system   pod/cilium-lb98l                                          1/1     Running   0          43d   192.168.50.10   k8s-cluster3-master    <none>           <none>
kube-system   pod/cilium-operator-6df6cdb59b-6fht4                      1/1     Running   0          43d   192.168.50.11   k8s-cluster3-worker1   <none>           <none>
kube-system   pod/cilium-operator-6df6cdb59b-dfhjs                      1/1     Running   0          43d   192.168.50.10   k8s-cluster3-master    <none>           <none>
kube-system   pod/clustermesh-apiserver-54ff876d79-mfz2q                2/2     Running   0          43d   30.0.1.240      k8s-cluster3-worker1   <none>           <none>
kube-system   pod/coredns-79494bc9b7-bkhqj                              1/1     Running   0          44m   30.0.0.136      k8s-cluster3-master    <none>           <none>
kube-system   pod/coredns-79494bc9b7-nw48g                              1/1     Running   0          44m   30.0.0.121      k8s-cluster3-master    <none>           <none>
kube-system   pod/etcd-k8s-cluster3-master                              1/1     Running   0          43d   192.168.50.10   k8s-cluster3-master    <none>           <none>
kube-system   pod/hubble-relay-6446b75c8c-c4wn4                         1/1     Running   0          43d   30.0.1.164      k8s-cluster3-worker1   <none>           <none>
kube-system   pod/hubble-ui-5dc9c647db-n92k2                            2/2     Running   0          43d   30.0.1.203      k8s-cluster3-worker1   <none>           <none>
kube-system   pod/kube-apiserver-k8s-cluster3-master                    1/1     Running   0          43d   192.168.50.10   k8s-cluster3-master    <none>           <none>
kube-system   pod/kube-controller-manager-k8s-cluster3-master           1/1     Running   0          43d   192.168.50.10   k8s-cluster3-master    <none>           <none>
kube-system   pod/kube-proxy-lglf9                                      1/1     Running   0          43d   192.168.50.11   k8s-cluster3-worker1   <none>           <none>
kube-system   pod/kube-proxy-n7bwt                                      1/1     Running   0          43d   192.168.50.10   k8s-cluster3-master    <none>           <none>
kube-system   pod/kube-scheduler-k8s-cluster3-master                    1/1     Running   0          43d   192.168.50.10   k8s-cluster3-master    <none>           <none>

NAMESPACE     NAME                                     TYPE           CLUSTER-IP       EXTERNAL-IP                                     PORT(S)                  AGE     SELECTOR
consul        service/consul                           ExternalName   <none>           consul.service.consul                           <none>                   20h     <none>
consul        service/consul-consul-connect-injector   ClusterIP      10.99.191.129    <none>                                          443/TCP                  15d     app=consul,component=connect-injector,release=consul
consul        service/consul-consul-dns                ClusterIP      10.97.111.170    <none>                                          53/TCP,53/UDP            15d     app=consul,hasDNS=true,release=consul
consul        service/nginx-service                    ExternalName   <none>           nginx-service.service.consul                    <none>                   20h     <none>
consul        service/nginx-service-sidecar-proxy      ExternalName   <none>           nginx-service-sidecar-proxy.service.consul      <none>                   20h     <none>
consul        service/nomad-client                     ExternalName   <none>           nomad-client.service.consul                     <none>                   17h     <none>
consul        service/test-pod-service                 ExternalName   <none>           test-pod-service.service.consul                 <none>                   3h14m   <none>
consul        service/test-pod-service-sidecar-proxy   ExternalName   <none>           test-pod-service-sidecar-proxy.service.consul   <none>                   3h14m   <none>
default       service/kubernetes                       ClusterIP      10.96.0.1        <none>                                          443/TCP                  43d     <none>
default       service/nginx-service                    ClusterIP      10.108.130.247   <none>                                          80/TCP                   20h     app=k8s-nginx
default       service/test-pod-service                 ClusterIP      10.109.227.37    <none>                                          80/TCP                   3h14m   app=test-pod
kube-system   service/clustermesh-apiserver            NodePort       10.97.94.194     <none>                                          2379:32379/TCP           43d     k8s-app=clustermesh-apiserver
kube-system   service/clustermesh-apiserver-metrics    ClusterIP      None             <none>                                          9962/TCP,9963/TCP        43d     k8s-app=clustermesh-apiserver
kube-system   service/hubble-peer                      ClusterIP      10.110.244.43    <none>                                          443/TCP                  43d     k8s-app=cilium
kube-system   service/hubble-relay                     ClusterIP      10.109.4.45      <none>                                          80/TCP                   43d     k8s-app=hubble-relay
kube-system   service/hubble-ui                        ClusterIP      10.110.74.196    <none>                                          80/TCP                   43d     k8s-app=hubble-ui
kube-system   service/kube-dns                         ClusterIP      10.96.0.10       <none>                                          53/UDP,53/TCP,9153/TCP   43d     k8s-app=kube-dns

As the output above shows, service/nginx-service in the consul namespace lists nginx-service.service.consul under EXTERNAL-IP. However, I am unable to access it due to the errors given above.

Thank you!

Hi @harsh.lif3,

What Kubernetes distro are you using? I can see that you are running Cilium for the network.

One thing to note: when curl’ing, you should use one of the following:

  • Kubernetes service name (nginx-service)
  • Virtual service name (nginx-service.virtual.consul)

To understand why the nomad service is not being hit, could you please run the following from your test pod and share the output here? This will show us whether the nginx instance in Nomad has been added to Envoy or not.

curl 0:19000/clusters | grep hostname
1 Like

Dear @Ranjandas,

I have installed vanilla Kubernetes through kubeadm (Installing kubeadm | Kubernetes). I haven’t used any distros like Rancher, kind, etc.

I am running k8s version 1.30

kubectl version

Client Version: v1.30.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.4
kubectl get nodes -o wide
NAME                   STATUS   ROLES           AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
k8s-cluster3-master    Ready    control-plane   44d   v1.30.4   192.168.50.10   <none>        Ubuntu 22.04.4 LTS   5.15.0-118-generic   docker://27.1.2
k8s-cluster3-worker1   Ready    <none>          44d   v1.30.4   192.168.50.11   <none>        Ubuntu 22.04.4 LTS   5.15.0-118-generic   docker://27.1.2

The output of the command:

k exec -it test-pod -c test-pod-container -- curl 0:19000/clusters | grep hostname
local_app::127.0.0.1:80::hostname::
consul-dataplane::127.0.0.1:35943::hostname::
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::30.0.1.113:20000::hostname::

Here I do not see any nomad-client.default.dc1.xxxx entries though. Does that mean Nomad has some configuration issue? :frowning: However, when I run the command without grep hostname, it does show some nomad-client entries, as below.

The output for the command without grep hostname

k exec -it test-pod -c test-pod-container -- curl 0:19000/clusters
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::observability_name::nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::outlier::success_rate_average::-1
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::outlier::success_rate_ejection_threshold::-1
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::outlier::local_origin_success_rate_average::-1
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::outlier::local_origin_success_rate_ejection_threshold::-1
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::default_priority::max_connections::1024
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::default_priority::max_pending_requests::1024
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::default_priority::max_requests::1024
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::default_priority::max_retries::3
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::high_priority::max_connections::1024
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::high_priority::max_pending_requests::1024
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::high_priority::max_requests::1024
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::high_priority::max_retries::3
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::added_via_api::true
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::30.0.1.113:20000::cx_active::2
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::30.0.1.113:20000::cx_connect_fail::0
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::30.0.1.113:20000::cx_total::6
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::30.0.1.113:20000::rq_active::0
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::30.0.1.113:20000::rq_error::0
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::30.0.1.113:20000::rq_success::304
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::30.0.1.113:20000::rq_timeout::0
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::30.0.1.113:20000::rq_total::304
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::30.0.1.113:20000::hostname::
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::30.0.1.113:20000::health_flags::healthy
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::30.0.1.113:20000::weight::1
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::30.0.1.113:20000::region::
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::30.0.1.113:20000::zone::
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::30.0.1.113:20000::sub_zone::
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::30.0.1.113:20000::canary::false
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::30.0.1.113:20000::priority::0
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::30.0.1.113:20000::success_rate::-1
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::30.0.1.113:20000::local_origin_success_rate::-1
consul-dataplane::observability_name::consul-dataplane
consul-dataplane::default_priority::max_connections::1024
consul-dataplane::default_priority::max_pending_requests::1024
consul-dataplane::default_priority::max_requests::1024
consul-dataplane::default_priority::max_retries::3
consul-dataplane::high_priority::max_connections::1024
consul-dataplane::high_priority::max_pending_requests::1024
consul-dataplane::high_priority::max_requests::1024
consul-dataplane::high_priority::max_retries::3
consul-dataplane::added_via_api::false
consul-dataplane::127.0.0.1:35943::cx_active::1
consul-dataplane::127.0.0.1:35943::cx_connect_fail::0
consul-dataplane::127.0.0.1:35943::cx_total::1
consul-dataplane::127.0.0.1:35943::rq_active::1
consul-dataplane::127.0.0.1:35943::rq_error::0
consul-dataplane::127.0.0.1:35943::rq_success::0
consul-dataplane::127.0.0.1:35943::rq_timeout::0
consul-dataplane::127.0.0.1:35943::rq_total::1
consul-dataplane::127.0.0.1:35943::hostname::
consul-dataplane::127.0.0.1:35943::health_flags::healthy
consul-dataplane::127.0.0.1:35943::weight::1
consul-dataplane::127.0.0.1:35943::region::
consul-dataplane::127.0.0.1:35943::zone::
consul-dataplane::127.0.0.1:35943::sub_zone::
consul-dataplane::127.0.0.1:35943::canary::false
consul-dataplane::127.0.0.1:35943::priority::0
consul-dataplane::127.0.0.1:35943::success_rate::-1
consul-dataplane::127.0.0.1:35943::local_origin_success_rate::-1
nomad-client.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::observability_name::nomad-client.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul
nomad-client.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::outlier::success_rate_average::-1
nomad-client.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::outlier::success_rate_ejection_threshold::-1
nomad-client.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::outlier::local_origin_success_rate_average::-1
nomad-client.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::outlier::local_origin_success_rate_ejection_threshold::-1
nomad-client.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::default_priority::max_connections::1024
nomad-client.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::default_priority::max_pending_requests::1024
nomad-client.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::default_priority::max_requests::1024
nomad-client.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::default_priority::max_retries::3
nomad-client.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::high_priority::max_connections::1024
nomad-client.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::high_priority::max_pending_requests::1024
nomad-client.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::high_priority::max_requests::1024
nomad-client.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::high_priority::max_retries::3
nomad-client.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::added_via_api::true
local_app::observability_name::local_app
local_app::default_priority::max_connections::1024
local_app::default_priority::max_pending_requests::1024
local_app::default_priority::max_requests::1024
local_app::default_priority::max_retries::3
local_app::high_priority::max_connections::1024
local_app::high_priority::max_pending_requests::1024
local_app::high_priority::max_requests::1024
local_app::high_priority::max_retries::3
local_app::added_via_api::true
local_app::127.0.0.1:80::cx_active::0
local_app::127.0.0.1:80::cx_connect_fail::0
local_app::127.0.0.1:80::cx_total::0
local_app::127.0.0.1:80::rq_active::0
local_app::127.0.0.1:80::rq_error::0
local_app::127.0.0.1:80::rq_success::0
local_app::127.0.0.1:80::rq_timeout::0
local_app::127.0.0.1:80::rq_total::0
local_app::127.0.0.1:80::hostname::
local_app::127.0.0.1:80::health_flags::healthy
local_app::127.0.0.1:80::weight::1
local_app::127.0.0.1:80::region::
local_app::127.0.0.1:80::zone::
local_app::127.0.0.1:80::sub_zone::
local_app::127.0.0.1:80::canary::false
local_app::127.0.0.1:80::priority::0
local_app::127.0.0.1:80::success_rate::-1
local_app::127.0.0.1:80::local_origin_success_rate::-1
original-destination::observability_name::original-destination
original-destination::default_priority::max_connections::1024
original-destination::default_priority::max_pending_requests::1024
original-destination::default_priority::max_requests::1024
original-destination::default_priority::max_retries::3
original-destination::high_priority::max_connections::1024
original-destination::high_priority::max_pending_requests::1024
original-destination::high_priority::max_requests::1024
original-destination::high_priority::max_retries::3
original-destination::added_via_api::true

Consul UI

Nomad UI: Nginx-task details

image

Nomad Server Node Config: (/etc/nomad.d/nomad.hcl)

data_dir  = "/opt/nomad/data"
bind_addr = "0.0.0.0"

server {
  enabled          = true
  bootstrap_expect = 1  
}

advertise {
 http = "192.168.40.10:4646"
 rpc = "192.168.40.10:4647"
 serf = "192.168.40.10:4648"
}

client {
  enabled = false  # Disable the client on the server
}

#consul {
 #address = "192.168.60.10:8500"
 #checks_use_advertise = true
#}

Nomad Client Node Config: (/etc/nomad.d/nomad.hcl)

client {
  enabled = true
  servers = ["192.168.40.10:4647"]
  
 # host_volume "nomad-vol1" {
  #  path = "/home/ubuntu/nomad/nomad-vol1/"
  #  read_only = false
 # }

  options {
    "docker.privileged.enabled" = "true"
    "docker.caps.whitelist" = "NET_RAW,NET_ADMIN"
  }
}

plugin "docker" {
 config {
   allow_privileged = true
   volumes {
     enabled = true
   }
 }
}

data_dir = "/opt/nomad/data"
bind_addr = "0.0.0.0"

advertise {
  http = "192.168.40.11:4646"
}

server {
  enabled = false  # Disable server functionality on the client node
}

consul {
 address = "127.0.0.1:8500"
 checks_use_advertise = true
}

Also, I have installed Consul ONLY on the Nomad Client Node and configured it as a Consul client.

Consul Config: (/etc/consul.d/consul.hcl)

data_dir = "/opt/consul"
client_addr = "0.0.0.0"
bind_addr = "192.168.40.11"
server = false
advertise_addr = "192.168.40.11"
retry_join = ["192.168.60.10"]
log_level = "DEBUG"

The external VM running the Consul server on IP 192.168.60.10 has the below config in /etc/consul.d/consul.hcl

data_dir = "/opt/consul"
bind_addr = "192.168.60.10"
client_addr = "0.0.0.0" #Allow connections from any client

ui_config{
  enabled = true
}

server = true
advertise_addr = "192.168.60.10"
bootstrap_expect=1
retry_join = ["192.168.60.10"]

ports {
 grpc = 8502 
}

connect {
 enabled = true
}

#log_level = "DEBUG"

config_entries {
 bootstrap = [
  {
    Kind = "proxy-defaults"
    Name = "global"
    AccessLogs {
      Enabled = true
      #Type = "stderr"
    }
    Config {
      protocol = "http"
    }
  }
 ]
}

Additionally, I set up DNS forwarding for the .consul domain on the Nomad Client Node.

After adding this I could run dig consul.service.consul from the Nomad Client node:

; <<>> DiG 9.18.28-0ubuntu0.22.04.1-Ubuntu <<>> consul.service.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4693
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;consul.service.consul.         IN      A

;; ANSWER SECTION:
consul.service.consul.  0       IN      A       192.168.60.10

;; Query time: 4 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)

Then I also ran the below commands from the Nomad Client node and got the following outputs:

dig nginx-service.virtual.consul

; <<>> DiG 9.18.28-0ubuntu0.22.04.1-Ubuntu <<>> nginx-service.virtual.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12377
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;nginx-service.virtual.consul.  IN      A

;; ANSWER SECTION:
nginx-service.virtual.consul. 0 IN      A       240.0.0.3

;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
curl nginx-service.virtual.consul
curl: (7) Failed to connect to nginx-service.virtual.consul port 80 after 2 ms: No route to host
consul members
Node              Address             Status  Type    Build   Protocol  DC   Partition  Segment
consul-server     192.168.60.10:8301  alive   server  1.19.2  2         dc1  default    <all>
nomad-client-new  192.168.40.11:8301  alive   client  1.19.2  2         dc1  default    <default>

My Nomad Job Specification

job "nginx" {
  datacenters = ["dc1"] # Specify your datacenter
  type        = "service"

  group "nginx" {
    count = 1  # Number of instances

    network {
      mode = "bridge" # This uses Docker bridge networking
      port "http" {
        to = 80 
      }
    }

    task "nginx" {
      driver = "docker"

      config {
        image = "nginx:alpine"

        # Entry point to write message into index.html and start nginx
        entrypoint = [
          "/bin/sh", "-c",
          "echo 'Hello, I am running on Nomad!' > /usr/share/nginx/html/index.html && nginx -g 'daemon off;'"
        ]
      }

      resources {
        cpu    = 500    # CPU units
        memory = 256    # Memory in MB
      }

      service {
        name = "nginx-service"
        port = "http"  # Reference the network port defined above
        tags = ["nginx", "nomad"]

        connect {
          sidecar_service {
            proxy {
              transparent_proxy {}
            }
          }
        }

        check {
          type     = "http"
          path     = "/"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}

Nomad Client Nginx Allocation Status

nomad alloc status 088c9bf2
ID                  = 088c9bf2-e3aa-81dd-e2a1-46762330642a
Eval ID             = 38575a6a
Name                = nginx.nginx[0]
Node ID             = 9c41d5dc
Node Name           = nomad-client-new
Job ID              = nginx
Job Version         = 0
Client Status       = running
Client Description  = Tasks are running
Desired Status      = run
Desired Description = <none>
Created             = 18h49m ago
Modified            = 18h48m ago
Deployment ID       = 784320e4
Deployment Health   = healthy

Allocation Addresses (mode = "bridge"):
Label  Dynamic  Address
*http  yes      192.168.40.11:26734 -> 80

Task "nginx" is "running"
Task Resources:
CPU        Memory          Disk     Addresses
0/500 MHz  12 MiB/256 MiB  300 MiB  

Task Events:
Started At     = 2024-09-29T18:56:42Z
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                  Type        Description
2024-09-29T18:56:42Z  Started     Task started by client
2024-09-29T18:56:37Z  Driver      Downloading image
2024-09-29T18:56:37Z  Task Setup  Building Task Directory
2024-09-29T18:56:36Z  Received    Task received by client

Moreover, I have the following connectivity from a networking standpoint.

  • Kubernetes node VMs (master & worker) can ping the Nomad node VMs (server & client) and vice versa.
  • Kubernetes pods can ping both Nomad nodes’ VM IPs.
  • Nomad node VMs (server & client) can ping the Kubernetes pod IPs directly.
  • K8s pod IPs can be pinged directly from inside a Nomad task allocation:
nomad alloc exec -i -t 088c9bf2 sh
/ # ping 30.0.1.113
PING 30.0.1.113 (30.0.1.113): 56 data bytes
64 bytes from 30.0.1.113: seq=0 ttl=42 time=0.579 ms

Thank you!

Dear @Ranjandas,

I am so thankful for your kind advice and instructions once again.

I found that although my Nomad Nginx service shows up under nginx-service (since I used the same service name as in K8s), it is not part of the Envoy clusters.

k exec -it test-pod -c test-pod-container -- curl 0:19000/clusters | grep hostname
local_app::127.0.0.1:80::hostname::
consul-dataplane::127.0.0.1:35943::hostname::
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::30.0.1.113:20000::hostname::

Then, as per your instructions, I added the gRPC port to the Consul client config on the Nomad Client Node (a sketch of that addition is below). I also corrected the Nomad Nginx job spec to have the service block inside the group block.
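For reference, the gRPC addition to /etc/consul.d/consul.hcl on the Nomad Client Node is roughly the following (a sketch; it mirrors the grpc port already set on the Consul server):

# Expose the local agent's gRPC port so the Envoy sidecars managed by
# Nomad can fetch their xDS configuration from this Consul client.
ports {
  grpc = 8502
}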

Then I could run the Nginx job in Nomad.

Also, now I can see the result of k exec -it test-pod -c test-pod-container -- curl 0:19000/clusters | grep hostname command as below;

k exec -it test-pod -c test-pod-container -- curl 0:19000/clusters | grep hostname
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::30.0.1.113:20000::hostname::
nginx-service.default.dc1.internal.1734b89d-6c9d-6e59-d27c-a722a90084da.consul::192.168.40.11:29462::hostname::
consul-dataplane::127.0.0.1:35943::hostname::
local_app::127.0.0.1:80::hostname::

This shows the Nomad connect-proxy-nginx-service address (192.168.40.11:29462) along with the K8s pod IP (30.0.1.113) on its proxy inbound port 20000.

So now both nginx instances from K8S and Nomad are added to Envoy.

I executed the following commands and got some results;

From K8S Pod

k exec -it test-pod -c test-pod-container -- dig nginx-service.virtual.consul

; <<>> DiG 9.18.16 <<>> nginx-service.virtual.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35130
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;nginx-service.virtual.consul.  IN      A

;; ANSWER SECTION:
nginx-service.virtual.consul. 0 IN      A       240.0.0.3

;; Query time: 4 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)

From K8S Pod

k exec -it test-pod -c test-pod-container -- curl nginx-service.virtual.consul
Hello, I am running on Kubernetes!
ubuntu@ubuntu-desktop:~$ k exec -it test-pod -c test-pod-container -- curl nginx-service.virtual.consul
upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: delayed connect error: 111

From Nomad Client Node

 dig nginx-service.virtual.consul

; <<>> DiG 9.18.28-0ubuntu0.22.04.1-Ubuntu <<>> nginx-service.virtual.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15619
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;nginx-service.virtual.consul.  IN      A

;; ANSWER SECTION:
nginx-service.virtual.consul. 0 IN      A       240.0.0.3

;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)

I have only added DNS forwarding on the Nomad Client node, because the workload is running there. So if I run this dig command on the Nomad server node, it will not succeed.

From Nomad Client Node

curl nginx-service.virtual.consul
curl: (7) Failed to connect to nginx-service.virtual.consul port 80 after 2 ms: No route to host

However, I also executed the below commands inside another task in Nomad (not from the Nomad Client Node itself, but from inside a task running there):

nomad alloc exec -i -t -task test-pod 858b48c3 sh
/ # curl nginx-service.virtual.consul
Hello, I am running on Kubernetes!
/ # curl nginx-service.virtual.consul
/ # curl nginx-service.virtual.consul
Hello, I am running on Kubernetes!
/ # curl nginx-service.virtual.consul
upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: delayed connect error: 111/

In both cases, I got the response from the K8s nginx instance, but not from the Nomad nginx instance.

Also, I am still getting the below error continuously from the consul-dataplane container in test-pod:

k logs test-pod -c consul-dataplane
[debug] envoy.conn_handler(23) [Tags: "ConnectionId":"17875"] new connection from 30.0.1.82:33886
2024-10-01T12:12:17.492Z+00:00 [debug] envoy.connection(23) [Tags: "ConnectionId":"17875"] closing socket: 0
2024-10-01T12:12:17.493Z+00:00 [debug] envoy.conn_handler(23) [Tags: "ConnectionId":"17875"] adding to cleanup list
2024-10-01T12:12:17.612Z+00:00 [debug] envoy.main(13) flushing stats
2024-10-01T12:12:17.665Z [DEBUG] consul-dataplane.dns-proxy.udp: timeout waiting for read: error="read udp 127.0.0.1:8600: i/o timeout"
2024-10-01T12:12:22.613Z+00:00 [debug] envoy.main(13) flushing stats

Thank you!

Great! I think you are close to getting it working end to end. The error with the Nomad allocation is probably an issue with port forwarding not being set up properly in the Nomad job. Can you post the full Nomad job spec for nginx? Also note that you won’t be able to access mesh services from the host due to the mTLS requirements. Requests coming into the mesh will have to be served through a load balancer or gateway that integrates with the mesh (e.g. Consul API Gateway).

1 Like

Dear @Ranjandas,

Below is my full job spec for Nomad Nginx

job "nginx" {
  datacenters = ["dc1"] # Specify your datacenter
  type        = "service"

  group "nginx" {
    count = 1  # Number of instances

    network {
      mode = "bridge" # This uses Docker bridge networking
      port "http" {
        to = 80
      }
    }

    service {
      name = "nginx-service"
      port = "http"  # Reference the network port defined above
      tags = ["nginx", "nomad"]

      connect {
        sidecar_service {
          proxy {
            transparent_proxy {}
          }
        }
      }
    }

    task "nginx" {
      driver = "docker"

      config {
        image = "nginx:alpine"

        # Entry point to write message into index.html and start nginx
        entrypoint = [
          "/bin/sh", "-c",
          "echo 'Hello, I am running on Nomad!' > /usr/share/nginx/html/index.html && nginx -g 'daemon off;'"
        ]
      }

      resources {
        cpu    = 500    # CPU units
        memory = 256    # Memory in MB
      }
    }
  }
}

Thank you!

Hi @harsh.lif3,

Remove the named port from the group.service.port and hardcode it as port 80.
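For example, the group-level part of your job spec would look roughly like this (a sketch; only the service port changes, everything else stays as in your spec):

group "nginx" {
  network {
    mode = "bridge"
    port "http" {
      to = 80
    }
  }

  service {
    name = "nginx-service"
    port = "80"   # hard-coded application port instead of port = "http"

    connect {
      sidecar_service {
        proxy {
          transparent_proxy {}
        }
      }
    }
  }

  # task "nginx" { ... } unchanged
}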

Ref: Consul Service Mesh | Nomad | HashiCorp Developer.

This should get everything working for you. :crossed_fingers:

Dear @Ranjandas,

Thank you so much for your kind assistance. Once the port was hard-coded, it worked! :smiley: :innocent:

Finally, I would like to summarize my understanding and the things I did. I kindly seek your input and advice to make sure I am on the correct track.

Consul Server (/etc/consul.d/consul.hcl)

  • bind_addr = "192.168.60.10" => tells the Consul agent which network interface/address to use for internal communication with other Consul agents (servers or clients). Typically this is a private IP, and only agents that can reach that network can communicate with this agent. Here I entered the local IP of the Consul server itself.

  • client_addr = "0.0.0.0" => defines which interfaces the agent’s client-facing endpoints (HTTP, DNS, gRPC) listen on. If this is set to 127.0.0.1, no external clients (in my case the K8s pods and Nomad tasks) can reach this server, so I set it to 0.0.0.0 to make it reachable from them.

  • advertise_addr = "192.168.60.10" => the IP address that Consul tells other agents to use when they want to connect to this agent. The difference from bind_addr is that the advertise address can be a different (for example public or NAT’d) IP if required. In my case I used the same local IP because the VMs can already reach each other directly.

  • connect { enabled = true } => enabled to allow service mesh.

  • retry_join = ["192.168.60.10"] => Although I set this, I am not sure whether it has any effect on the server itself, since it points at its own address.

Nomad Client node’s Consul config in /etc/consul.d/consul.hcl

client_addr = "0.0.0.0"
bind_addr = "192.168.40.11"
advertise_addr = "192.168.40.11"
retry_join = ["192.168.60.10"]

Although I have defined the above values, I am not sure whether advertise_addr is actually required for the client, or whether bind_addr, retry_join and client_addr would be enough.

Nomad Client Node’s Consul Block in /etc/nomad.d/nomad.hcl

consul {
 address = "127.0.0.1:8500"
 checks_use_advertise = true
}

I set address to 127.0.0.1:8500 because I read that Nomad agents should be configured to talk to their local Consul agent and not directly to the Consul servers. I enabled checks_use_advertise so that Consul health checks use the advertise address.

I do not have connect { enabled = true } in the Consul client config on the Nomad Client Node; I think it only needs to be set in the Consul server config.

Earlier I had enabled DNS forwarding on the Nomad client node, but I removed it later since the Nomad tasks could resolve *.virtual.consul without any issue.

Consul values in Kubernetes Cluster

  • httpsPort: 8500 => I set this to 8500 because I am using the default HTTP port and do not have TLS or HTTPS enabled. If I do not set this, it defaults to 8501 and I get errors such as {"error": "Get \"http://192.168.60.10:8501/v1/config/file-system-certificate?dc=dc1\": context canceled"}, and the consul-connect-injector pod fails.

  • k8sAuthMethodHost: "https://192.168.50.10:6443" => Although I have set this, I think it is not applicable in my scenario.

syncCatalog:
  enabled: true
  #toConsul: false
  #toK8S: false
  default: false

I enabled this so that I could add the service-sync=true annotation to the K8s services. However, I haven’t enabled that annotation for nginx-service, because the service gets registered automatically when the connect-inject=true annotation is added to the pod/deployment. My understanding is that service-sync=true is only needed when you want to expose a plain Kubernetes service to non-K8s apps, but this is still not fully clear to me. I also commented out toConsul and toK8S since I wasn’t 100% sure what happens there. Service Sync for Consul on Kubernetes | Consul | HashiCorp Developer

dns:
  enabled: true
  enableRedirection: true

I added the dns block with these two parameters so that services in the Consul service mesh use Consul DNS for default DNS resolution.

I also added the below block to the CoreDNS ConfigMap:

consul:53 {
    errors
    cache 30
    forward . <cluster ip address of consul-dns service in k8s>
}

I am using transparent proxy mode in Kubernetes, since it is enabled by default when Consul is installed on K8s using the Consul Helm chart (it still uses the Envoy sidecar proxy, just with traffic redirected transparently). Because I am using transparent proxy in K8s, I also used the transparent_proxy {} block in the Nomad job spec under connect.sidecar_service.proxy, hoping both sides would match. However, I didn’t define any upstreams inside the proxy block. I think that is because in my case I do not have services (microservices) that need to talk to each other through explicitly configured upstreams; if I had such an application and were not using transparent proxy, I would declare upstreams inside the proxy block, as in the sketch below.
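For illustration only (a hedged sketch, not something I am actually running), an explicit upstream without transparent proxy would look roughly like this in the Nomad connect block; the destination name and local port are placeholders:

connect {
  sidecar_service {
    proxy {
      # Instead of transparent_proxy {}, declare each dependency explicitly:
      # the local app reaches "some-backend" on 127.0.0.1:8080 and Envoy
      # forwards the request through the mesh.
      upstreams {
        destination_name = "some-backend"
        local_bind_port  = 8080
      }
    }
  }
}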

Under the Upstreams section in the Consul UI it shows:

I also found that using Consul’s built-in proxy is not recommended, so I believe it is best to use the Envoy sidecar with transparent proxy.

Also, I did not initially understand why we need routable connectivity between K8s pods and Nomad nodes (and vice versa). I guess it is because DNS only resolves the name of a service to an IP; after resolution, both ends still need network connectivity to actually exchange packets.

Also, I noticed that in K8s nginx-service.service.consul is listed under the services (k get svc) as an ExternalName. But since we are using transparent proxy we have to use *.virtual.consul and not *.service.consul. In what kind of situations would we use *.service.consul?

Finally, I still see the error consul-dataplane.dns-proxy.udp: timeout waiting for read: error="read udp 127.0.0.1:8600: i/o timeout" in the consul-dataplane container inside the K8s pod, and I still couldn’t find where the IP 30.0.1.82 comes from.

k logs -f pod/test-pod -c consul-dataplane
[debug] envoy.conn_handler(24) [Tags: "ConnectionId":"8184"] new connection from 30.0.1.82:53654
2024-10-02T22:58:47.861Z+00:00 [debug] envoy.connection(24) [Tags: "ConnectionId":"8184"] closing socket: 0
2024-10-02T22:58:47.861Z+00:00 [debug] envoy.conn_handler(24) [Tags: "ConnectionId":"8184"] adding to cleanup list
2024-10-02T22:58:52.385Z+00:00 [debug] envoy.main(14) flushing stats
2024-10-02T22:58:54.861Z [DEBUG] consul-dataplane.dns-proxy.udp: timeout waiting for read: error="read udp 127.0.0.1:8600: i/o timeout"
2024-10-02T22:58:57.385Z+00:00 [debug] envoy.main(14) flushing stats
k exec -it test-pod -c test-pod-container -- nslookup kubernetes.default
;; Got recursion not available from 127.0.0.1, trying next server
;; Got recursion not available from 127.0.0.1, trying next server
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.96.0.1
;; Got recursion not available from 127.0.0.1, trying next server

I am sincerely thankful for your kind advice and guidance. It is highly valuable for a newbie like me to have this working. Thanks again! :pray:

1 Like

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.