"No healthy host for TCP connection pool" - Nomad and Consul Connect

harsh.arya · May 10, 2022, 10:31am

We are deploying TCP services in our Consul service mesh. The services do not have a health check endpoint.
Service A's server listens on port p1. Service A is healthy and passes its health check in Consul.
Service B's client tries to connect to service A.
Service A (port p1) is configured as an upstream in service B's sidecar_service.
However, service B cannot connect to service A through the proxy (Consul Connect). With Envoy debug logging enabled, I see the following:

[2022-05-10 09:37:27.101][17][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:337] [C220] Creating connection to cluster service-A.default.ap-south-1.internal.e079d7e9-bce7-04f4-d92e-13045ba5dc92.consul
[2022-05-10 09:37:27.101][17][debug][upstream] [source/common/upstream/cluster_manager_impl.cc:1599] no healthy host for TCP connection pool

Service B can connect to service A if I hard-code service A's address in service B and bypass the proxy.

These are the jobs I am running:

Service A:

job "service-A-sandbox" {
  datacenters = [
    "ap-south-1a",
    "ap-south-1b",
    "ap-south-1c"
  ]
  type = "service"
  group "service-A-sandbox" {
    count = 1
    network {
      mode = "bridge"
      port "p1" {
        to = 9001
      }
    }
    service {
      name = "service-A-sandbox"
      port = "p1"
      connect {
        sidecar_service {
          tags = [
          ]
        }
      }
      #Dummy Health check
      check {
        name = "connect-proxy-service-A-health"
        type = "script"
        task = "service-A-sandbox"
        command = "/bin/sh"
        args = ["-c", "ls && exit 0; exit 1"]
        interval = "60s"
        timeout = "5s"
      }
    }
    task "service-A-sandbox" {
      driver = "docker"
      config {
        image      = "https://ghcr.io/github-repo/service-A:Dockerfile"
        force_pull = true
      }
      resources {
        cpu    = 300
        memory = 512
      }
    }
  }
}

Service B:

job "service-B" {
  datacenters = [
    "ap-south-1a",
    "ap-south-1b",
    "ap-south-1c"
  ]
  type = "service"
  group "service-B" {
    count = 1
    network {
      mode = "bridge"
      port "p2" {
        to = 3125
      }
    }
    service {
      name = "service-B"
      port = "p2"
      connect {
        sidecar_service {
          tags = [
          ]
          proxy {
            upstreams {
              destination_name = "service-A"
              local_bind_port = 9001
            }
          }
        }
      }

      #Dummy Health check
      check {
        name = "service-B-health"
        type = "script"
        task = "service-B"
        command = "ls"
        interval = "5s"
        timeout = "3s"
      }
    }
    task "service-B" {
      driver = "docker"
      config {
        image      = "https://ghcr.io/github-repo/service-B:Dockerfile"
        force_pull = true
      }
      resources {
        cpu    = 300
        memory = 512
      }
    }
  }
}
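One detail worth checking in the two jobs above: Consul Connect resolves an upstream's destination_name against the service name registered in Consul. Service A registers as "service-A-sandbox" (the name in its service block), while service B's upstream points at "service-A". The Envoy cluster name in the debug log begins with "service-A.default.ap-south-1.internal", which is consistent with Envoy looking for a service called "service-A" that has no registered, and therefore no healthy, instances. A minimal sketch of service B's service block, assuming "service-A-sandbox" is the intended target (tags omitted for brevity):

```hcl
    service {
      name = "service-B"
      port = "p2"
      connect {
        sidecar_service {
          proxy {
            upstreams {
              # Must match the Consul service name of the target,
              # i.e. the "name" in service A's service block.
              destination_name = "service-A-sandbox"
              # Service B keeps dialing localhost:9001; the sidecar
              # forwards that traffic to service A through the mesh.
              local_bind_port  = 9001
            }
          }
        }
      }
    }
```

Renaming service A's service to "service-A" instead would have the same effect. If the names already match in the real deployment (the job files here may be anonymized), this mismatch is not the cause.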

Here is the TCP exchange between the services at the time of the Envoy log message above.

Logs at service B:

<log realm="service-A-channel/127.0.0.1:9001">
  <connect>
    Try 0 localhost:9001
  </connect>
</log>
<log realm="service-A-channel/127.0.0.1:9001">
  <receive>
    <peer-disconnect/>
  </receive>
</log>
<log realm="service-A-channel">
  <warn>
    channel-receiver-service-A-receive
    Read timeout / EOF - reconnecting
  </warn>
</log>

seth.hoenig · May 24, 2022, 4:46pm

Hi @harsh.arya, can you confirm which versions of Consul and Nomad you are using? And what the TLS configuration (if any) for Consul looks like?

harsh.arya · May 25, 2022, 4:19am

Hi @seth.hoenig, I am using the following versions:

Nomad Version: 1.2.3
Consul Version: 1.11.1

Currently, we have not configured TLS for this environment.