Consul Connect Connection Reset By Peer Errors

dlightsey · September 28, 2020, 4:41pm

Curious if anyone else has this working properly or not. I’m testing a very simple setup with Consul Connect and Nomad integration. I’ve got a simple redis job with a sidecar proxy enabled, and another job running an ubuntu container that has the redis service as an upstream connection. The 2 jobs are up and running fine, but when I try to execute a “redis-cli -h 127.0.0.1 -p 9595 (local bind port) ping”, I get a connection reset by peer and don’t get the PONG reply from redis. I’ve also tried connecting to a Postgresql upstream as well, but when I execute a “psql -h 127.0.0.1 -p 9596 -U userA” I get a connection reset error as well.

shoenig · September 29, 2020, 2:39pm

Hi @dlightsey sorry you’re having trouble getting this working. Can you post your job files and Nomad & Consul config files? If you try running the countdash example, does that work?

dlightsey · September 29, 2020, 9:14pm

Hi, thanks for the reply. Yes, I started with the countdash example and have that working perfectly. It’s only when I tried setting up a redis and postgresql that I started having this connection reset issue. My infra setup is 6 VM’s, 3 acting as servers and 3 as clients. Both node types have nomad and consul running in server and client respectively. I’ve uploaded my configs.

netshoot.txt (1.6 KB) redis.txt (1.6 KB) consul-client.txt (717 Bytes) nomad-client.txt (1.0 KB) consul-server.txt (722 Bytes) nomad-server.txt (762 Bytes)

shoenig · October 5, 2020, 3:40pm

I’m guessing you see the same problem either way, but FYI Consul does not yet support envoy v1.15 as a Connect proxy - that’s coming in Consul 1.9.

I created this example job file that I think can be used as a starting point for what you’re trying to do. With this I can exec into the “wait” task and contact redis through the Connect plumbing.

job "rediscon" {
  datacenters = ["dc1"]

  group "cache" {
    network {
      mode = "bridge"
      port "db" {
        to = 6379
      }
    }

    service {
      name = "redis"
      port = "6379"
      connect {
        sidecar_service {}
      }
    }

    task "redis" {
      driver = "docker"

      config {
        image = "redis:3.2"
      }
    }
  }

  group "poke" {
    network {
      mode = "bridge"
    }

    service {
      name = "poker"
      port = "9999" # irrelevant 

      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "redis"

              # should be able to exec into this task and
              # contact redis on 127.0.0.1:6379, handled
              # by Connect 
              local_bind_port = 6379
            }
          }
        }
      }
    }

    task "wait" {
      driver = "exec"

      config {
        command = "/bin/sleep"
        args    = ["10000"]
      }
    }
  }
}

dlightsey · October 8, 2020, 12:01am

Hi @shoenig, thanks for all your help. After several permutations between what you provided and my own configs, I found the needle in the haystack as to why my configs weren’t working but yours do. Not sure if it is a bug, but I can successfully repeat the issue now. Turns out that the ‘port’ parameter in the service stanza does not like labels, and only works when I use a numeric value. I can get it to work 100% of the time when I use port = “6379”, and can cause it to fail 100% of the time when I use port = “db”.

I’m back up and running on our POC now. Thanks a million!

dlightsey · October 8, 2020, 12:04am

Just to recap,

This works:
service {
name = “redis”
port = “6379”
connect {
sidecar_service {}
}
}

This does not work:
service {
name = “redis”
port = “db”
connect {
sidecar_service {}
}
}

shantanugadgil · October 8, 2020, 2:15pm

I think I have observed something related …
I sense there is definitely something missed since the service stanza was moved out of the task stanza and into the group stanza.

The port and ports has me confused as well.
(focusing on getting some simple things working before tackling whats up with that)

I know I am being vague … but I want to look deeper into the new example which gets generated as that is my starting point for new things I am trying out.

dlightsey · October 9, 2020, 12:00am

Unfortunately, this is causing a new issue for me. Because I can’t use the port labels from the network stanza, I can’t find a way to advertise the service using it’s dynamic port, which is causing me grief getting haproxy to find the backend services.

This should probably be submitted as a bug? Unless there is a workaround.

dlightsey · October 9, 2020, 5:26pm

As a workaround, I’ve decided to create sidecar proxies for all of the services that my haproxy LB needs to ingress traffic to, and then in the backend stanza in haproxy, I define the servers as localhost and the port of the upstream local bind port, and that seems to be working pretty nice. I now have all the services communicating together internally across the sidecars, with an haproxy LB setup to route traffic into the stack. Not 100% clean, but works fine for now.

egmanoj · October 8, 2022, 3:36am

As a workaround, I’ve decided to create sidecar proxies for all of the services that my haproxy LB needs to ingress traffic to, and then in the backend stanza in haproxy, I define the servers as localhost and the port of the upstream local bind port, and that seems to be working pretty nice.

@dlightsey I’ve run into the same problem. Do you mind sharing the changes you made for HAProxy? Thanks.

egmanoj · October 8, 2022, 11:37am

Anecdata: dynamic ports trouble only Redis, but not MinIO. Does the redis protocol (versus HTTP for MinIO) have to do with this?

bobek · February 24, 2023, 1:15pm

It turned out that address_mode=alloc worked out for me (auto is default). E.g.

    network {
      mode = "bridge"
      port "gotenberg" {
         to = 3000
      }
    }

    service {
      name = "gotenberg"
      port = "gotenberg"
      address_mode = "alloc"
      connect {
        sidecar_service {}
      }
    }

jhillyerd · February 25, 2023, 12:02am

Thank you for the address_mode=alloc tip, it fixed my traefik/whoami test job. This should probably make it way into the docs somewhere visible, or be considered a bug?

Clasyc · October 5, 2023, 8:44pm

Oh my god, this definitely must be emphasized more in the documentation. I lost half a day for this to finally make it work.

Topic		Replies	Views
Connecting to Postgres/Redis on database server from Nomad docker container "Error: Connection reset by peer" Consul connect , nomad	1	387	November 19, 2023
Envoy -> consul "upstream connect error or disconnect/reset before headers. reset reason: connection termination" Consul connect	5	6101	March 9, 2023
Nomad and Consul Weirdness... Terminated connections galore! Nomad consul	0	189	September 27, 2023
Nomad consul connect failing seperate nodes Nomad	4	898	February 24, 2022
Consul Connect + Envoy - gRPC issues Consul	3	2098	October 15, 2021

Consul Connect Connection Reset By Peer Errors

Related topics