Consul Connect: connection reset by peer

Hey!

I am trying to set up Consul Connect on a single machine with a simple database and an application that just waits for the database to come up.

It’s not a special case. I just want a proof of concept for setting up Consul Connect so that I can later build and deploy applications with it.

Background

I have one node (a vserver) to test this out, so there is no real data on it.

Node Configuration

My Consul configuration file looks like this:

# /etc/consul.d/consul.hcl

data_dir = "/opt/consul"
ui_config {
  enabled = true
}
server           = true
bootstrap_expect = 1
retry_join = ["168.119.124.210"]
acl {
  enabled = false
}
connect {
  enabled = true
}
bind_addr      = "168.119.124.210"
advertise_addr = "168.119.124.210"
client_addr    = "0.0.0.0"
ports {
  http = 8500
  grpc = 8502
}
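
With client_addr = "0.0.0.0", both the HTTP (8500) and gRPC (8502) listeners should be bound on all host interfaces. A quick way to confirm that on the host (a sketch, assuming ss from iproute2 is available):

$ sudo ss -ltnp | grep -E '8500|8502'
# expect LISTEN sockets on 0.0.0.0:8500 and 0.0.0.0:8502 owned by the consul process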

My Nomad configuration file looks like this:

# /etc/nomad.d/nomad.hcl
data_dir  = "/opt/nomad/data"
bind_addr = "168.119.124.210"

advertise {
  # Defaults to the first private IP address.
  http = "168.119.124.210"
  rpc  = "168.119.124.210"
  serf = "168.119.124.210" # non-default ports may be specified
}

server {
  # license_path is required for Nomad Enterprise as of Nomad v1.1.1+
  #license_path = "/etc/nomad.d/license.hclic"
  enabled          = true
  bootstrap_expect = 1
}

client {
  enabled = true
}

acl {
  enabled = true
}

vault {
  enabled = true
  address = "http://127.0.0.1:8200"

  default_identity {
    aud = ["nomad"]
    ttl = "1h"
  }

  jwt_auth_backend_path = "nomad"
}

consul {
  address = "127.0.0.1:8500"
}
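
To confirm that Nomad can reach the Consul agent at the address configured here, the Consul HTTP API can be queried directly from the host (a sketch, assuming curl is installed; it works without a token since Consul ACLs are disabled above):

$ curl -s http://127.0.0.1:8500/v1/status/leader
# should print the leader address, e.g. "168.119.124.210:8300"
$ curl -s http://127.0.0.1:8500/v1/agent/services
# once the jobs are running, this should list the services and sidecar proxies registered by Nomad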

Nomad Job Configuration

I have two Nomad jobs that are deployed via Terraform:

  1. customer-api.hcl
# customer-api.hcl

job "customer-api" {
  type      = "service"
  namespace = "${namespace}"

  group "api" {

    network {
      mode = "bridge"
    }

    service {
      name = "${namespace}-api"

      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "${namespace}-db"
              local_bind_port  = 5432
            }
          }
        }
      }
    }

    task "api" {
      driver = "docker"

      vault {
        policies = ["${namespace}"]
        role     = "${namespace}"
      }

      template {
        destination = "secrets/env"
        env         = true
        data = <<EOH
PGUSER={{ with secret "${secrets_path}/database" }}{{ .Data.data.username }}{{ end }}
PGPASSWORD={{ with secret "${secrets_path}/database" }}{{ .Data.data.password }}{{ end }}
PGDATABASE=postgres
PGHOST=localhost
PGPORT=5432
EOH
      }

      config {
        image   = "postgres:16-alpine"
        command = "sh"
        args = [
          "-ec",
          "until pg_isready -h \"$PGHOST\" -p \"$PGPORT\" -U \"$PGUSER\"; do echo waiting for db; sleep 1; done; psql -h \"$PGHOST\" -p \"$PGPORT\" -U \"$PGUSER\" \"$PGDATABASE\" -c 'select now();'; sleep 3600"
        ]
      }

      resources {
        cpu    = 200
        memory = 256
      }
    }
  }
}

# Please note that the variables in this specific configuration look like this:
namespace: customer-one
secrets_path: customers/customer-one
  2. customer-db.hcl
job "customer-db" {
  type      = "service"
  namespace = "${namespace}"

  group "db" {

    network {
      mode = "bridge"
      port "db" {
        to = 5432
      }
    }

    service {
      name = "${namespace}-db"
      port = "db"

      connect {
        sidecar_service {
        }
      }
      check {
        type     = "script"
        task     = "db"
        command  = "sh"
        args     = ["-ec", "pg_isready -h 127.0.0.1 -p 5432"]
        interval = "30s"
        timeout  = "2s"
      }

    }

    task "db" {
      driver = "docker"

      vault {
        policies = ["${namespace}"]
        role     = "${namespace}"
      }

      template {
        destination = "secrets/postgres.env"
        env         = true
        data = <<EOF
{{ with secret "${secrets_path}/database" }}
POSTGRES_LISTEN_ADDRESSES=*
POSTGRES_DB=test
POSTGRES_USER={{ .Data.data.username }}
POSTGRES_PASSWORD={{ .Data.data.password }}
{{ end }}
EOF
      }

      config {
        image = "postgres:17"
        ports = ["db"]
      }

      resources {
        cpu    = 500
        memory = 512
      }
    }
    restart {
      attempts = 10
      interval = "5m"
      delay    = "25s"
      mode     = "delay"
    }
  }
  update {
    max_parallel     = 1
    min_healthy_time = "5s"
    healthy_deadline = "3m"
    auto_revert      = false
    canary           = 0
  }
}
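
Once this job is running, it is worth confirming that Consul sees the database service and its sidecar as Connect-capable. A sketch using the Consul CLI and HTTP API, assuming the rendered service name is customer-one-db as per the variables above:

$ consul catalog services
# should list customer-one-db and customer-one-db-sidecar-proxy
$ curl -s http://127.0.0.1:8500/v1/health/connect/customer-one-db
# should return at least one passing entry pointing at the sidecar proxy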

I have the CNI Plugins installed.
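
Nomad's cni_path defaults to /opt/cni/bin, so a quick check looks like this (a sketch; the path and the fingerprinted attribute names may differ depending on your installation and Nomad version):

$ ls /opt/cni/bin
# should contain at least bridge, firewall, host-local, loopback and portmap
$ nomad node status -self -verbose | grep -i cni
# shows what the Nomad client fingerprinted for CNI, if anything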

Network background

  • There is a Docker bridge that apparently is being used more than the Nomad virtual interface.
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 92:00:06:3d:b9:b2 brd ff:ff:ff:ff:ff:ff
    altname enp1s0
    inet 168.119.124.210/32 brd 168.119.124.210 scope global dynamic eth0
       valid_lft 78223sec preferred_lft 78223sec
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 2e:7e:39:25:ec:26 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::2c7e:39ff:fe25:ec26/64 scope link 
       valid_lft forever preferred_lft forever
6: br-bd1cb1ebe544: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 66:03:d8:e7:81:a4 brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.1/16 brd 172.18.255.255 scope global br-bd1cb1ebe544
       valid_lft forever preferred_lft forever
    inet6 fe80::6403:d8ff:fee7:81a4/64 scope link 
       valid_lft forever preferred_lft forever
7: vethafecdc9@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-bd1cb1ebe544 state UP group default 
    link/ether 52:c3:c7:c3:ef:5a brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::50c3:c7ff:fec3:ef5a/64 scope link 
       valid_lft forever preferred_lft forever
8: veth5cdaf4c@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-bd1cb1ebe544 state UP group default 
    link/ether b6:97:c8:f9:5a:2f brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::b497:c8ff:fef9:5a2f/64 scope link 
       valid_lft forever preferred_lft forever
9: vethff1b435@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-bd1cb1ebe544 state UP group default 
    link/ether 62:1a:77:76:98:ac brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::601a:77ff:fe76:98ac/64 scope link 
       valid_lft forever preferred_lft forever
10: nomad: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 6a:9f:d3:98:5f:0d brd ff:ff:ff:ff:ff:ff
    inet 172.26.64.1/20 brd 172.26.79.255 scope global nomad
       valid_lft forever preferred_lft forever
    inet6 fe80::689f:d3ff:fe98:5f0d/64 scope link 
       valid_lft forever preferred_lft forever
21: vethbd3fc599@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master nomad state UP group default 
    link/ether 2a:0c:89:05:5e:0d brd ff:ff:ff:ff:ff:ff link-netnsid 3
    inet6 fe80::280c:89ff:fe05:5e0d/64 scope link 
       valid_lft forever preferred_lft forever
22: veth3b215a7b@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master nomad state UP group default 
    link/ether be:f8:fc:aa:95:11 brd ff:ff:ff:ff:ff:ff link-netnsid 4
    inet6 fe80::bcf8:fcff:feaa:9511/64 scope link 
       valid_lft forever preferred_lft forever
23: veth1e27da5b@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master nomad state UP group default 
    link/ether f2:e8:a1:5f:c3:67 brd ff:ff:ff:ff:ff:ff link-netnsid 5
    inet6 fe80::f0e8:a1ff:fe5f:c367/64 scope link 
       valid_lft forever preferred_lft forever
24: vetha421be1f@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master nomad state UP group default 
    link/ether 16:84:5e:9c:f0:d3 brd ff:ff:ff:ff:ff:ff link-netnsid 6
    inet6 fe80::1484:5eff:fe9c:f0d3/64 scope link 
       valid_lft forever preferred_lft forever
28: veth15cc5136@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master nomad state UP group default 
    link/ether 72:ed:22:a4:98:6e brd ff:ff:ff:ff:ff:ff link-netnsid 7
    inet6 fe80::70ed:22ff:fea4:986e/64 scope link 
       valid_lft forever preferred_lft forever

IPTables

iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy DROP)
target     prot opt source               destination         
CNI-FORWARD  all  --  anywhere             anywhere             /* CNI firewall plugin rules */
DOCKER-USER  all  --  anywhere             anywhere            
DOCKER-FORWARD  all  --  anywhere             anywhere            

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain CNI-FORWARD (1 references)
target     prot opt source               destination         
NOMAD-ADMIN  all  --  anywhere             anywhere             /* CNI firewall plugin admin overrides */
ACCEPT     all  --  anywhere             172.26.64.12         ctstate RELATED,ESTABLISHED
ACCEPT     all  --  172.26.64.12         anywhere            
ACCEPT     all  --  anywhere             172.26.64.13         ctstate RELATED,ESTABLISHED
ACCEPT     all  --  172.26.64.13         anywhere            
ACCEPT     all  --  anywhere             172.26.64.15         ctstate RELATED,ESTABLISHED
ACCEPT     all  --  172.26.64.15         anywhere            
ACCEPT     all  --  anywhere             172.26.64.20         ctstate RELATED,ESTABLISHED
ACCEPT     all  --  172.26.64.20         anywhere            
ACCEPT     all  --  anywhere             172.26.64.21         ctstate RELATED,ESTABLISHED
ACCEPT     all  --  172.26.64.21         anywhere            

Chain DOCKER (2 references)
target     prot opt source               destination         
ACCEPT     tcp  --  anywhere             172.18.0.4           tcp dpt:http-alt
ACCEPT     tcp  --  anywhere             172.18.0.4           tcp dpt:3000
DROP       all  --  anywhere             anywhere            
DROP       all  --  anywhere             anywhere            

Chain DOCKER-BRIDGE (1 references)
target     prot opt source               destination         
DOCKER     all  --  anywhere             anywhere            
DOCKER     all  --  anywhere             anywhere            

Chain DOCKER-CT (1 references)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED

Chain DOCKER-FORWARD (1 references)
target     prot opt source               destination         
DOCKER-CT  all  --  anywhere             anywhere            
DOCKER-INTERNAL  all  --  anywhere             anywhere            
DOCKER-BRIDGE  all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere            

Chain DOCKER-INTERNAL (1 references)
target     prot opt source               destination         

Chain DOCKER-USER (1 references)
target     prot opt source               destination         

Chain NOMAD-ADMIN (1 references)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             172.26.64.0/20 

In the current example the postgres container runs on 172.18.0.3 and is reachable from the host.

$ ip route
default via 172.31.1.1 dev eth0 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
172.18.0.0/16 dev br-bd1cb1ebe544 proto kernel scope link src 172.18.0.1 
172.26.64.0/20 dev nomad proto kernel scope link src 172.26.64.1 
172.31.1.1 dev eth0 scope link 

So it looks like the traffic is routed through the Docker bridge, although I thought it should not be.
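
To see which bridge a given address is actually routed through, the kernel can be asked directly (a sketch; 172.26.64.12 is one of the allocation IPs from the CNI-FORWARD rules above):

$ ip route get 172.26.64.12
# resolves via the nomad bridge (dev nomad, src 172.26.64.1)
$ ip route get 172.18.0.3
# resolves via the Docker bridge br-bd1cb1ebe544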

Problem

The api container cannot reach the database container.

The sidecar proxy container that sits right next to the database can run psql and gets a response.

The sidecar proxy container right next to the application cannot run psql; it aborts with a "connection reset by peer" error.
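
For reference, this is how the failure can be reproduced and the application-side sidecar inspected (a sketch, assuming the rendered service name is customer-one-api, <alloc-id> is the api allocation, and a valid token is exported in NOMAD_TOKEN since Nomad ACLs are enabled):

$ nomad job status -namespace customer-one customer-api   # find the running allocation ID
$ nomad alloc logs -stderr <alloc-id> connect-proxy-customer-one-api
# Envoy logs; look for xDS or upstream connection errors
$ nomad alloc exec -task api <alloc-id> psql -c 'select 1;'
# goes through the Envoy upstream listener on 127.0.0.1:5432 inside the allocation's namespace
# (psql picks up PGHOST/PGPORT/PGUSER/PGPASSWORD from the task environment)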

What I’ve tried

  • I have already changed bind_addr in the node configuration from the original 127.0.0.1 to the public IP of the virtual server.
  • Various debugging that I cannot clearly summarize: I checked the IP interfaces but did not understand what was wrong.

I would love to get some help with this, because after three days of debugging I have run out of ideas and creativity :slight_smile:

I think the problem may be related to the bridge network mode: when the network mode is set to bridge, each task group gets an isolated network namespace with its own interface, so 127.0.0.1 refers to that namespace's loopback, not the host's.

I see that your nomad bridge is created with the IP address 172.26.64.1:

10: nomad: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 6a:9f:d3:98:5f:0d brd ff:ff:ff:ff:ff:ff
    inet 172.26.64.1/20 brd 172.26.79.255 scope global nomad

In your nomad.hcl you only set consul { address = "127.0.0.1:8500" } and omit grpc_address, so it defaults to 127.0.0.1:8502. It is therefore expected that the sidecar in the api group cannot talk to the host's Consul gRPC endpoint at 127.0.0.1:8502 to fetch its xDS proxy configuration, because that is a loopback address inside an isolated namespace.

Since your Consul setup listens for gRPC on all interfaces (client_addr = "0.0.0.0"), it should be reachable on the nomad bridge IP, 172.26.64.1:8502.
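
That assumption is easy to verify from the host (a sketch, assuming nc/netcat is installed):

$ nc -vz 172.26.64.1 8502
# a successful connect means the gRPC/xDS port is reachable on the nomad bridge address
$ nc -vz 127.0.0.1 8502
# also succeeds from the host, but 127.0.0.1 is a different loopback inside a bridge-mode allocation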

I suggest updating consul.grpc_address in your nomad.hcl config file like this:

consul {
  address = "127.0.0.1:8500"
  grpc_address = "172.26.64.1:8502"
}

Restart the Nomad agent to pick up the changes, and redeploy the jobs as well.
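
Assuming Nomad runs under systemd, the restart and a re-test could look like this (a sketch; the redeploy step depends on how the Terraform-managed jobs are applied):

$ sudo systemctl restart nomad
$ nomad job status -namespace customer-one customer-api   # wait for the new allocation to become healthy
$ nomad alloc exec -task api <alloc-id> psql -c 'select now();'
# should now return a timestamp through the mesh instead of a connection reset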