Container cannot reach Nomad endpoint

Hello,
I am trying to set up a Nomad test environment using the (very useful!) -dev mode.

My test env has:

  • a Traefik instance which acts as a load balancer
  • a number of HTTP containers

Traefik maps URLs to the correct container using tags. This works fine in my prod environment, but I cannot replicate it in my test environment: the Traefik container does not seem to be able to talk to the Nomad endpoint (http://127.0.0.1:4646). I can see this in the Traefik logs:

time="2022-12-16T14:33:03Z" level=error msg="Provider connection error failed to load initial nomad services: Get \"http://127.0.0.1:4646/v1/services\": dial tcp 127.0.0.1:4646: connect: connection refused, retrying in 6.97842529s" providerName=nomad
time="2022-12-16T14:33:10Z" level=error msg="Provider connection error failed to load initial nomad services: Get \"http://127.0.0.1:4646/v1/services\": dial tcp 127.0.0.1:4646: connect: connection refused, retrying in 9.787400986s" providerName=nomad
time="2022-12-16T14:33:20Z" level=error msg="Provider connection error failed to load initial nomad services: Get \"http://127.0.0.1:4646/v1/services\": dial tcp 127.0.0.1:4646: connect: connection refused, retrying in 19.033258275s" providerName=nomad

Here is my Traefik job spec:

job "traefik" {
  datacenters = ["dc1"]
  type        = "service"

  group "traefik" {
    count = 1

    network {
      mode = "host"
      port "http" {
        static = 80
      }
      port "admin" {
        static = 8080
      }
    }

    service {
      name     = "traefik-http"
      provider = "nomad"
      port     = "http"
    }

    task "server" {
      driver = "docker"
      config {
        image = "traefik:2.9.5"
        ports = ["admin", "http"]
        image_pull_timeout = "10m"
        args = [
          "--api.dashboard=true",
          "--api.insecure=true",
          "--entrypoints.http.address=:${NOMAD_PORT_http}",
          "--entrypoints.traefik.address=:${NOMAD_PORT_admin}",
          "--providers.nomad.exposedByDefault=false",
          "--providers.nomad=true",
          "--providers.nomad.endpoint.address=http://127.0.0.1:4646",
          "--providers.nomad.endpoint.tls.insecureSkipVerify=true",
          "--serversTransport.forwardingTimeouts.dialTimeout=600s",
          "--serversTransport.forwardingTimeouts.idleConnTimeout=600s"
        ]
      }
    }
  }
}

And this is the job spec of the container that should be reachable through the load balancer:

job "shinyapp" {
  datacenters = ["dc1"]

  type = "service"

  group "app" {
    count = 1

    network {
      mode = "host"
      port "shinyserver" {
        to = 3838
      }
    }

    service {
      name = "app"
      port = "shinyserver"
      provider = "nomad"

      tags = [
        "traefik.enable=true",
        "traefik.http.routers.http.rule=PathPrefix(`/shiny`)"
      ]
    }

    task "server" {
      driver = "docker"

      resources {
        cpu    = 2000
        memory = 5000
      }

      config {
        image = "rocker/shiny:4"
        image_pull_timeout = "10m"
        ports = ["shinyserver"]
        auth_soft_fail = true
      }
    }
  }
}

Any idea on how I can fix the problem? Thanks.

Matteo

Hi @matteo3849329,

What OS is this development environment running on?

Thanks,
jrasell and the Nomad team

Hi jrasell,
I am running on Linux:

Linux LP6 5.15.0-56-generic #62~20.04.1-Ubuntu SMP Tue Nov 22 21:24:20 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Nomad: Nomad v1.4.3 (f464aca721d222ae9c1f3df643b3c3aaa20e2da7)

Matteo

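I believe you just need to set network_mode = "host" in the task driver config. The docker driver is a bit special in this regard (due to legacy reasons): it does not inherit mode = "host" from the group network block, so the container still gets its own network namespace, and 127.0.0.1:4646 inside it points at the container itself rather than at the Nomad agent on the host.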
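
As a minimal sketch, this is roughly how the change would look in the "server" task of the Traefik job from the original post. The only new line is network_mode; everything else is copied from that job spec. I have not run this exact job, so treat it as a starting point rather than a verified fix:

```hcl
task "server" {
  driver = "docker"

  config {
    image              = "traefik:2.9.5"
    image_pull_timeout = "10m"

    # New line (the suggested fix): run the container in the host's network
    # namespace, so 127.0.0.1:4646 inside the container is the Nomad agent.
    network_mode = "host"

    # The ports list and the args list stay exactly as in the original job
    # spec; they are left out here only to keep the sketch short.
  }
}
```

After resubmitting the job, the "connection refused" errors in the Traefik logs should stop if this was indeed the cause.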

Relevant ticket: "Inherit Docker Task networking configuration from Group Networking Block" (hashicorp/nomad issue #10851 on GitHub)