Nomad/Consul/Traefik 404 error

Hello all:

My environment:
Nomad v1.9.0
Consul v1.20.0
Traefik v3.2.0

I have a 3-node cluster running Nomad, Consul, and Vault.
I am attempting to use Traefik to load balance the Nomad UI (and eventually will do the same for Consul/Vault).
I am leveraging the Consul catalog so that Traefik can auto-discover the correct routes.

Here is what my Traefik job looks like:

job "traefik" {
  datacenters = ["homelab"]
  type        = "service"

  group "traefik" {
    network {
      port "http" {
        static = 80
      }
      port "https" {
        static = 443
      }
    }

    service {
      name     = "traefik"
      port     = "https"
      tags = [
        "traefik.enable=true",
        "traefik.http.routers.dashboard.rule=Host(`traefik.fqdn`)",
        "traefik.http.routers.dashboard.service=api@internal",
        "traefik.http.routers.dashboard.entrypoints=web,websecure",
        "traefik.http.routers.dashboard.tls.certresolver=internal",
        "traefik.http.routers.dashboard.tls=true",
      ] 

      check {
        name     = "alive"
        type     = "tcp"
        port     = "http"
        interval = "10s"
        timeout  = "2s"
      }
    }
    
    service {
      name     = "nomad"
      port     = "https"
      tags = [
        "traefik.enable=true",
        "traefik.http.routers.nomad.rule=Host(`nomad.fqdn`)",
        "traefik.http.routers.nomad.service=nomad",
        "traefik.http.routers.nomad.entrypoints=web,websecure",
        "traefik.http.routers.nomad.tls.certresolver=internal",
        "traefik.http.routers.nomad.tls=true",
        "traefik.http.services.nomad.loadbalancer.server.port=4646",
      ] 
    }

    task "traefik" {
      driver = "podman"
      config {
        image = "docker.io/library/traefik:v3.2.0"
        ports = [
          "http", 
          "https", 
        ]
        
        args = [
          "--api.dashboard=true",
          "--log.level=DEBUG",
          "--accesslog=true",

          # Consul integration
          "--providers.consulcatalog=true",
          "--providers.consulcatalog.exposedByDefault=false",
          "--providers.consulcatalog.prefix=traefik",
          "--providers.consulcatalog.endpoint.address=${NOMAD_IP_http}:8500",

          # HTTP entrypoints
          "--entrypoints.web.address=:${NOMAD_PORT_http}",
          "--entrypoints.websecure.address=:${NOMAD_PORT_https}",

          # Internal ACME/PKI
          "--certificatesresolvers.internal.acme.caserver=https://ca.fqdn/acme/acme/directory",
          "--certificatesresolvers.internal.acme.email=me@fqdn",
          "--certificatesresolvers.internal.acme.storage=/local/internal.acme.json",
          "--certificatesresolvers.internal.acme.tlschallenge=true",
          "--certificatesresolvers.internal.acme.certificatesduration=24",
          
          # Non-HTTP entrypoints
        ]
      }
      
      artifact {
        source = "https://ca.fqdn/roots.pem"
        mode   = "file"
      }

      env {
        LEGO_CA_CERTIFICATES   = "/local/roots.pem"
      }

      resources {
        cpu    = 100
        memory = 128
      }
    }
  }
}

When I curl http://nomad.fqdn, I get a 404 error.
The Nomad entry in Traefik’s dashboard appears correct to me, and I am not sure what the issue could be.

Is anyone doing something similar with success? Any insight would be appreciated.

Thank you

You have two services (traefik and nomad) registered with the same port “https”. Seems wrong to me.

Can’t help with a quick fix, but this is how I’m load-balancing Nomad with a dynamic config for Traefik via the file provider:

http:
  routers:
    nomad:
      rule: Host(`nomad.lab.${var.base_domain}`)
      service: nomad@file
      entrypoints: websecure

  services:
    nomad:
      weighted:
        healthCheck: {}
        services:
        - name: nomad_master
          weight: 10
        - name: nomad_compute1
          weight: 1
        - name: nomad_compute2
          weight: 1

    nomad_master:
      loadBalancer:
        healthCheck:
          path: /v1/status/leader
          interval: 5s
          timeout: 2s
        servers:
          - url: "http://master.home:4646"

    nomad_compute1:
      loadBalancer:
        healthCheck:
          path: /v1/status/leader
          interval: 5s
          timeout: 2s
        servers:
          - url: "http://compute1.home:4646"

    nomad_compute2:
      loadBalancer:
        healthCheck:
          path: /v1/status/leader
          interval: 5s
          timeout: 2s
        servers:
          - url: "http://compute2.home:4646"

I’d say this is the Traefik way to reverse proxy external services. The same works for Consul.
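
For Consul that would look something like this (untested as written; the consul.lab hostname and Consul's default HTTP port 8500 are assumptions from my setup):

http:
  routers:
    consul:
      rule: Host(`consul.lab.${var.base_domain}`)
      service: consul@file
      entrypoints: websecure

  services:
    consul:
      loadBalancer:
        healthCheck:
          path: /v1/status/leader
          interval: 5s
          timeout: 2s
        servers:
          - url: "http://master.home:8500"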

It is my understanding that Traefik routes by hostname even though the services are on the same port.

I’ll check out the file provider and report back.

Thanks!

@matthias: Thanks for pointing me in the right direction.
I was able to make progress and can now reach the Nomad servers.

But it is not 100%. I occasionally get 502 errors along with this warning:

The warning points to a document about configuring a reverse proxy for the Nomad web UI.

Unfortunately, it is geared towards nginx. I am researching Traefik options that could effectively give me the same results.
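
From what I can tell so far, Traefik proxies WebSockets without extra configuration, and the closest counterparts to the timeout settings in that guide are the entrypoint and serversTransport timeouts. These are the flags I'm experimenting with (values untested; flag names per the Traefik docs, added to the args of the Traefik task):

args = [
  # ... existing flags ...

  # Backend side: wait indefinitely for response headers, so Nomad's
  # long-held blocking queries are not cut off by the proxy.
  "--serverstransport.forwardingtimeouts.responseheadertimeout=0s",

  # Client side: never time out while writing the response, and allow
  # idle keep-alive connections to stay open longer.
  "--entrypoints.websecure.transport.respondingtimeouts.writetimeout=0s",
  "--entrypoints.websecure.transport.respondingtimeouts.idletimeout=310s",
]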

Have you experienced this in your environment?

Thank you

I got past the warning screen and the 502 errors by introducing “stickiness” on the load balancer.
So far it seems to be working as expected, but I’ll keep running this configuration and see if anything else breaks.

A basic reference for anyone who runs into a similar situation in the future:

http:
  routers:
    nomad-ui:
      entryPoints:
        - web
        - websecure
      rule: Host(`nomad.fqdn`)
      service: nomad-ui@file
      tls:
        certResolver: internal
  services:
    nomad-ui:
      loadBalancer:
        servers:
          - url: http://node01:4646
            weight: 34
            preservePath: true
          - url: http://node02:4646
            weight: 33
            preservePath: true
          - url: http://node03:4646
            weight: 33
            preservePath: true
        sticky:
          cookie: {}
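
For the record, the empty cookie: {} just takes Traefik's defaults (it generates a cookie name). If you want to pin it down, the sticky block takes a few options, e.g.:

        sticky:
          cookie:
            name: nomad_ui   # any name works here; Traefik generates one if unset
            secure: true     # only send the cookie over HTTPS
            httpOnly: true   # hide the cookie from client-side scripts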

Honestly, thinking about it, the load balancing seems a bit overengineered to me.

I’ll try later to just use ${attr.unique.network.ip-address}:4646, but it seems like there is an issue with variable replacement in external config files; I’ll have to look into that.
Using the IP of the local machine seems like a safe bet to me: Nomad should be running on that machine if Traefik is running there.

Right now, I’m copying my external dynamic configurations like this:

      dynamic "template" {
        for_each = fileset(".", "conf/*")

        content {
          data            = file(template.value)
          destination     = "local/${template.value}"
        }
      }

This picks up all the config files for Nomad, Consul and Proxmox in my case and deploys them with the main job file.
But right now, I can’t use variables in those files.
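
For completeness: the file provider itself has to be enabled in Traefik's static configuration so it picks up the rendered files. In my case that is two more flags in the task's args; the /local/conf path simply matches where the templates above land:

args = [
  # ... other Traefik flags ...
  "--providers.file.directory=/local/conf",
  "--providers.file.watch=true",
]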

Update without the overengineered load balancing:

http:
  routers:
    nomad:
      rule: Host(`nomad.lab.domain.tld`)
      service: nomad@file
      entrypoints: websecure

  services:
    nomad:
      loadBalancer:
        healthCheck:
          path: /v1/status/leader
          interval: 5s
          timeout: 2s
        servers:
          - url: "http://{{ env "attr.unique.network.ip-address" }}:4646"

Always connects to the local Nomad instance, which should be up.
