Can't seem to get load balancer tutorials to work

I’m new to Nomad, and I can’t seem to get the load balancer tutorials to work. I’ve tried the Fabio, NGINX, and Traefik ones. I can see each of the web app demos if I hit the port for each individual instance, but I can’t see it through the load balancer.

For me, the issue is how to reach the statically assigned port through the load balancer. In the NGINX tutorial, when I hit localhost:8080, I see a 502 Bad Gateway. When I run nomad alloc fs [id] nginx/local/load-balancer.conf, I can see that it’s configured correctly: the server addresses and ports match up.

In the Traefik tutorial, I also just ran the two job files, but when I hit localhost:8080, I get nothing. The Consul UI says that the Traefik alive service check is failing.

I’m not sure how to debug this, and I’m pretty close to giving up and going back to Kubernetes. I didn’t change the Nomad job files from the examples at all.

Hi @iamwil :wave:

I’m sorry to hear you hit this problem, but we appreciate you giving Nomad a try :slightly_smiling_face:

Would you happen to be running in a Windows or MacOS environment?

If so, Docker Desktop on these platforms adds a layer of indirection to the network stack, so you will need to make a small change to the nginx.nomad file so the container can reach the host network:

job "nginx" {
  # ...
  group "nginx" {
    # ...
    task "nginx" {
      # ...
      template {
        data = <<EOF
upstream backend {
{{ range service "demo-webapp" }}
-  server {{ .Address }}:{{ .Port }};
+  server host.docker.internal:{{ .Port }};
{{ else }}server 127.0.0.1:65535; # force a 502
{{ end }}
}
# ... (rest of the NGINX config unchanged)
EOF

        destination = "local/load-balancer.conf"
        # ...
      }
    }
  }
}

If you are not on Windows or MacOS, could you provide more details about your environment?

Thank you!

Crikey. That was it. I don’t think I would have figured that out on my own. You sound like you work at HashiCorp. Could the docs get updated to include this tidbit so others don’t get stuck?

Is there a similar gotcha for Traefik? Does the alive check affect Traefik configuration through Consul? Traefik was the original load balancer that I wanted to get working.

Phew…glad it’s working now.

Yes, I will check with the rest of the team on the best way to get this tutorial working on other platforms. I was just waiting to hear back from you to make sure that this was the problem :grinning_face_with_smiling_eyes:.

Ah sorry, I forgot to check Traefik.

The first problem I see is that the job file is missing the port assignment to the task (I’ll make sure this gets fixed as well).

To deal with the Docker network issue, the first thing is that the Traefik container needs to be able to reach Consul. Normally the Consul agent would be available at 127.0.0.1:8500 on the host network, but from inside the container that address points to the container itself, so it won’t work here.

So with these changes to traefik.nomad you should be able to get it running and healthy:

job "traefik" {
  # ...
  group "traefik" {
    # ...
    task "traefik" {
      driver = "docker"

      config {
        image        = "traefik:v2.2"
-       network_mode = "host"
       
+       ports = ["api", "http"] 

        volumes = [
          "local/traefik.toml:/etc/traefik/traefik.toml",
        ]
      }

      template {
        data = <<EOF
[entryPoints]
    [entryPoints.http]
    address = ":8080"
    [entryPoints.traefik]
    address = ":8081"

[api]
    dashboard = true
    insecure  = true

# Enable Consul Catalog configuration backend.
[providers.consulCatalog]
    prefix           = "traefik"
    exposedByDefault = false

    [providers.consulCatalog.endpoint]
-     address = "127.0.0.1:8500"
+     address = "host.docker.internal:8500"
      scheme  = "http"
EOF

        destination = "local/traefik.toml"
      }
      # ...
    }
  }
}
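The ports = ["api", "http"] line only works if the group declares those port labels. Here is a minimal sketch of that declaration, assuming the labels map to static host ports 8080 and 8081 to match the entry point addresses in traefik.toml (compare against the actual tutorial job for its exact network block):

```hcl
job "traefik" {
  # ...
  group "traefik" {
    # Assumed network block: the labels must match the ones listed in the
    # task's ports setting. Static ports pin Traefik to the same host
    # ports that traefik.toml uses for its entry points.
    network {
      port "http" {
        static = 8080 # [entryPoints.http] address = ":8080"
      }

      port "api" {
        static = 8081 # [entryPoints.traefik] address = ":8081"
      }
    }

    task "traefik" {
      driver = "docker"

      config {
        image = "traefik:v2.2"
        # Publish the labelled ports from the container to the host.
        ports = ["api", "http"]
        # ...
      }
      # ...
    }
  }
}
```

Since no "to" value is set, the Docker driver maps each static host port to the same port number inside the container, which is where Traefik listens.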

The other part of the problem is that, since Nomad doesn’t know about this extra Docker network, the IP it will fingerprint for the client and store in the Consul catalog will not work from within a container.

I think there are two options here:

  1. Don’t use the Consul catalog for now; instead, render the configuration to a file and use the file provider. This is similar to the NGINX example, and your template block would look like this:
job "traefik" {
  # ...
  group "traefik" {
    # ...
    task "traefik" {
    # ...
      template {
        data = <<EOF
[entryPoints]
    [entryPoints.http]
    address = ":8080"
    [entryPoints.traefik]
    address = ":8081"

[api]
    dashboard = true
    insecure  = true

- # Enable Consul Catalog configuration backend.
- [providers.consulCatalog]
-   prefix           = "traefik"
-   exposedByDefault = false
-
-   [providers.consulCatalog.endpoint]
-     address = "127.0.0.1:8500"
-     scheme  = "http"
+ [providers]
+   [providers.file]
+     directory = "/etc/traefik"
+ 
+ [http]
+   [http.routers]
+     [http.routers.http]
+       rule = "Path(`/myapp`)"
+       service = "demo-webapp"
+ 
+   [http.services]
+     [http.services.demo-webapp.loadBalancer]
+ 
+   {{ range service "demo-webapp" }}
+       [[http.services.demo-webapp.loadBalancer.servers]]
+         url = "http://host.docker.internal:{{ .Port }}/"
+   {{ end }}
EOF

        destination = "local/traefik.toml"
      }
      # ...
    }
  }
}

To make it easier, I uploaded the final job that worked for me to this Gist: traefik.nomad

  2. Use a Linux VM. This is a bit more work to set up, but it better simulates a production environment where Docker runs natively. Without the extra Docker network in the middle, everything works pretty seamlessly :slightly_smiling_face:

The alive check in the traefik.nomad file tells Nomad whether Traefik is healthy. It doesn’t affect the Traefik configuration; instead, you will see the Traefik allocation being restarted in Nomad after the healthy_deadline passes.
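For context, here is a sketch of how an alive check and healthy_deadline might sit together in the job. The check values and the update block are illustrative assumptions, not copied from the tutorial:

```hcl
job "traefik" {
  # ...
  group "traefik" {
    # Deployment health is judged from the Consul checks below. If the
    # allocation's checks are not passing before healthy_deadline, Nomad
    # marks the allocation as unhealthy.
    update {
      health_check     = "checks"
      min_healthy_time = "10s"
      healthy_deadline = "3m"
    }

    task "traefik" {
      # ...
      service {
        name = "traefik"

        # Consul opens a TCP connection to the registered address and the
        # "http" port every 10s; the check fails if it cannot connect.
        check {
          name     = "alive"
          type     = "tcp"
          port     = "http"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
```

If you want persistently failing checks to restart the task itself, Nomad also offers a check_restart block inside check.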

Ah, OK, thanks for the help. This worked on my Mac. I’m guessing other people have run into this before, since you correctly surmised that it was Docker Desktop adding a layer of network indirection.

So if I understand correctly, this wouldn’t be an issue on any Linux distro. It’s a leaky abstraction rearing its head.

Will Nomad eventually be able to use consulCatalog with Docker Desktop? Is it just a matter of getting around to implementing it?


To give you an idea of why I’m doing this, I’m motivated by trying to find an orchestration solution that can run docker containers, but isn’t so complex that it needs a full team to maintain and upgrade it. I’ve run Kubernetes before on my own, and it felt like sandblasting a cracker.

That’s why I’m trying out Nomad on my local machine first: to get it set up, see how it works, and then, ideally, use the same or similar job files to deploy to production. That way, I can run a copy on my machine that mimics production, and other devs get a reproducible environment.

Btw, you saved me from quitting Nomad, so thanks! I really wanted to make it work, so I was motivated enough to look for a forum, but it shouldn’t come to that. HashiCorp should really make sure the tutorials and guides are polished and provide a smooth experience. Think of it as part of onboarding: if people have great experiences, they’ll keep going and adopt Nomad and the rest of the ecosystem.

I am glad to hear you were able to get this working :slightly_smiling_face:

That’s right, I tested the tutorial on Linux and it all worked without change.

After talking with some colleagues, I learned a way to make it work: you just need to bind Nomad and Consul to a network interface on your machine. I started documenting this process here: docs: add FAQ for Docker Desktop for Windows and MacOS by lgfa29 · Pull Request #10390 · hashicorp/nomad · GitHub

It’s still a WIP, but once it’s done we will link to it from other places.
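As a rough illustration of binding both agents to a network interface, the agent configuration could look something like the fragments below. This is a sketch under assumptions: the interface name en0 and the file names are hypothetical (check ifconfig for your interface), and it is not necessarily what the FAQ ends up recommending.

```hcl
# --- consul.hcl (hypothetical file name): Consul agent config ---
# Advertise the IP of a real interface instead of 127.0.0.1, so that
# addresses stored in the catalog are reachable from Docker containers.
bind_addr = "{{ GetInterfaceIP \"en0\" }}"

# Serve the HTTP API (port 8500) on all interfaces, not just loopback.
client_addr = "0.0.0.0"

# --- nomad.hcl (hypothetical file name): Nomad agent config ---
bind_addr = "0.0.0.0"

client {
  enabled = true

  # Fingerprint this interface, so services register with its IP
  # rather than the loopback address.
  network_interface = "en0"
}
```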

Yes, you are absolutely right. We are always improving our documentation and adding new tutorials, but sometimes it’s hard to account for all different environments. So we really appreciate your feedback and reaching out to us :heart:

Oh, thanks for going above and beyond asking the colleagues and putting together the FAQ. I’ll check it out.

Yes, I totally understand. Perhaps if the docs also had a way to report when something isn’t working, that would help tighten the feedback loop.

Thanks for understanding and sticking around :slightly_smiling_face:

For our tutorials, you can provide feedback at the bottom of the page by clicking one of the feedback buttons there.

A text box will appear where you will be able to provide more details.

For our docs on https://www.nomadproject.io you can file an issue in our GitHub repo.

I got this too. Feel like you guys should put this in the tutorial; it could’ve saved a lot of folks lots of trouble. But thanks, though.

I stumbled upon a similar issue following this tutorial

I had to set network_mode = "host" to get it working

I am running Nomad with Docker on Linux (Manjaro).
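For anyone else hitting this, here is a sketch of where that setting goes in the nginx job. Everything except network_mode is elided and should stay as it is in the tutorial:

```hcl
job "nginx" {
  # ...
  group "nginx" {
    # ...
    task "nginx" {
      driver = "docker"

      config {
        image = "nginx"

        # Share the host's network namespace: 127.0.0.1 inside the
        # container is now the host's loopback, so upstreams rendered as
        # 127.0.0.1:<port> become reachable, and NGINX listens directly
        # on host port 8080.
        network_mode = "host"

        # ... (keep the rest of the config as in the tutorial)
      }
      # ...
    }
  }
}
```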

I’m having a similar problem. I followed the tutorial exactly, but I’m running Linux.

./nomad alloc fs 8b nginx/local/load-balancer.conf returns

upstream backend {

  server 127.0.0.1:28550;

  server 127.0.0.1:20298;

  server 127.0.0.1:24290;

}

which doesn’t seem right. NGINX returns a 502, with this error in the log:

connect() failed (111: Connection refused) while connecting to upstream, client: 172.17.0.1, server: , request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:28550/", host: "127.0.0.1:8080"