Nomad job run pytechco-redis.nomad.hcl, deployment is never in Successful status

Hello,

I am just starting to use Nomad and am following this tutorial.

When issuing the following command

nomad job run pytechco-redis.nomad.hcl

The deployment status is always in progress, even though I waited for more than 10 minutes.

  ⠧ Deployment "0577dcd1" in progress...
    2023-04-14T15:26:10-04:00
    ID          = 0577dcd1
    Job ID      = pytechco-redis
    Job Version = 0
    Status      = running
    Description = Deployment is running

    Deployed
    Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
    ptc-redis   1        0       0        0          N/A

I have the setup configured locally on my MacBook. I have two terminal windows open: one for

sudo nomad agent -dev -bind 0.0.0.0

and the other for the other commands such as

export NOMAD_ADDR=http://localhost:4646
nomad job run pytechco-redis.nomad.hcl

Any idea where I should be looking?

Thank you,
Laurentius

Hey Laurentius,

Sorry you’re hitting this error. This Getting Started guide is pretty new, so we might not have ironed out all the kinks yet.

The easiest way to debug something like this is probably the Nomad UI. I would open the Job page in the UI and see if there are any errors that might be helpful.

Since you have a desired count of 1 and a placed count of 0, Nomad can’t find a place to put the task. If you have an error that says something like “Placement Failures”, that might give you more info. Some common reasons are that you don’t have the required task driver, that you don’t have enough free resources (CPU, memory, or disk) on your computer, or that you’ve specified a constraint, for example that the job must run on Linux.

What I’m guessing may have happened is that you don’t have Docker running on your Mac. If you see an error like “Constraint missing drivers filtered 1 node”, then that’s what happened. If you just start Docker Desktop, it should fix itself. If that’s the case, we definitely need to specify that in the guide!

If not, let me know if you see any other errors, and I can help debug.
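
The same placement failure can also be spotted from the CLI output. As a rough sketch, the helper below scans saved `nomad job run` output for the "missing drivers" constraint message. The function name `has_missing_driver` and the file name `run.log` are made up for illustration and are not part of Nomad.

```shell
# Hypothetical helper: succeeds if the text on stdin reports a
# "missing drivers" placement failure.
has_missing_driver() {
  grep -q 'Constraint "missing drivers"'
}

# Example usage (run.log is an assumed file name):
#   nomad job run pytechco-redis.nomad.hcl 2>&1 | tee run.log
#   has_missing_driver < run.log && echo "Start Docker, then rerun the job"
```

If the helper matches, the node was filtered out because it has no usable task driver, which points at Docker rather than at the job file.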

Hey hey, thanks for trying out the tutorial! Apologies that it’s not working for you at the moment but we’ll help figure it out!

I think @mnomitch’s advice is right on track as I was able to replicate your issue if I quit Docker Desktop and run the tutorial.

In addition to checking out the Job information in the UI, you can also click on the Clients page from the left navigation, click on the one client (your mac), and scroll down to the Driver Status section. If the docker driver isn’t showing as detected (see screenshot), that will confirm it - start Docker Desktop and you should be good to go!

Additionally, we’ll update the tutorial to mention that Docker is a prerequisite.
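
For a quick check from the terminal before starting the agent, something like the following sketch may help. It assumes the `docker` CLI is on your PATH and relies on `docker info` exiting with a non-zero status when no daemon answers. `docker_status` and the `DOCKER_BIN` override are hypothetical names used only in this example.

```shell
# Report whether a Docker daemon answers on the current DOCKER_HOST
# (or the default socket). DOCKER_BIN is a hypothetical override that
# lets you point the check at a different binary.
docker_status() {
  if "${DOCKER_BIN:-docker}" info >/dev/null 2>&1; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}

# Example usage: only start the dev agent when Docker is up.
#   [ "$(docker_status)" = "reachable" ] && nomad agent -dev
```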

Let us know if that helps!

Hello @tonino and @mnomitch ,

I saw that error message.

Does it have to be Docker Desktop (license :heavy_dollar_sign:)? I have Rancher Desktop up and running, but it doesn’t seem to be “picked up” by nomad.

Thank you,
Laurentius

This is the screenshot from the UI with Rancher Desktop running.

I’m not familiar with Rancher Desktop, but after some googling, it looks like it lets you choose the container runtime, containerd or dockerd, and I’m assuming it uses containerd by default.

Can you check your config and switch to dockerd? Then shut down your Nomad cluster, restart Rancher Desktop, start your Nomad cluster again, and see if the client page shows the docker driver.

I’m not sure exactly how Nomad works internally, but if the docker command isn’t available when the agent starts, I imagine the driver won’t be detected. I think switching to the dockerd engine in Rancher Desktop might help.

Thanks @tonino. I’ll give it a try and let you know.

Neither containerd nor dockerd worked. I will dig into this further :grinning:

Once again, thanks for all your help.

Output:

▶ nomad job run pytechco-redis.nomad.hcl
==> 2023-04-14T17:33:53-04:00: Monitoring evaluation "e6419bce"
    2023-04-14T17:33:53-04:00: Evaluation triggered by job "pytechco-redis"
    2023-04-14T17:33:54-04:00: Evaluation within deployment: "f0c50e0c"
    2023-04-14T17:33:54-04:00: Evaluation status changed: "pending" -> "complete"
==> 2023-04-14T17:33:54-04:00: Evaluation "e6419bce" finished with status "complete" but failed to place all allocations:
    2023-04-14T17:33:54-04:00: Task Group "ptc-redis" (failed to place 1 allocation):
      * Constraint "missing drivers": 1 nodes excluded by filter
    2023-04-14T17:33:54-04:00: Evaluation "15fd9887" waiting for additional capacity to place remainder
==> 2023-04-14T17:33:54-04:00: Monitoring deployment "f0c50e0c"
  ⠴ Deployment "f0c50e0c" in progress...

    2023-04-14T17:33:54-04:00
    ID          = f0c50e0c
    Job ID      = pytechco-redis
    Job Version = 0
    Status      = running
    Description = Deployment is running

    Deployed
    Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
    ptc-redis   1        0       0        0          N/A

I still couldn’t find a solution for this. Is anyone using Rancher Desktop who was able to complete the Nomad tutorial? (Deploy and Update a Job | Nomad | HashiCorp Developer)

Hi @laurentiuspurba,

I did a quick test installing Rancher Desktop. Based on a standard installation, you have to set the DOCKER_HOST environment variable before launching Nomad for the docker driver to be detected.

export DOCKER_HOST=unix://$HOME/.rd/docker.sock
nomad agent -dev

Could you please try this and see if you are able to run the job successfully?

You can see this in action in the below recording:

https://asciinema.org/a/poE4yFr9e4EbiHhrCh9bXrR6M
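
If you want to guard against a missing socket, the export can be wrapped in a small check. This is only a sketch: `rancher_docker_host` is a made-up helper name, and the default path is simply the one used in the commands above, which may differ on other installations.

```shell
# Print a DOCKER_HOST value for the given socket path, or fail if no
# socket exists there. The default path is the Rancher Desktop one
# used in the commands above.
rancher_docker_host() {
  sock="${1:-$HOME/.rd/docker.sock}"
  if [ -S "$sock" ]; then
    echo "unix://$sock"
  else
    echo "no Docker socket at $sock" >&2
    return 1
  fi
}

# Example usage:
#   DOCKER_HOST="$(rancher_docker_host)" && export DOCKER_HOST && nomad agent -dev
```

Because the variable has to be set in the environment of the Nomad agent itself, run this in the same terminal that starts the agent.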

I hope this helps.

Thanks Ranjandas, it appears that worked, but now I get a different error. I haven’t Googled it yet, but if you recognize this error, I would appreciate it if you could share the resolution.

Failed due to progress deadline

▶ nomad job run pytechco-redis.nomad.hcl
==> 2023-10-15T11:03:06-04:00: Monitoring evaluation "8a198bd0"
    2023-10-15T11:03:06-04:00: Evaluation triggered by job "pytechco-redis"
    2023-10-15T11:03:07-04:00: Evaluation within deployment: "f9ca2ad8"
    2023-10-15T11:03:07-04:00: Allocation "52f4ab7c" created: node "3a1b0ef5", group "ptc-redis"
    2023-10-15T11:03:07-04:00: Evaluation status changed: "pending" -> "complete"
==> 2023-10-15T11:03:07-04:00: Evaluation "8a198bd0" finished with status "complete"
==> 2023-10-15T11:03:07-04:00: Monitoring deployment "f9ca2ad8"
  ! Deployment "f9ca2ad8" failed

    2023-10-15T11:13:06-04:00
    ID          = f9ca2ad8
    Job ID      = pytechco-redis
    Job Version = 0
    Status      = failed
    Description = Failed due to progress deadline

    Deployed
    Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
    ptc-redis   1        3       0        2          2023-10-15T11:13:06-04:00
2023-10-15T11:17:03.672-0400 [DEBUG] client.driver_mgr.docker: failed to start container: driver=docker container_id=b3e1d26e94c5ec62d36e6db98bddbe56b37d0d528c8005f2c6e7c6ddd879cef9 attempt=6 error="API error (500): driver failed programming external connectivity on endpoint redis-task-5a858063-fb1d-5f39-4b13-c422749808d3 (a59c71c2a1b5048765088649a86a07f1a91a085e96a2a71e0fd1b9baa5c425fb): Error starting userland proxy: listen tcp6 [2600:1700:5830:7130:46:40e:2a8b:894e]:28791: bind: cannot assign requested address"
    2023-10-15T11:17:03.672-0400 [ERROR] client.driver_mgr.docker: failed to start container: driver=docker container_id=b3e1d26e94c5ec62d36e6db98bddbe56b37d0d528c8005f2c6e7c6ddd879cef9 error="API error (500): driver failed programming external connectivity on endpoint redis-task-5a858063-fb1d-5f39-4b13-c422749808d3 (a59c71c2a1b5048765088649a86a07f1a91a085e96a2a71e0fd1b9baa5c425fb): Error starting userland proxy: listen tcp6 [2600:1700:5830:7130:46:40e:2a8b:894e]:28791: bind: cannot assign requested address"
    2023-10-15T11:17:03.686-0400 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=5a858063-fb1d-5f39-4b13-c422749808d3 task=redis-task type="Driver Failure" msg="Failed to start container b3e1d26e94c5ec62d36e6db98bddbe56b37d0d528c8005f2c6e7c6ddd879cef9: API error (500): driver failed programming external connectivity on endpoint redis-task-5a858063-fb1d-5f39-4b13-c422749808d3 (a59c71c2a1b5048765088649a86a07f1a91a085e96a2a71e0fd1b9baa5c425fb): Error starting userland proxy: listen tcp6 [2600:1700:5830:7130:46:40e:2a8b:894e]:28791: bind: cannot assign requested address" failed=false
    2023-10-15T11:17:03.686-0400 [ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=5a858063-fb1d-5f39-4b13-c422749808d3 task=redis-task error="Failed to start container b3e1d26e94c5ec62d36e6db98bddbe56b37d0d528c8005f2c6e7c6ddd879cef9: API error (500): driver failed programming external connectivity on endpoint redis-task-5a858063-fb1d-5f39-4b13-c422749808d3 (a59c71c2a1b5048765088649a86a07f1a91a085e96a2a71e0fd1b9baa5c425fb): Error starting userland proxy: listen tcp6 [2600:1700:5830:7130:46:40e:2a8b:894e]:28791: bind: cannot assign requested address"
    2023-10-15T11:17:03.689-0400 [INFO]  client.alloc_runner.task_runner: not restarting task: alloc_id=5a858063-fb1d-5f39-4b13-c422749808d3 task=redis-task reason="Exceeded allowed attempts 2 in interval 30m0s and mode is \"fail\""
    2023-10-15T11:17:03.689-0400 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=5a858063-fb1d-5f39-4b13-c422749808d3 task=redis-task type="Not Restarting" msg="Exceeded allowed attempts 2 in interval 30m0s and mode is \"fail\"" failed=true
    2023-10-15T11:17:03.785-0400 [DEBUG] http: request complete: method=GET path=/v1/client/stats?node_id=3a1b0ef5-40db-fc46-7c52-a30d93531d12 duration="348.296µs"

Hi @laurentiuspurba,

This seems to be an issue with the way Rancher Desktop port forwarding is implemented (Disclaimer: not an expert on Rancher Desktop).

You will see that Rancher Desktop doesn’t allow mapping a port to a specific IP address on the host.

$ docker run -p 192.168.18.11:8080:80 nginx
docker: Error response from daemon: driver failed programming external connectivity on endpoint relaxed_dirac (072c107d06f4b73481d4fdb9a62b4220567aae9e4496c69702bb4aed8970f738): Error starting userland proxy: listen tcp4 192.168.18.11:8080: bind: cannot assign requested address.

Nomad is trying to do the above and failing.

When I try this on Docker Desktop, it works perfectly fine: Docker Desktop - Port Forward per Host IP - asciinema

Switching to Docker Desktop will unblock you!

I had a discussion in the rancher-desktop Slack channel and got confirmation that explicitly port-mapping to specific host IPs (except 127.0.0.1) is not supported at the moment.

https://slack-archive.rancher.com/t/15989096/hi-team-i-just-started-using-rancher-desktop-and-trying-to-r#908bff43-632d-4513-9041-65b9a537a783
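
To see which host address a failing allocation tried to publish on, the address can be pulled out of the error text. This is a minimal sketch that assumes the "Error starting userland proxy" message format shown in this thread; `failed_bind_addr` is a hypothetical helper name.

```shell
# Hypothetical helper: read log lines on stdin and print the host
# address and port that the Docker userland proxy failed to bind.
failed_bind_addr() {
  sed -n 's/.*listen tcp[46]\{0,1\} \(.*\): bind: cannot assign requested address.*/\1/p'
}

# Example usage (agent.log is an assumed file holding Nomad agent output):
#   failed_bind_addr < agent.log | sort -u
```

Any address other than 127.0.0.1 in that output would run into the Rancher Desktop limitation described above.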

I hope this helps.

Hello @Ranjandas

I really appreciate your effort on this.

Due to the Docker Desktop license change, I don’t know if I am allowed to install it on my office laptop.

I’m so excited about Nomad, especially after HashiConf, but I’m falling short on this.

Once again, thank you.

Laurentius

Hi @laurentiuspurba,

If you can install Multipass, you can use this cloud-init file to spin up a VM with a Nomad + Consul dev environment.

ref: A cloud-init file to spin up a single node Nomad + Consul dev environment. · GitHub

This would help you complete the tutorial you were trying to follow.

Once the VM is up, you can interact with Nomad and Consul from the host CLI by setting the NOMAD_ADDR and CONSUL_HTTP_ADDR environment variables, respectively, and you can access the UIs from the browser.

You can also interact with the app hosted on the Nomad cluster from the host.
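
As a sketch of what setting those variables might look like, the helper below prints the two export lines for a given VM address. `vm_env` is a made-up function name, the IP address in the usage comment is a placeholder for whatever address your VM gets, and the ports assume the Nomad and Consul defaults (4646 and 8500).

```shell
# Hypothetical helper: print export lines for a Nomad + Consul dev VM
# at the given IP address. 4646 and 8500 are the default HTTP ports
# for Nomad and Consul; adjust them if your VM uses other ports.
vm_env() {
  ip="$1"
  echo "export NOMAD_ADDR=http://${ip}:4646"
  echo "export CONSUL_HTTP_ADDR=http://${ip}:8500"
}

# Example usage (replace the placeholder address with your VM's IP):
#   eval "$(vm_env 192.168.65.15)"
#   nomad node status
```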

➜  jobs git:(main) nomad service info ptc-web-svc
Job ID        Address             Tags  Node ID   Alloc ID
pytechco-web  192.168.65.15:5000  []    a7a683e5  11b821c6

https://asciinema.org/a/0RnsfID4dFawowxksi03AoEdI

I hope this helps.