Little help with Nomad arm64 and docker, folding at home

Hi folks,
I thought Folding@home would be a good introduction to my Pi Nomad cluster, but I’m having trouble passing args to Docker.
I can run instances of beastob/foldingathome-arm64 (the “Container deployment for Folding@Home” repository on GitHub).
But whenever I try to craft args to pass to the Docker container, the deployment fails.

Here’s my last attempt.
"Config": {
  "args": [
    "-e FOLD_USER=Matt_Clare",
    "-e TEAM=47936",
    "-e FOLD_ANON=false"
  ],
  "image": "beastob/foldingathome-arm64"
},

I’m sure I’ve misinterpreted how the environment variables work. Anyone’s help would be appreciated.

Hello!

Could you please give the error message? I’d like to see how the deployment is failing.

Thanks for this!

Looks like this on the CLI
nomad run folding-try.json
==> Monitoring evaluation "12267e41"
    Evaluation triggered by job "folding-at-home-stefancrain-2.3"
==> Monitoring evaluation "12267e41"
    Evaluation within deployment: "bbfa4d7a"
    Allocation "dbffe851" created: node "0b806119", group "folding-group"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "12267e41" finished with status "complete"

then -----------------

nomad job status folding-at-home-stefancrain-2.3
ID = folding-at-home-stefancrain-2.3
Name = folding-at-home-stefancrain-2.3
Submit Date = 2021-03-06T21:26:57-05:00
Type = service
Priority = 50
Datacenters = dc1
Namespace = default
Status = running
Periodic = false
Parameterized = false

Summary
Task Group     Queued  Starting  Running  Failed  Complete  Lost
folding-group  0       1         0        5       0         0

Latest Deployment
ID = bbfa4d7a
Status = running
Description = Deployment is running

Deployed
Task Group     Desired  Placed  Healthy  Unhealthy  Progress Deadline
folding-group  1        1       0        0          2021-03-07T13:50:29-05:00

Allocations
ID        Node ID   Task Group     Version  Desired  Status   Created  Modified
dbffe851  0b806119  folding-group  0        run      pending  35s ago  15s ago

then ---------------

nomad job status folding-at-home-stefancrain-2.3
ID = folding-at-home-stefancrain-2.3
Name = folding-at-home-stefancrain-2.3
Submit Date = 2021-03-06T21:26:57-05:00
Type = service
Priority = 50
Datacenters = dc1
Namespace = default
Status = pending
Periodic = false
Parameterized = false

Summary
Task Group     Queued  Starting  Running  Failed  Complete  Lost
folding-group  0       0         0        10      0         0

Future Rescheduling Attempts
Task Group     Eval ID   Eval Time
folding-group  9d9b693f  6m50s from now

Latest Deployment
ID = bbfa4d7a
Status = failed
Description = Failed due to progress deadline

Deployed
Task Group     Desired  Placed  Healthy  Unhealthy  Progress Deadline
folding-group  1        5       0        5          2021-03-07T13:50:29-05:00

Allocations
ID        Node ID   Task Group     Version  Desired  Status  Created     Modified
f5bdcebc  0b806119  folding-group  0        run      failed  1m47s ago   1m6s ago
fd313a09  0ac95bfc  folding-group  0        stop     failed  6m25s ago   1m47s ago
94cca32b  0ac95bfc  folding-group  0        stop     failed  9m ago      6m25s ago
864410e5  b4d9ab8e  folding-group  0        stop     failed  10m36s ago  9m ago
dbffe851  0b806119  folding-group  0        stop     failed  11m44s ago  10m36s ago

And here are screenshots of the UI version:
[Screenshot: Overview 03.07.2021-13.51.03]
[Screenshot: Evaluation 03.07.2021-13.51.56]

Hi @mclare,

I’m not sure if this is the problem but -e in the docker run command sets an environment variable, not an argument, so you should set them using the env block inside your task.
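
As a rough sketch of that change, the three values from the args list above would move into an env block inside the task. The task name here is a placeholder, and the variable names are copied as posted (TEAM in particular is unverified), so check the image’s README for the names it actually reads.

```hcl
# Hypothetical task; only the env block is the point of this sketch.
task "folding-at-home" {
  driver = "docker"

  config {
    image = "beastob/foldingathome-arm64"
  }

  # Equivalent of `docker run -e KEY=value`: one entry per variable.
  # Names and values are taken from the original post and are not
  # verified against the image's documentation.
  env {
    FOLD_USER = "Matt_Clare"
    TEAM      = "47936"
    FOLD_ANON = "false"
  }
}
```

Nothing is passed through args at all in this form; args is only for arguments handed to the container’s command.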

From your pair of screenshots it seems like the allocation is failing to start without any specific event. This usually means your workload is failing, so you will need to check its logs. You can do it via the UI or using the nomad alloc logs command. Make sure you check both stdout and stderr for messages (on the CLI, the -stderr flag switches to the stderr stream) :slightly_smiling_face:

Once you get access to the logs, post them here so we can help you debug this further.


Thanks for this!

@lgfa29 - I had suspected as much about args vs. env, but under the circumstances I couldn’t make anything of it. After switching to env, the logs are empty in this case, as they are in all the non-running cases.
[Screenshot: new job and log states]

The env block should be defined at the task level. I don’t have an ARM64 environment running right now to test this, but from the GitHub link this seems like the minimal job file necessary to run this image:

job "fah" {
  datacenters = ["dc1"]

  group "folding-at-home" {
    network {
      port "http" {
        to = 7396
      }
    }

    task "folding-at-home" {
      driver = "docker"

      config {
        image = "beastob/foldingathome-arm64:v7.6.14"
        ports = ["http"]
      }

      env {
        FOLD_USER = "user"
        FOLD_TEAM = "team"
      }
    }
  }
}

Give it a try and see if it runs.

After running nomad run, the output will give you the allocation ID. You can then use this value to query the allocation status:

$ nomad run fah.nomad
==> Monitoring evaluation "b12df39a"
    Evaluation triggered by job "fah"
==> Monitoring evaluation "b12df39a"
    Evaluation within deployment: "51a84755"
    Allocation "e9aaa0b8" created: node "924eec02", group "folding-at-home"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "b12df39a" finished with status "complete"
$ nomad alloc status e9aaa0b8
ID                  = e9aaa0b8-7f28-0110-8772-6518eff7e415
Eval ID             = b12df39a
Name                = fah.folding-at-home[0]
Node ID             = 924eec02
Node Name           = laoqui-hc-mbp-6.local
Job ID              = fah
Job Version         = 0
Client Status       = pending
Client Description  = No tasks have started
Desired Status      = run
Desired Description = <none>
Created             = 7s ago
Modified            = 7s ago
Deployment ID       = 51a84755
Deployment Health   = unset

Allocation Addresses
Label  Dynamic  Address
*http  yes      127.0.0.1:24045 -> 7396

Task "folding-at-home" is "pending"
Task Resources
CPU      Memory   Disk     Addresses
100 MHz  300 MiB  300 MiB

Task Events:
Started At     = N/A
Finished At    = N/A
Total Restarts = 1
Last Restart   = 2021-03-08T16:46:50-05:00

Recent Events:
Time                       Type            Description
2021-03-08T16:46:50-05:00  Restarting      Task restarting in 15.309237872s
2021-03-08T16:46:50-05:00  Driver Failure  Failed to pull `beastob/foldingathome-arm64:v7.6.14`: API error (404): manifest for beastob/foldingathome-arm64:v7.6.14 not found: manifest unknown: manifest unknown
2021-03-08T16:46:49-05:00  Driver          Downloading image
2021-03-08T16:46:49-05:00  Task Setup      Building Task Directory
2021-03-08T16:46:49-05:00  Received        Task received by client

You can see that, for me, the job is failing to start because I am not running on an ARM64 machine, so Docker can’t find an appropriate image:

Failed to pull `beastob/foldingathome-arm64:v7.6.14`: API error (404): manifest for beastob/foldingathome-arm64:v7.6.14 not found: manifest unknown: manifest unknown

Send us the output for the allocation status as well :slightly_smiling_face:

Success!

The correct structure for the env was a huge help. The job initially failed with the version tag in the Docker image string, but succeeded with the image name on its own.
I was able to confirm from stdout that the environment variables were passed (sharing mostly for the next person).
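
For the next person, the image change presumably amounts to dropping the version tag from the config block. This is a sketch under that assumption, with the rest of the job file left as posted earlier in the thread:

```hcl
config {
  # No ":v7.6.14" suffix. With no tag given, the Docker driver
  # pulls the image's "latest" tag.
  image = "beastob/foldingathome-arm64"
  ports = ["http"]
}
```

Pinning to a specific tag is still the safer choice once you know which tags the image publishes.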

This is a great basis for my further experimentation with Nomad.
Thanks @lgfa29 from Burlington, Ontario.


Glad you got it to work :tada: