Failed due to progress deadline

odidev · September 22, 2021, 8:19am

Hi Team,

I am using Nomad to deploy two QEMU instances, each running a Redis database. I created example.nomad with the nomad job init command and modified it to deploy Redis with qemu as the driver.

I started the Nomad agent successfully with nomad agent -dev -bind 0.0.0.0 -log-level INFO, but when I run the example file I get the error below:

root:/usr/local/bin# nomad job run example.nomad
==> 2021-09-14T12:25:48Z: Monitoring evaluation "d652ee5d"
    2021-09-14T12:25:48Z: Evaluation triggered by job "example"
    2021-09-14T12:25:48Z: Evaluation within deployment: "9a34f918"
    2021-09-14T12:25:48Z: Allocation "55741617" created: node "39ab10e2", group "cache"
    2021-09-14T12:25:48Z: Evaluation status changed: "pending" -> "complete"
==> 2021-09-14T12:25:48Z: Evaluation "d652ee5d" finished with status "complete"
==> 2021-09-14T12:25:48Z: Monitoring deployment "9a34f918"
  ! Deployment "9a34f918" failed

    2021-09-14T12:35:48Z
    ID          = 9a34f918
    Job ID      = example
    Job Version = 0
    Status      = failed
    Description = Failed due to progress deadline

    Deployed
    Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
    cache       1        5       0        5          2021-09-14T12:35:48Z

Also, I built Nomad from source for arm64 in an AWS Ubuntu environment and tested the resulting binary with the make test-nomad command. The result was: DONE 5965 tests, 44 skipped, 105 failures in 794.943s.

Please check the detailed logs of failures: nomad-arm64-aws-ubuntu-logs.txt (570.4 KB)

It would be helpful if you could share some pointers to resolve the above failures.

lgfa29 · September 23, 2021, 12:08am

Hi @odidev

In cases like this it’s useful to check the allocation status to see what’s preventing the allocations from becoming healthy. The allocation ID is shown in the output of the nomad job run command.

Using the terminal output you sent as an example, you can run a command like this:

$ nomad alloc status 55741617

Could you try again and send us the allocation status output? That way we can see what’s going wrong and help you debug further.

Thanks!

odidev · September 26, 2021, 5:47pm

Hi @lgfa29

Thanks for the quick response. As you suggested, I ran nomad alloc status with the allocation ID.

Please find the logs below:

root@server:/usr/local/bin# nomad alloc status 5c27e558
ID                   = 5c27e558-1ba9-d7b3-ecdd-1695f372a48c
Eval ID              = 5fd50cfd
Name                 = example.cache[0]
Node ID              = 0adcfe56
Node Name            = x64server
Job ID               = example
Job Version          = 1
Client Status        = failed
Client Description   = Failed tasks
Desired Status       = stop
Desired Description  = alloc was rescheduled because it failed
Created              = 2m27s ago
Modified             = 1m57s ago
Deployment ID        = dc50959c
Deployment Health    = unhealthy
Replacement Alloc ID = 07f25c3d

Allocation Addresses
Label  Dynamic  Address
*db    yes      127.0.0.1:20682 -> 6379

Task "redis" is "dead"
Task Resources
CPU      Memory   Disk     Addresses
500 MHz  256 MiB  300 MiB

Task Events:
Started At     = N/A
Finished At    = 2021-09-23T13:44:51Z
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                  Type               Description
2021-09-23T13:44:52Z  Killing            Sent interrupt. Waiting 5s before force killing
2021-09-23T13:44:51Z  Alloc Unhealthy    Unhealthy because of failed task
2021-09-23T13:44:51Z  Not Restarting     Error was unrecoverable
2021-09-23T13:44:51Z  Failed Validation  4 errors occurred:
        * failed to parse config:
        * Missing required argument: The argument "image_path" is required, but no definition was found.
        * Invalid label: No argument or block type is named "ports".
        * Invalid label: No argument or block type is named "image".
2021-09-23T13:44:51Z  Task Setup         Building Task Directory
2021-09-23T13:44:51Z  Received           Task received by client

angrycub · September 27, 2021, 10:54pm

odidev:

2021-09-23T13:44:51Z  Failed Validation  4 errors occurred:
        * failed to parse config:
        * Missing required argument: The argument "image_path" is required, but no definition was found.
        * Invalid label: No argument or block type is named "ports".
        * Invalid label: No argument or block type is named "image".

This set of messages should point you in the right direction. It appears that your task configuration is still written for the Docker task driver rather than the QEMU one. The proper arguments can be found in the QEMU task driver’s documentation.
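
For illustration, a task written for the QEMU driver might look roughly like the sketch below. This is not the linked sample job: the image file name and the artifact URL are hypothetical placeholders, and the port_map entry assumes the guest image runs Redis on port 6379. Only image_path is named as required by the validation errors above, so check the remaining arguments, and how port forwarding interacts with group-level network ports, against the driver documentation for your Nomad version.

```hcl
job "example" {
  datacenters = ["dc1"]

  group "cache" {
    network {
      # Dynamic host port labeled "db", as in the allocation output above.
      port "db" {}
    }

    task "redis" {
      driver = "qemu"

      config {
        # Required by the QEMU driver; takes the place of Docker's "image".
        # Hypothetical file name, downloaded by the artifact block below.
        image_path = "local/redis-vm.img"

        # Takes the place of Docker's "ports": forward the host port labeled
        # "db" to port 6379 inside the guest (assumed Redis port).
        port_map {
          db = 6379
        }
      }

      # The VM image must be shipped to the client; this URL is a placeholder.
      artifact {
        source = "https://example.com/images/redis-vm.img"
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}
```

The key change from the Docker-based example is inside the config block: the image and ports arguments that the validator rejected are gone, and image_path points at a disk image placed in the task's local directory.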

I have a sample QEMU job that you can check out. Maybe it’ll give you some ideas.

Hope this helps!
Charlie V.
