Every deployment ends with "No allocations for job are running"; the third deployment prunes the job away from Nomad

I am trying to deploy to Nomad using nomad-jobspec, similar to the HashiCorp examples, replacing the example application with nginx hello. But every deployment ends with the following message:

No allocations for job  are running!
Waypoint detected that the current deployment is not ready, however your application
might be available or still starting up.
Resource  is reporting "DOWN"

and the first release ends with the following message (probably related to not having a canary to promote):

Getting status of Nomad release...
Getting job info...
! failed performing status report op: rpc error: code = Internal desc = resource
  manager failed to generate resource statuses: Unexpected response code: 404 (job
  not found)

Furthermore, waypoint status reports the deployment status as ✖ DOWN for every deployment version.

When I check Nomad (whether via the UI or nomad job status), the job is running and accessible.

Can I do something to get rid of those messages?

Furthermore, on the third deployment the job is pruned in Nomad, which destroys even the currently running allocations:

✓ Running deploy v3
✓ Job registration successful
✓ Allocation "bf38e137-7409-f3d5-8a12-ae6af360c878" created: node "5d7df7c9-f0a5-65f6-f7fb-48158a417fe4", group "waypoint-poc"
✓ Evaluation status changed: "pending" -> "complete"
✓ Evaluation "5ea0c030-7a39-2bb4-9f7f-e5e7f8d84747" finished with status "complete"
✓ Deployment successfully rolled out!

✓ Finished building report for Nomad platform
✓ Getting job info...
❌ No allocations for job "waypoint-poc" are running!
⚠️ Waypoint detected that the current deployment is not ready, however your application
might be available or still starting up.
⚠️ Resource "waypoint-poc" is reporting "DOWN"

» Releasing waypoint-poc...
✓ Running release v3
✓ Evaluation status changed: "pending" -> "complete"
✓ Evaluation "eba620ba-a102-ee4c-932e-5ea561b15b28" finished with status "complete"
✓ Release successfully rolled out!

» Pruning old deployments...
  Deployment: 01GEETZA65GY9YVS3R9SPMFNVC (v1)
✓ Running deployment destroy v1
✓ Deleting job: waypoint-poc  <-----------------------

» Pruning old releases...
  Release: 01GEETZDJK2EQZ8NMSMPPXRA1A (v1)
✓ Running release destroy v1

❌ Getting status of Nomad release...
❌ Getting job info...
! failed performing status report op: rpc error: code = Internal desc = resource
  manager failed to generate resource statuses: Unexpected response code: 404 (job
  not found)

Doesn't the line
Deleting job: waypoint-poc
mean that Waypoint deletes all allocations (even the new ones) of the Nomad job? When I use -prune=false, the NEW deployment keeps running happily. But that limits me to using only the CLI rather than the automated Git polling.

Thank you in advance.

Waypoint installation steps:

waypoint install -platform=nomad \
  -accept-tos \
  -nomad-runner-host-volume=waypoint-runner-volume \
  -nomad-host-volume=waypoint-server-volume \
  -nomad-dc=play \
  -nomad-consul-service=false \
  -nomad-host=https://nomad03.nomadplay:4646 \
  -- -advertise-addr=x.x.x.x

waypoint config set -runner -scope=global NOMAD_SKIP_VERIFY=true  # To ignore the untrusted cert

Nomad version: Nomad v1.3.5 (1359c2580fed080295840fb888e28f0855e42d50)
Waypoint version: CLI: v0.10.1 (830e74dd0)
Server: v0.10.1

Hi @josef.kadlecek! Thanks for providing all this info! Would you mind also adding your waypoint.hcl to help troubleshoot?

Thanks for the quick reply!

project = "waypoint-poc"

app "waypoint-poc" {

  build {
    use "docker" {}
    registry {
      use "docker" {
        image    = "x.x.x.x:5000/example"
        insecure = true
        tag      = 1
        local    = false
      }
    }
  }

  deploy {
    use "nomad-jobspec" {
      jobspec = templatefile("${path.app}/app.nomad.tpl")
    }
  }

  release {
    use "nomad-jobspec-canary" {
      groups = [
        "waypoint-poc"
      ]
      fail_deployment = false
    }
  }

}


and the app.nomad.tpl


job "waypoint-poc" {
  datacenters = ["play"]

  group "waypoint-poc" {
    update {
      max_parallel = 1
      canary       = 1
      auto_revert  = false 
      auto_promote = false
      health_check = "task_states"
    }

    network {
      port "http" {
        to = 80
      }
    }

    service {
      port = "http"
      check {
        type     = "http"
        path     = "/"
        interval = "10s"
        timeout  = "2s"
      }
    }

    task "waypoint-poc" {
      driver = "docker"
      config {
        image = "${artifact.image}:${artifact.tag}"
        ports = ["http"]
      }

      env {
        %{ for k,v in entrypoint.env ~}
        ${k} = "${v}"
        %{ endfor ~}

        // For URL service
        PORT = 80
      }
    }
  }
}

Because the deployment is reporting “DOWN”, Waypoint probably cleans up by removing what it considers to be completed deployments. The problem appears to be in the “waypoint-poc” job itself. At first glance, I would suggest removing the update stanza from the “waypoint-poc” job.
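To be explicit, that would mean the group starts directly with the network stanza, roughly like this (a sketch based on your template, everything else unchanged):

group "waypoint-poc" {
  network {
    port "http" {
      to = 80
    }
  }

  # service and task stanzas exactly as in your template
}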

After removing the update stanza, the deployments are no longer pruned: new ones are created without the running job being deleted. That is nice.

But it did not get rid of the following error messages:

✓ Getting job info...
❌ No allocations for job "waypoint-poc" are running!
⚠️ Waypoint detected that the current deployment is not ready, however your application
might be available or still starting up.
⚠️ Resource "waypoint-poc" is reporting "DOWN"

» Releasing waypoint-poc...

❌ Getting status of Nomad release...
❌ Getting job info...
! failed performing status report op: rpc error: code = Internal desc = resource
  manager failed to generate resource statuses: Unexpected response code: 404 (job
  not found)

Furthermore, I believe I can't use canary deployments without the update stanza, right?

I will try playing with the parameters of the update stanza and will let you know if anything improves; any advice on how to do so is welcome, as well as advice on how to get rid of those errors.

Thanks for your quick reply.

So I tried tweaking all the possible parameters of the update stanza, and it seems to me that auto_promote = false causes many issues.

Especially when auto_promote is set to false AND the waypoint.hcl includes a release stanza with nomad-jobspec-canary.

What is the expected behavior?

My best guess would be that the job waits for manual promotion, but the release stanza with the canary releaser promotes it anyway (this makes the example at waypoint-examples/nomad/nodejs-jobspec at main · hashicorp/waypoint-examples · GitHub a little bit confusing, especially as it is the only Nomad example that uses a jobspec).
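If I read the Nomad docs correctly, these two lines of my update stanza are the crux (an annotated copy of the template above; this is just my reading, not verified behavior):

update {
  max_parallel = 1
  canary       = 1      # Nomad starts one canary allocation and pauses the deployment
  auto_revert  = false
  auto_promote = false  # ...until something explicitly promotes it, which is apparently
                        # what the nomad-jobspec-canary releaser is meant to do
  health_check = "task_states"
}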

I tried removing the release stanza, and then the job doesn't get promoted at all. But auto_promote = false caused issues anyway (again, the third release pruned even the still-running jobs).

After setting auto_promote = true, things work much more smoothly.
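For reference, the update stanza I ended up with looks roughly like this (a sketch; only auto_promote differs from the template posted above):

update {
  max_parallel = 1
  canary       = 1
  auto_revert  = false
  auto_promote = true   # changed from false: Nomad now promotes the healthy canary itself
  health_check = "task_states"
}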

Mainly noting this here in case someone encounters a similar problem.

Oh, I noticed some new information in the docs.

According to the nomad-jobspec-canary plugin docs:

Note: Using the -prune=false flag is recommended for this releaser. By default, Waypoint prunes and destroys all unreleased deployments and keeps only one previous deployment. Therefore, if -prune=false is not set, Waypoint may delete your job via “pruning” a previous version. See deployment pruning for more information.

Unfortunately, according to the Deployment Pruning docs page:

CLI flags are the only way to customize this today. In the future, we will support setting defaults on the server side, in the waypoint.hcl file, and via the UI.

So if my assumptions are correct, my workaround of setting auto_promote to true just broke the Waypoint releaser: Nomad was the one that promoted the next version, so Waypoint had nothing to do, failed, and did not prune the old (nor the new) deployments.

Which unfortunately makes automated releases with Git polling and a Nomad jobspec currently unusable 🙁.