Error performing RPC to server: allocation is terminal

We have a job with 5 tasks:

  1. Prestart task: setup-task (working)
  2. Main task (working)
  3. Poststart task (working)
  4. Poststop task (working)
  5. Poststop task: teardown-task (problematic)
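For orientation, the lifecycle layout of these five tasks can be sketched as the skeleton below. Only setup-task and teardown-task are real names from our job; "main-task", "poststart-task" and "poststop-task" are hypothetical placeholders, and drivers, templates and config are omitted, so this is not a submittable jobspec.

```hcl
# Skeleton only: driver, template and config blocks are omitted.
# "main-task", "poststart-task" and "poststop-task" are hypothetical names.
job "the-job-name" {
  group "the-job-group" {
    task "setup-task" {
      lifecycle {
        hook    = "prestart"
        sidecar = false
      }
    }

    # No lifecycle block: this is the main task.
    task "main-task" {}

    task "poststart-task" {
      lifecycle {
        hook = "poststart"
      }
    }

    task "poststop-task" {
      lifecycle {
        hook = "poststop"
      }
    }

    # The task whose template fails.
    task "teardown-task" {
      lifecycle {
        hook    = "poststop"
        sidecar = false
      }
    }
  }
}
```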

All of these tasks use template blocks that read the job's own Nomad variable, i.e. nomad/jobs/the-job-name:

Example job:

job "the-job-name" {
  group "the-job-group" {
    task "setup-task" {
      lifecycle {
        hook    = "prestart"
        sidecar = false
      }

      driver = "raw_exec"

      template {
        change_mode          = "noop" # this is a prestart task
        error_on_missing_key = true
        destination          = "$${NOMAD_SECRETS_DIR}/.setup"
        env                  = true
        data                 = <<EOT
{{ with nomadVar (printf "nomad/jobs/%v" (env "NOMAD_JOB_NAME")) }}
KEY1 = {{ .key1 }}
KEY2 = {{ .key2 }}
{{- end }}
EOT
      }

      config {
        command = "/path/to/setup/script.sh"
        # KEY1 and KEY2 are exported by the template above (env = true)
        args = [
          "$${KEY1}",
          "$${KEY2}"
        ]
      }
    }

    // not showing all the tasks
    // they all have template blocks similar to the ones above and below

    task "teardown-task" {
      lifecycle {
        hook    = "poststop"
        sidecar = false
      }

      driver = "raw_exec"

      template {
        change_mode          = "noop" # this is a poststop task
        error_on_missing_key = true
        destination          = "$${NOMAD_SECRETS_DIR}/.teardown"
        env                  = true
        data                 = <<EOT
{{ with nomadVar (printf "nomad/jobs/%v" (env "NOMAD_JOB_NAME")) }}
KEY1 = {{ .key1 }}
KEY2 = {{ .key2 }}
{{- end }}
EOT
      }

      config {
        command = "/path/to/teardown/script.sh"
        # KEY1 and KEY2 are exported by the template above (env = true)
        args = [
          "$${KEY1}",
          "$${KEY2}"
        ]
      }
    }
  }
}

The Nomad job runs as expected except for the last poststop task, teardown-task. When it is supposed to run, its template keeps failing to read the variable, and after a few minutes (about 12 retry attempts, one minute apart at the end) the task is killed:

Nomad logs:

2023-10-26T08:12:18.266Z [ERROR] client.rpc: error performing RPC to server: error="rpc error: allocation is terminal" rpc=ACL.WhoAmI server=10.0.2.5:4647
2023-10-26T08:12:18.267Z [ERROR] client.rpc: error performing RPC to server which is not safe to automatically retry: error="rpc error: allocation is terminal" rpc=ACL.WhoAmI server=10.0.2.5:4647
2023-10-26T08:12:18.267Z [ERROR] http: error authenticating built API request: error="rpc error: allocation is terminal" url="/v1/var/nomad/jobs/the-job-name?namespace=default&stale=&wait=60000ms" method=GET
2023-10-26T08:12:18.268Z [WARN]  agent: (view) nomad.var.block(nomad/jobs/the-job-name@default.global): Unexpected response code: 500 (Server error authenticating request) (retry attempt 11 after "1m0s")

2023-10-26T08:13:18.276Z [ERROR] client.rpc: error performing RPC to server: error="rpc error: allocation is terminal" rpc=ACL.WhoAmI server=10.0.2.3:4647
2023-10-26T08:13:18.276Z [ERROR] client.rpc: error performing RPC to server which is not safe to automatically retry: error="rpc error: allocation is terminal" rpc=ACL.WhoAmI server=10.0.2.3:4647
2023-10-26T08:13:18.277Z [ERROR] http: error authenticating built API request: error="rpc error: allocation is terminal" url="/v1/var/nomad/jobs/the-job-name?namespace=default&stale=&wait=60000ms" method=GET
2023-10-26T08:13:18.278Z [WARN]  agent: (view) nomad.var.block(nomad/jobs/the-job-name@default.global): Unexpected response code: 500 (Server error authenticating request) (retry attempt 12 after "1m0s")

2023-10-26T08:14:18.284Z [ERROR] client.rpc: error performing RPC to server: error="rpc error: allocation is terminal" rpc=ACL.WhoAmI server=10.0.2.5:4647
2023-10-26T08:14:18.285Z [ERROR] client.rpc: error performing RPC to server which is not safe to automatically retry: error="rpc error: allocation is terminal" rpc=ACL.WhoAmI server=10.0.2.5:4647
2023-10-26T08:14:18.285Z [ERROR] http: error authenticating built API request: error="rpc error: allocation is terminal" url="/v1/var/nomad/jobs/the-job-name?namespace=default&stale=&wait=60000ms" method=GET
2023-10-26T08:14:18.286Z [ERROR] agent: (view) nomad.var.block(nomad/jobs/the-job-name@default.global): Unexpected response code: 500 (Server error authenticating request) (exceeded maximum retries)

2023-10-26T08:14:18.287Z [ERROR] agent: (runner) watcher reported error: nomad.var.block(nomad/jobs/the-job-name@default.global): Unexpected response code: 500 (Server error authenticating request)
2023-10-26T08:14:18.287Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=30764d6b-18b0-270f-fe97-33a301f9198c task=teardown-task type=Killing msg="Template failed: nomad.var.block(nomad/jobs/the-job-name@default.global): Unexpected response code: 500 (Server error authenticating request)" failed=true
2023-10-26T08:14:18.290Z [INFO]  client.gc: marking allocation for GC: alloc_id=30764d6b-18b0-270f-fe97-33a301f9198c
2023-10-26T08:14:22.292Z [WARN]  client.alloc_runner.task_runner.task_hook.logmon.nomad: timed out waiting for read-side of process output pipe to close: alloc_id=30764d6b-18b0-270f-fe97-33a301f9198c task=teardown-task @module=logmon timestamp=2023-10-26T08:14:22.291Z
2023-10-26T08:14:22.293Z [WARN]  client.alloc_runner.task_runner.task_hook.logmon.nomad: timed out waiting for read-side of process output pipe to close: alloc_id=30764d6b-18b0-270f-fe97-33a301f9198c task=teardown-task @module=logmon timestamp=2023-10-26T08:14:22.292Z
2023-10-26T08:14:22.296Z [INFO]  client.alloc_runner.task_runner.task_hook.logmon: plugin process exited: alloc_id=30764d6b-18b0-270f-fe97-33a301f9198c task=teardown-task path=/usr/bin/nomad pid=2570689
2023-10-26T08:14:22.297Z [INFO]  agent: (runner) stopping
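The roughly 12 attempts, ending in "exceeded maximum retries" after a one-minute backoff, appear to come from the client agent's template retry settings for Nomad API calls. As far as we understand, the relevant client configuration looks like the sketch below, with what we believe are the default values written out explicitly; check the client `template` documentation for your Nomad version before relying on them. Tuning these only changes how long the template keeps retrying: every attempt above was rejected with the same "allocation is terminal" error, so more retries would not help.

```hcl
# Nomad client agent configuration (not part of the jobspec).
client {
  template {
    # Retry policy for template lookups against the Nomad API (e.g. nomadVar).
    # The values shown are assumed defaults, not verified for every version.
    nomad_retry {
      attempts    = 12
      backoff     = "250ms"
      max_backoff = "1m"
    }
  }
}
```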

We also see the template failure reported in the web UI.

The main issues seem to be:

  1. Error performing RPC to server: allocation is terminal
  2. (view) nomad.var.block(nomad/jobs/the-job-name@default.global): Unexpected response code: 500 (Server error authenticating request)

What do these mean and what can I do to resolve them?

I can assure you that the Nomad variable the errors complain about (whose path corresponds to the job name, as in the example above) DOES exist.
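For reference, the templates read the items key1 and key2 from the variable at nomad/jobs/the-job-name. Written as a variable specification file for `nomad var put`, it would look roughly like the sketch below; the file name spec.nv.hcl and the item values are placeholders, and `nomad var get nomad/jobs/the-job-name` can be used to confirm what is actually stored. Note that the failing requests are rejected while authenticating (`ACL.WhoAmI` returns "allocation is terminal"), before the variable itself is looked up, so its existence does not seem to be the problem.

```hcl
# Variable specification sketch; item values are placeholders.
# Hypothetical usage: nomad var put @spec.nv.hcl
path = "nomad/jobs/the-job-name"

items {
  key1 = "value-1"
  key2 = "value-2"
}
```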

This seems to be a bug; see GitHub issue hashicorp/nomad #16886, "Poststop lifecycle task - Can't request Vault token for terminal allocation", which describes the same terminal-allocation problem for Vault tokens.
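Until that is resolved, one possible workaround is to avoid any Nomad API call from the poststop task: render the values once in setup-task into the allocation directory shared by all tasks in the group, and have teardown-task read that file instead of using its own template. This is an untested sketch. The file name `.teardown` under `NOMAD_ALLOC_DIR` is hypothetical, and it assumes the teardown script is changed to read KEY1/KEY2 from the file passed as its argument. Trade-offs: the alloc directory is not the secrets directory, so every task in the allocation can read the file, and the values are captured when setup-task starts rather than when teardown-task runs.

```hcl
job "the-job-name" {
  group "the-job-group" {
    task "setup-task" {
      lifecycle {
        hook    = "prestart"
        sidecar = false
      }

      driver = "raw_exec"

      # ...existing .setup template and config block unchanged...

      # Hypothetical extra template: write the values teardown-task needs
      # into the shared allocation directory while the alloc is still running.
      template {
        change_mode          = "noop"
        error_on_missing_key = true
        destination          = "$${NOMAD_ALLOC_DIR}/.teardown"
        data                 = <<EOT
{{ with nomadVar (printf "nomad/jobs/%v" (env "NOMAD_JOB_NAME")) }}
KEY1={{ .key1 }}
KEY2={{ .key2 }}
{{- end }}
EOT
      }
    }

    task "teardown-task" {
      lifecycle {
        hook    = "poststop"
        sidecar = false
      }

      driver = "raw_exec"

      # No template block, so this task never calls the variables API.
      # Assumption: script.sh reads KEY1/KEY2 from the file given as argument.
      config {
        command = "/path/to/teardown/script.sh"
        args    = ["$${NOMAD_ALLOC_DIR}/.teardown"]
      }
    }
  }
}
```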