Nomad Variables in Template Issues

Hi there, I’m not sure what information is needed to troubleshoot this, so please point me in the right direction. I’m running a 3-server cluster on version 1.6.1. I’m just getting around to experimenting with Nomad Variables and almost immediately hit a wall. It could be PEBCAK, I’m not sure; I’m starting to wonder whether something in the cluster itself is in a funky state.

There are two other posts that seem related but don’t have a solution that works for me: “Issues with using Nomad variables” and “Client.rpc: error performing RPC to server: error="rpc error: Permission denied" rpc=Variables.Read”, both here on HashiCorp Discuss.

I have a fairly simple job with a template which uses a variable. The job hangs during deploy, and nomad alloc status <alloc-id> shows the following:

2023-08-25T09:15:33-04:00  Template    Missing: nomad.var.block(nomad/jobs/http-example/general-config@default.global)

Meanwhile, the systemd log on the Nomad worker where the allocation is scheduled is constantly printing the following:

[ERROR] client.rpc: error performing RPC to server: error="rpc error: Permission denied" rpc=Variables.Read server=[redacted]:4647
[ERROR] client.rpc: error performing RPC to server which is not safe to automatically retry: error="rpc error: Permission denied" rpc=Variables.Read server=[redacted]:4647
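(For the record, a quick way to sanity-check that the variable actually exists is to read it back from the CLI; this assumes the CLI is pointed at the cluster and, if ACLs are enabled, holds a token allowed to read the path:

# show the variable's items, or list everything under the prefix
nomad var get nomad/jobs/http-example/general-config
nomad var list nomad/jobs/http-example

In my case the variable is definitely there; only the template read fails.)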

Here’s an example job definition:

job "http-example" {
  datacenters = ["dc1"]
  group "http-example-group" {
    count = 1
    network {
      port "http" {
        to = "80"
      }
    }

    service {
      name = "http-example-service"
      tags = ["global", "example"]
      port = "http"

      check {
        name     = "alive"
        type     = "tcp"
        interval = "10s"
        timeout  = "2s"
      }
    }
    
    task "http-example-task" {
      driver = "docker"
      template {
        data        = <<EOH
{{ with nomadVar "nomad/jobs/http-example/general-config" -}}
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Runtime Local HTML Test</title>
  </head>
  <body>
    <h1>Runtime Local HTML Test</h1>

    <p>Key1: {{ .key1 }}</p>
    <p>Key2: {{ .key2 }}</p>
  </body>
</html>
{{- end }}
EOH
        destination = "local/html/runtime.html"
      }
      config {
        // nginx demo image; the templated HTML is mounted in via "volumes" below
        image = "nginxdemos/hello"
        ports = ["http"]

        volumes = [
          "local/html/:/usr/share/nginx/html/local",
        ]
      }
    }
  }
}

And an example variable definition:

path = "nomad/jobs/http-example/general-config"

items {
  key1 = "value 1"
  key2 = "value 2"
}
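(If you want to reproduce this, the same variable can be created straight from the CLI by passing the path followed by the items as key=value pairs:

nomad var put nomad/jobs/http-example/general-config key1="value 1" key2="value 2"

)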

Okay, so I found the solution, and I’m a bit disappointed. Per Variables | Nomad | HashiCorp Developer, the workload identity gives access to exactly nomad/jobs/$job_id, not nomad/jobs/$job_id/*. Thus I can’t use nomad/jobs/http-example/general-config as in my example; I can only use nomad/jobs/http-example, nomad/jobs/http-example/http-example-group, or nomad/jobs/http-example/http-example-group/http-example-task.
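For completeness, this is the shape of the fix: move the items to the job-scoped path and point the template at it (only the changed parts shown). The variable spec becomes:

path = "nomad/jobs/http-example"

items {
  key1 = "value 1"
  key2 = "value 2"
}

and the template opens with:

{{ with nomadVar "nomad/jobs/http-example" -}}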

It seems like the implicit ACL should allow access to arbitrary variables under the paths that correspond to the job/group/task names. My use case is that I have a few separate pieces of configuration which will almost never change at the same time, and it makes a lot of sense to group them into separate variables. Hopefully this hasn’t been explained a hundred times before: what’s the reason for not allowing access to “sub-variables”?

(I know I can add ACLs, but there are some serious gotchas there, such as being unable to apply the ACL before deploying the job, issues with batch jobs, etc. I’ve sketched that route below anyway, for anyone who needs it.)
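For anyone who still wants the sub-paths, the workload-attached ACL route looks roughly like this; the policy and file names are my own, and the deploy-ordering gotcha above still applies. First, a policy granting read on the sub-paths:

# policy.hcl (hypothetical file name)
namespace "default" {
  variables {
    path "nomad/jobs/http-example/*" {
      capabilities = ["read"]
    }
  }
}

Then attach it to the job’s workload identity:

nomad acl policy apply -namespace default -job http-example http-example-vars policy.hcl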
