Nomad Vars: usage and audit

Hi team,

We store app-specific ENV vars in Vault and use them in jobs via templates, and that works very well. When Nomad 1.4 launched with built-in variables I got really excited, as it would allow us to remove the Vault dependency. I'm still trying to figure out two things:

  1. Do Nomad vars have some kind of history associated with them? Vault has versions, and you can easily see what changed between versions.
  2. The docs are not very comprehensive about exposing Nomad vars to all the jobs in a namespace. Right now it seems that vars can only be exposed on a per job/group/task basis. Our use case requires us to run multiple jobs in a namespace (which is a proxy for prod/staging etc…). Is there a way to share the same set of vars with all the jobs in the namespace without duplicating them per job? I tried attaching an ACL policy with a wildcard (*) in the path and then applying this policy to the namespace with a wildcard as the job, but that doesn't seem to work:
# policy-read.hcl
namespace "ns-1" {
  variables {
    path "*" {
      capabilities = ["read", "list"]
    }
  }
}

# attach this policy to NS and Jobs
nomad acl policy apply -namespace ns-1 -job "*" -description "allow var shared access" ns-var-read policy-read.hcl
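For reference, this is roughly how we would consume such a shared variable in a task once the policy works; a sketch, with app/config as a placeholder path and db_user/db_pass as placeholder keys:

# render the shared variable into the task's environment
template {
  destination = "${NOMAD_SECRETS_DIR}/.env"
  env         = true
  data        = <<EOT
{{- with nomadVar "app/config" -}}
DB_USER={{ .db_user }}
DB_PASS={{ .db_pass }}
{{- end -}}
EOT
}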

Any help is much appreciated.

Thanks,
Vikas

Hi @jrasell and Nomad team, I'd really appreciate it if you could spare some thoughts on this.

Thank you in advance.

Hi @vikas.saroha,

Do Nomad vars have some kind of history associated with them? Vault has versions, and you can easily see what changed between versions.

No, Nomad currently does not support this; however, it is something we are open to discussing as a feature request. Would you mind opening a feature request against the Nomad repository and including any specific use cases you have?
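In the meantime, the closest thing to an audit signal is the variable's metadata. A sketch, assuming a variable at app/config in namespace ns-1 (field names taken from the JSON output, which may vary by version):

# print the modify index/time of a variable as a coarse "did it change?" signal
nomad var get -namespace ns-1 -out json app/config | jq '{ModifyIndex, ModifyTime}'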

Is there a way to share the same set of vars with all the jobs in the namespace without duplicating them per job?

I believe that currently you will need to create an ACL policy and apply it to each job you wish to have access to the variables, as sketched below. Do you have any suggestions on what could be added to the workload identity and variables concepts pages?
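A sketch of what that per-job application could look like, assuming hypothetical job names api, worker, and scheduler, and one uniquely named policy per job (each policy carries a single workload association):

# attach a copy of the read policy to each job individually
for job in api worker scheduler; do
  nomad acl policy apply -namespace ns-1 -job "$job" \
    -description "allow var shared access" "ns-var-read-$job" policy-read.hcl
done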

Thanks,
jrasell and the Nomad team

Our team is also at a stage where we would like to proactively associate a shared/sensitive policy with multiple jobs that may not even exist yet. What follows is what I tried to get working before I found this topic by @vikas.saroha…

sensitive.policy.hcl:

namespace "sensitive" {
  variables {
    path "*" {
      capabilities = ["read"]
    }

    path "azure/*" {
      capabilities = ["read"]
    }
  }
}

It works as expected when applying it to a real job:

$ nomad namespace apply -description "Namespace for sensitive variables" sensitive
$ nomad var put -namespace sensitive @../path/to/sensitive.nv.hcl
$ nomad acl policy apply -description "Sensitive policy" -namespace default -job ARealJobThatExists sensitive /path/to/sensitive.policy.hcl

and then use it in a template:

job "ARealJobThatExists" {
  ...
  group "example" {
    ...
    task "example" {
      ...
      # Template to load sensitive properties into ENV VARs
      template {
        change_mode          = "restart"
        error_on_missing_key = true
        destination          = "${NOMAD_SECRETS_DIR}/.azure"
        data                 = <<EOT
{{- with nomadVar "azure/properties@sensitive" -}}
PROP1 = {{ .prop1 }}
PROP2 = {{ .prop2 }}
{{- end -}}
EOT
      }
      ...
    }
  }
}

However, I would like to associate the same policy with multiple workloads (jobs) that don't yet exist. In other words, I want to proactively associate the policy with a wildcard of jobs, so that new jobs (created by coworkers) automatically have read access to existing secrets. The coworkers themselves should be able to list the secrets/variables but not read them.
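For the human side of that, the sketch below is the kind of policy I have in mind for the coworkers' own ACL tokens (separate from the workload policy above):

# coworkers can enumerate variable paths but not read the values
namespace "sensitive" {
  variables {
    path "*" {
      capabilities = ["list"]
    }
  }
}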

I tried the following, without success:

$ nomad acl policy apply -description "Proactive sensitive policy" -namespace default -job "*" sensitive /path/to/sensitive.policy.hcl

Inspecting the policy returns the following:

$ nomad acl policy info sensitive
Name        = sensitive
Description = Proactive sensitive policy
CreateIndex = 12070
ModifyIndex = 12070

Associated Workload
Namespace = default
JobID     = *
Group     = <none>
Task      = <none>

I can think of two ways that this could work intuitively, but I assume those would need to be feature requests?

Option 1: Defining it in the policy
Something like…

namespace "sensitive" {
  variables {
    path "*" {
      capabilities = ["read"]

      # list the jobs that have access to the variables in this path and namespace
      jobs = ["*", "ARealJobThatExists"]

      # or maybe the jobs are in a different namespace
      jobs = ["*@default", "ARealJobThatExists@default"]

      # or maybe multiple job blocks (like having multiple path blocks)
      job "*" {
        namespace = "default"
      }

      job "ARealJobThatExists" {
        namespace = "default"
      }
    }

    path "azure/*" {
      capabilities = ["read"]
    }
  }
}

Option 2: Using a wildcard in the nomad acl policy apply command
Using a wildcard for the -job argument, as in the example above, which would only need to be executed once:

$ nomad acl policy apply -description "Proactive sensitive policy" -namespace default -job "*" sensitive /path/to/sensitive.policy.hcl

We have a similar issue with periodic batch jobs, where their template wants to read Nomad variables from a shared namespace and path (i.e. not using the “workload identity” paths).

The only way I could get the job to start was to apply a policy (with read permission on the variables’ path) to the exact job name after it had failed to start. Those job names are generated per invocation for periodic jobs, e.g. daily-maintenance/periodic-1690845060, where the final string of digits appears to be the invocation time.

I tried to apply the ACL policy to:

  • -job "daily-maintenance",
  • -job "daily-maintenance/*",
  • -job "*"

None of that worked.

The same variables are accessible by other service jobs via a policy applied to the job name. We would like to avoid duplicating the variables in “workload identity” paths. These variables are database and API credentials.

As it is currently possible to omit group and task when applying an ACL policy, wouldn’t it be nice and consistent to omit the job as well, thus applying the policy to all jobs, present and future?

Allowing access to all the jobs in the namespace would be ideal.
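For what it's worth, the one namespace-wide path that seems to work today without any extra policy is the literal nomad/jobs path, which the implicit workload identity policy grants every job in the namespace read access to. A sketch, with db_user as a placeholder key:

# readable by any job in the namespace via the implicit workload identity policy
template {
  destination = "${NOMAD_SECRETS_DIR}/.shared"
  data        = <<EOT
{{- with nomadVar "nomad/jobs" -}}
DB_USER = {{ .db_user }}
{{- end -}}
EOT
}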

For now we got around this issue by changing the periodic jobs to plain batch jobs and adding an infinite loop in a bash script:

# poor man's periodic scheduler: run the task every 15 minutes, forever
while true; do
  echo "Running periodic task at $(date)"
  # insert command here
  sleep 900
done

I went down the Consul path in the meantime, and realised I could use Consul KV to work around the shortcoming mentioned above. The job specification then looks like this:

job "ARealJobThatExists" {
  ...
  group "example" {
    ...
    task "example" {
      ...
      # Template to load sensitive properties into ENV VARs
      template {
        change_mode          = "restart"
        error_on_missing_key = true
        destination          = "${NOMAD_SECRETS_DIR}/.azure"
        data                 = <<EOT
{{- with $json := key "azure/properties" | parseJSON -}}
PROP1 = {{ $json.prop1 }}
PROP2 = {{ $json.prop2 }}
{{- end -}}
EOT
      }
      ...
    }
  }
}

Defining the KV in Consul is easy enough; a one-liner along these lines (with placeholder values) is all it takes:
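# store the JSON blob that the template parses with parseJSON
consul kv put azure/properties '{"prop1":"value1","prop2":"value2"}'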