How to restart a Nomad job upon Vault key change?

Is it possible to restart a Nomad job when one or more Vault keys change? I would like to store Let’s Encrypt certificates and keys in Vault, and have the Nomad jobs that depend on these restart automatically whenever the certificates and keys change.

I found a GitHub issue (https://github.com/hashicorp/nomad/issues/5052) where someone asks essentially the same question, and one person answered that “Nomad client agents (via consul-template) watch for changes”. However, I tried it out and I cannot get it to work. I can’t tell if I’m misunderstanding how it all works or if I’m just doing something wrong.

Here is how I tested:

I have two keys defined at the secret/infra/change-test path in Vault: test1 and test2. I also have the following defined in my job spec file:

template {
  data = <<EOF
  {{ with secret "secret/infra/change-test" }}TEST1={{ .Data.test1 }}{{ end }}
  EOF
  destination = "secrets/file.env"
  env         = true
}

template {
  data = <<EOF
  {{ with secret "secret/infra/change-test" }}{{ .Data.test2 }}{{ end }}
  EOF
  destination = "local/test"
}

When I start the job it downloads and runs a small program that echoes the value of the TEST1 environment variable, reads and outputs the content of the local/test file, sleeps ten seconds and then goes through the sequence again.

Now, based on the answers in #5052, I expected that if I changed the value of either test1 or test2 in Vault, Nomad would register the change and restart the task, since the default value of the change_mode parameter is "restart". However, the task does not restart, and neither the TEST1 environment variable nor the contents of the local/test file change.
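To spell out that expectation, here is the first template again with change_mode written explicitly. This is only a sketch: "restart" is already the default, so adding the line should not change the behavior by itself.

```hcl
template {
  data = <<EOF
{{ with secret "secret/infra/change-test" }}TEST1={{ .Data.test1 }}{{ end }}
EOF
  destination = "secrets/file.env"
  env         = true

  # "restart" is the default; "signal" and "noop" are the other accepted modes.
  change_mode = "restart"
}
```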

Am I misunderstanding how this is supposed to work? If so, is it at all possible for Vault key changes to be reflected in a Nomad template at runtime?

Thanks,
-Martin

What you describe should work exactly as you expect.
(I had set up a very similar demo to show how “magical” Nomad is! :grinning:)

What is the TTL of the secret in Vault?

If the TTL is 0 in Vault, I have seen it behave the way you are currently observing.

FWIW, the Vault web GUI misbehaves when these parameters are set through it:
ref: https://github.com/hashicorp/vault/issues/9333

@shantanugadgil, you provided the exact piece of information I was missing: TTLs for secrets.

I’ll be honest: I did read about them when I started playing with Vault, but the docs indicated that the TTL was only a suggestion and I had no use case for it back then, so I quickly forgot about them. On top of that, I had assumed that consul-template used some kind of notification-based “watcher” (like Consul), so it never dawned on me that the TTL value of a Vault secret is what is used to refresh the data.

I checked the secret/infra/change-test path in our Vault, using both the CLI:

❯ vault read secret/infra/change-test
Key                 Value
---                 -----
refresh_interval    768h
test1               crescentmoon
test2               lollipop

and the API:

❯ curl --header "X-Vault-Token: `cat ~/.vault-token`" https://127.0.0.1:8200/v1/secret/infra/change-test
{"request_id":"c92a97f0-96bc-f9b4-eee5-b2be9044159a","lease_id":"","renewable":false,"lease_duration":2764800,"data":{"test1":"crescentmoon","test2":"lollipop"},"wrap_info":null,"warnings":null,"auth":null}

and suddenly understood what you meant.

So I set a new key (ttl) with a value of 30s and re-checked the CLI and the API:

❯ vault read secret/infra/change-test
Key                 Value
---                 -----
refresh_interval    30s
test1               crescentmoon
test2               lollipop
ttl                 30
❯ curl --header "X-Vault-Token: `cat ~/.vault-token`" https://127.0.0.1:8200/v1/secret/infra/change-test
{"request_id":"e14bbd75-44e5-6ad6-a59b-f19f7a590932","lease_id":"","renewable":false,"lease_duration":30,"data":{"test1":"crescentmoon","test2":"lollipop","ttl":"30"},"wrap_info":null,"warnings":null,"auth":null}

I then restarted the Nomad job, saw the initial values being output in the logs, then changed the values of both test1 and test2 in Vault, and lo and behold, within 30 seconds the Nomad task restarted and began showing the new values.

So now the whole thing is crystal clear and I know exactly what to do in order to use Let’s Encrypt certificates and keys for my Nomad jobs.
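A minimal sketch of how that could look, assuming the certificate chain is stored under a hypothetical path secret/infra/certs/example.com with a hypothetical key named fullchain, and that the secret also carries a ttl key like the one in the test above:

```hcl
# Hypothetical path and key name; adjust to wherever the certificate is stored.
template {
  data = <<EOF
{{ with secret "secret/infra/certs/example.com" }}{{ .Data.fullchain }}{{ end }}
EOF
  destination = "secrets/fullchain.pem"

  # Restart the task whenever the rendered file changes (this is the default).
  change_mode = "restart"
}
```

The private key would get its own template block of the same shape, written to another file under secrets/.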

Thanks a lot!
-Martin

Glad it worked out! :+1:

@radcool was this with a k/v v2 store?
I am currently running into the problem that I cannot seem to tweak the “watcher” time.

For me, I always have to wait 5 minutes for the Vault changes to be picked up by Nomad. I know that is the magical default of 300s that appears in Vault in various places, but I can’t seem to find out which one I should alter to allow for a shorter time (30 seconds would be nice).

vault read sys/mounts/red-envy/tune
Key                  Value
---                  -----
default_lease_ttl    30s
description          n/a
force_no_cache       false
max_lease_ttl        768h
options              map[version:2]

I already set the default_lease_ttl to 30s but that didn’t help.
When I get the secret from Vault, it does not show a refresh_interval like yours does.

From the documentation I gather that the refresh somehow works off lease_duration, but I cannot find a way to set that on either a secret or a secrets engine.

vault kv get red-envy/test/env/stage
======= Metadata =======
Key                Value
---                -----
created_time       2022-02-21T12:29:42.121712327Z
custom_metadata    <nil>
deletion_time      n/a
destroyed          false
version            1

==== Data ====
Key      Value
---      -----
value    test
ttl      10s

Any thoughts anyone?

Edit: I didn’t see this was a two-year-old topic, sorry, but it still might be relevant.
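One thing that may be worth ruling out with a k/v v2 mount is the template itself: with version 2, the read path needs a data/ segment after the mount name, and the values sit one level deeper, under .Data.data. A minimal sketch for the secret shown above, where the destination file name is an assumption. It only covers how the secret is read and does not settle which setting controls the refresh interval.

```hcl
template {
  # k/v v2: note the "data/" segment in the path and the nested .Data.data
  data = <<EOF
{{ with secret "red-envy/data/test/env/stage" }}{{ .Data.data.value }}{{ end }}
EOF
  destination = "local/test" # assumed destination, not from the post above
}
```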