How to restart a Nomad job upon Vault key change?

Is it possible to restart a Nomad job when one or more Vault keys change? I would like to store Let’s Encrypt certificates and keys in Vault, and have the Nomad jobs that depend on these restart automatically whenever the certificates and keys change.

I found a GitHub issue (https://github.com/hashicorp/nomad/issues/5052) where someone asks essentially the same question, and one person answered that “Nomad client agents (via consul-template) watch for changes”. However, I tried it out and I cannot get it to work. I can’t tell if I’m misunderstanding how it all works or if I’m just doing something wrong.

Here is how I tested:

I have two keys defined under the secret/infra/change-test path in Vault: test1 and test2. I also have the following defined in my job spec file:

template {
  data = <<EOF
  {{ with secret "secret/infra/change-test" }}TEST1={{ .Data.test1 }}{{ end }}
  EOF
  destination = "secrets/file.env"
  env         = true
}

template {
  data = <<EOF
  {{ with secret "secret/infra/change-test" }}{{ .Data.test2 }}{{ end }}
  EOF
  destination = "local/test"
}

When I start the job it downloads and runs a small program that echoes the value of the TEST1 environment variable, reads and outputs the content of the local/test file, sleeps ten seconds and then goes through the sequence again.

Now, based on the answers in #5052, I expected that if I changed the value of either test1 or test2 in Vault, Nomad would register the change and restart the task, since the default value of the change_mode parameter is "restart". However, the task is not restarting, and neither the value of the TEST1 environment variable nor the contents of the local/test file change.
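For reference, the default behavior described above can be spelled out explicitly in the template stanza. This is only a sketch that reuses the second template from the job spec; change_mode and splay are shown with their documented default values, and neither setting appeared in the original job file.

```hcl
template {
  data = <<EOF
{{ with secret "secret/infra/change-test" }}{{ .Data.test2 }}{{ end }}
EOF
  destination = "local/test"

  # Documented default: restart the task when the rendered output changes.
  # Other accepted values are "signal" and "noop".
  change_mode = "restart"

  # Documented default: wait a random time of up to 5s before acting,
  # so that many tasks do not all restart at the same moment.
  splay = "5s"
}
```

Writing the defaults out does not change behavior; it only makes the intent visible to whoever reads the job file.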

Am I misunderstanding how this is supposed to work? If so, is it at all possible for Vault key changes to be reflected in a Nomad template at runtime?

Thanks,
-Martin

What you describe should work exactly as you expect.
(I had set up a very similar demo to show how “magical” Nomad is! :grinning: )

What is the TTL of the secret in Vault?

If the TTL is 0 in Vault, I have seen it behave the way you are currently observing.

FWIW, the Vault web GUI misbehaves when these parameters are set through it:
ref: https://github.com/hashicorp/vault/issues/9333

@shantanugadgil, you provided the exact piece of information I was missing: TTLs for secrets.

I’ll be honest: I did read about them when I started playing with Vault, but since the docs indicated that the TTL was only a suggestion and I didn’t have a use case for the concept back then, I quickly forgot about them. On top of that, I had assumed that consul-template was using some kind of notification-based “watcher” (like Consul), so it never dawned on me that the TTL value of a Vault secret is what was being used to refresh the data.

I checked the secret/infra/change-test path in our Vault, using both the CLI:

❯ vault read secret/infra/change-test
Key                 Value
---                 -----
refresh_interval    768h
test1               crescentmoon
test2               lollipop

and the API:

❯ curl --header "X-Vault-Token: `cat ~/.vault-token`" https://127.0.0.1:8200/v1/secret/infra/change-test
{"request_id":"c92a97f0-96bc-f9b4-eee5-b2be9044159a","lease_id":"","renewable":false,"lease_duration":2764800,"data":{"test1":"crescentmoon","test2":"lollipop"},"wrap_info":null,"warnings":null,"auth":null}

and suddenly understood what you meant.

So I added a new key (ttl) with a value of 30 (seconds) and re-checked the CLI and the API:

❯ vault read secret/infra/change-test
Key                 Value
---                 -----
refresh_interval    30s
test1               crescentmoon
test2               lollipop
ttl                 30
❯ curl --header "X-Vault-Token: `cat ~/.vault-token`" https://127.0.0.1:8200/v1/secret/infra/change-test
{"request_id":"e14bbd75-44e5-6ad6-a59b-f19f7a590932","lease_id":"","renewable":false,"lease_duration":30,"data":{"test1":"crescentmoon","test2":"lollipop","ttl":"30"},"wrap_info":null,"warnings":null,"auth":null}

I then restarted the Nomad job, saw the initial values being output in the logs, went into Vault, and changed the values of both test1 and test2. Lo and behold, within 30 seconds the Nomad task restarted and began showing the new values.

So now the whole thing is crystal clear and I know exactly what to do in order to use Let’s Encrypt certificates and keys for my Nomad jobs.
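As a rough sketch of what that could look like for a certificate, assuming the same kind of secret as in the test above: the path secret/infra/certs/example and the key name cert are hypothetical and only serve as an illustration, as is the destination file name.

```hcl
# Hypothetical path and key name; adjust to wherever the certificate is stored.
# The secret should also carry a "ttl" key (e.g. 30), as in the test above,
# so that the template is re-read at that interval.
template {
  data = <<EOF
{{ with secret "secret/infra/certs/example" }}{{ .Data.cert }}{{ end }}
EOF
  destination = "secrets/cert.pem"

  # Restart the task when the rendered certificate changes.
  # "signal" together with change_signal is an alternative for
  # services that can reload certificates without a restart.
  change_mode = "restart"
}
```

A second template of the same shape would render the private key into its own file under secrets/.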

Thanks a lot!
-Martin

Glad it worked out! :+1: