Vault confusion on max token TTL in Nomad job

Hello, I have an example of a vault token that is confusing me because it seems to be living beyond what should be its max TTL. The token was issued by Nomad using the vault stanza.

I don’t think the versioning is too pertinent to my confusion here, but FWIW, versioning is: vault 1.11.0, nomad 1.3.2, consul 1.11.2.

According to the vault token doc the only tokens with unlimited life are root tokens and periodic tokens.

The token in question is not a root token. A periodic token shows a period: $value line in its vault token lookup output, and this token does not, so it should not be a periodic token either.

According to the vault TTL token description document, any non-root and non-periodic tokens should have a max life TTL based on a combination of:

  1. system max ttl (ie: max_lease_ttl)
  2. max ttl on a mount generating tokens, which can be higher than system max ttl
  3. A value suggested by the auth method that issued the token which can only be smaller than the system max ttl

For the token and environment in question, the system max TTL for case (1) above is at its default of 32 days (768h):

❯ vault read sys/auth/token/tune
Key                  Value
---                  -----
default_lease_ttl    768h
description          token based credentials
force_no_cache       false
max_lease_ttl        768h
token_type           default-service

For case (2) above, the path shown in the token lookup output below is auth/token/create/nomad-cluster. As far as I can see from the output of a call to the v1/sys/mounts API, that path is not associated with a specific mount that might have a max_lease_ttl larger than the system max_lease_ttl. So the system max ttl of 32 days should still be the limit as far as I can tell.

For case (3) the only modification that could be done would be to reduce the effective max ttl, so again, we can still consider the system max ttl as the max ttl limit that should apply here since there are no such reductions suggested by the auth method.

As I understand it, then, we should have a token that Nomad automatically renews and, according to the Nomad docs for the vault stanza, replaces with a new token if the old one expires.

So how can we then have a token that has continued renewing for 2 months when it seems the max ttl (max_lease_ttl) should be 32 days?

I must be misunderstanding something.

Here is the vault token lookup output of the token in question:

Key                  Value
---                  -----
accessor             <snip>
creation_time        1662166592
creation_ttl         72h
display_name         token-<snip>
entity_id            n/a
expire_time          2022-11-06T09:37:13.108605415Z
explicit_max_ttl     0s
id                   <snip>
issue_time           2022-09-03T00:56:32.117319532Z
last_renewal         2022-11-03T09:37:13.108605525Z
last_renewal_time    1667468233
meta                 map[AllocationID:<snip>]
num_uses             0
orphan               true
path                 auth/token/create/nomad-cluster
policies             [default $JOB_POLICY nomad-cluster]
renewable            true
role                 nomad-cluster
ttl                  67h26m12s
type                 service
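
As a quick sanity check on the "2 months" figure, the two epoch timestamps in the lookup output can be compared with plain shell arithmetic. This is only a sketch: the variable names are made up for illustration, and the values are copied by hand from the lookup output and from the sys/auth/token/tune output.

```shell
# Values copied from the outputs in this post (variable names are illustrative).
creation_time=1662166592       # creation_time from vault token lookup
last_renewal_time=1667468233   # last_renewal_time from vault token lookup
max_lease_ttl_hours=768        # max_lease_ttl from sys/auth/token/tune

# Whole hours between issue and the most recent renewal.
age_hours=$(( (last_renewal_time - creation_time) / 3600 ))

echo "age at last renewal: ${age_hours}h, max_lease_ttl: ${max_lease_ttl_hours}h"
```

So the token was still being renewed roughly 1472 hours after it was issued, well past 768h.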

TIA for any clarity anyone can offer 🙂
John

Um. That’s weird.

One technique that might shed a bit more light on things is to ask Vault to renew the token until a very, very long time in the future:

vault token renew -i 99999h

Vault should then tell you exactly what maximum ttl it is actually applying to this token, with output along these lines:

WARNING! The following warnings were returned from Vault:

  * TTL of "99999h" exceeded the effective max_ttl of "1439h59m56s"; TTL value
  is capped accordingly

That way, you’ll at least find out what number it is working with.

Ah, nice idea, thank you… Trying it, though, and it just happily accepts whatever interval I give it, printing out:

# vault token renew -increment=99999h
Key                        Value
---                        -----
token                      <snip>
token_accessor             <snip>
token_duration             72h
token_renewable            true
token_policies             ["default" $JOB_POLICY "nomad-cluster"]
identity_policies          []
policies                   ["default" $JOB_POLICY "nomad-cluster"]
token_meta_NodeID          d5425927-48ff-91cf-fb12-c58c6e40fc1f
token_meta_Task            $TASK
token_meta_TaskGroup       $TASKGROUP
token_meta_AllocationID    8bf14fd5-6416-6ee6-07f8-1b19e7240af2
token_meta_JobID           $JOB
token_meta_Namespace       n/a

Doing a token lookup again after this command does indeed show that the last_renewal, last_renewal_time, and expire_time fields have been updated, with everything else looking the same at a glance.

Maybe this is a bug. This is the only example I’ve seen like this so far, but I haven’t looked hard either.

It seems to be acting like a periodic token with no expiry as long as renewal occurs, but I see no indication it’s actually periodic.
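
To make the difference concrete, here is a deliberately simplified model of the two behaviours. It is not Vault's actual logic, and the function names are made up; it only uses the 72h creation_ttl and the 768h max_lease_ttl from the outputs earlier in the thread, with times counted in hours since the token was issued.

```shell
# Simplified model only, not Vault's implementation.
ttl_h=72        # creation_ttl from the lookup output
max_ttl_h=768   # max_lease_ttl from sys/auth/token/tune

# New expiry after a renewal at hour $1 when a max TTL caps the token's life.
capped_expiry() {
  e=$(( $1 + ttl_h ))
  if [ "$e" -gt "$max_ttl_h" ]; then
    e=$max_ttl_h
  fi
  echo "$e"
}

# New expiry after a renewal at hour $1 when each renewal just restarts the clock.
periodic_expiry() {
  echo $(( $1 + ttl_h ))
}

echo "capped:   $(capped_expiry 760)"
echo "periodic: $(periodic_expiry 760)"
```

In this model a renewal at hour 760 cannot push a capped token past hour 768, while the periodic-style token simply moves out to hour 832, which is the pattern this token appears to follow.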

Perhaps try

vault read auth/token/roles/nomad-cluster

?


Ah, so maybe that explains it!

❯ vault read auth/token/roles/nomad-cluster
Key                         Value
---                         -----
allowed_entity_aliases      <nil>
allowed_policies            []
allowed_policies_glob       <nil>
disallowed_policies         [admin client core nomad-server]
disallowed_policies_glob    <nil>
explicit_max_ttl            0s
name                        nomad-cluster
orphan                      true
path_suffix                 n/a
period                      0s
renewable                   true
token_explicit_max_ttl      0s
token_no_default_policy     false
token_period                72h               # <-------- !!!
token_type                  default-service

I was looking specifically for period as the indicator of a periodic token, since I have seen it before in the vault token lookup output of some periodic tokens. Per the auth API docs, though, token_period does essentially the same thing; it just doesn’t show up in the vault token lookup output in a similar way. I think this has been the source of my confusion. I wonder why that isn’t included in the token lookup info.
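
For anyone else hitting this, a small helper along these lines could check a token role for either field in one go. It is a sketch only: role_period is a made-up name, it assumes the vault CLI is already authenticated, and the exact form in which -field prints a zero duration ("0" or "0s") is an assumption, so both are treated as unset.

```shell
# Hypothetical helper: report which period field, if any, is set on a token role.
# Assumes `vault` is on PATH and already authenticated.
role_period() {
  role="$1"
  for field in token_period period; do
    # Skip a field that is missing or unreadable rather than aborting.
    value=$(vault read -field="$field" "auth/token/roles/$role" 2>/dev/null) || continue
    # Assumption: an unset period prints as "0" or "0s" depending on formatting.
    if [ -n "$value" ] && [ "$value" != "0" ] && [ "$value" != "0s" ]; then
      echo "$field=$value"
      return 0
    fi
  done
  echo "no period set on role $role"
}
```

Calling role_period nomad-cluster against the role shown above should report token_period rather than period.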

Thanks for the ideas!