Vault Transit Auto-Unseal Token Expiry — Best Practice for Automation + Alerting?

I’m running HashiCorp Vault HA (Raft) in Kubernetes, using Transit Auto-Unseal against a separate Vault cluster (vault-transit).

I started seeing repeated warnings like:

  • core.autoseal: seal wrapper health check failed

  • failed to encrypt test value

  • 403 permission denied

  • invalid token

After debugging, I confirmed that the token configured in the main cluster's seal "transit" stanza had expired or otherwise become invalid.

What I found

  • Network connectivity to vault-transit is fine (I get a Vault 403, not a connection error)

  • The token in the seal "transit" stanza was no longer valid (vault token lookup returned "bad token")

  • Creating a new token with the transit-unseal policy fixed the immediate issue

  • However, the new token's TTL is also capped (by the effective max_ttl on the token/ auth mount), so the same failure will recur unless the token is renewed or replaced in time
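The manual lookup above can be scripted so it runs on a schedule instead of after an outage. Below is a minimal sketch, assuming Python 3 with only the standard library, a transit cluster address and token supplied by the caller, and a hypothetical seven-day warning threshold. It calls the token self-lookup endpoint (GET /v1/auth/token/lookup-self) on vault-transit and classifies the returned ttl, renewable, period and expire_time fields. The function names are illustrative only.

```python
import json
import urllib.request

WARN_SECONDS = 7 * 24 * 3600  # hypothetical threshold: warn one week before expiry


def lookup_self(addr: str, token: str) -> dict:
    """Call GET /v1/auth/token/lookup-self on the transit cluster.

    An expired or revoked token makes urlopen raise urllib.error.HTTPError (403).
    """
    req = urllib.request.Request(
        f"{addr.rstrip('/')}/v1/auth/token/lookup-self",
        headers={"X-Vault-Token": token},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["data"]


def assess_token(data: dict, warn_seconds: int = WARN_SECONDS) -> str:
    """Turn a lookup-self "data" object into a one-line OK/WARN/CRIT status."""
    ttl = int(data.get("ttl") or 0)
    if ttl == 0 and not data.get("expire_time"):
        # ttl 0 with no expire_time means the token never expires (e.g. a root token).
        return "OK: token does not expire"
    kind = "periodic" if data.get("period") else "non-periodic"
    if ttl < warn_seconds:
        if not data.get("renewable"):
            return f"CRIT: non-renewable {kind} token expires in {ttl}s"
        return f"WARN: {kind} token expires in {ttl}s and needs renewal"
    return f"OK: {kind} token expires in {ttl}s"
```

For example, assess_token(lookup_self(addr, token)) with addr pointing at vault-transit returns a single status line that can be fed to existing alerting. An HTTPError with code 403 from lookup_self should itself be treated as critical, since that is exactly the "invalid token" state described above.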

Current setup

  • Main Vault: HA Raft on Kubernetes

  • Seal type: transit

  • Transit Vault: separate Vault cluster in another namespace

  • Token is currently configured directly in the seal "transit" block (Helm values)

What I’m looking for (best practice)

I'd like a reliable, automated long-term solution rather than rotating the token by hand.

Specifically:

  1. What is the recommended pattern for Transit Auto-Unseal token management in Kubernetes?

    • Static long-lived token?

    • Periodic token + renewer?

    • token_file + Kubernetes Secret?

    • Vault Agent auto-auth to transit Vault?

  2. What’s the best way to alert proactively?

    • Alert on token TTL?

    • Alert on core.autoseal log failures?

    • Synthetic canary test (transit/encrypt/<key>)?

    • All of the above?

  3. If using a periodic token, what’s the preferred renewal method?

    • CronJob

    • Sidecar

    • Vault Agent
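For questions 2 and 3, a minimal sketch of the CronJob/sidecar renewer combined with the synthetic canary might look like the following. It assumes a periodic token, Python 3 with only the standard library, the transit engine mounted at transit/ (adjust if the seal uses a different mount_path), and a hypothetical seven-day warning threshold. It renews the same token the seal uses via POST /v1/auth/token/renew-self (allowed by Vault's default policy, assuming the token was not created without it) and then encrypts a throwaway value via POST /v1/transit/encrypt/<key>, which exercises the same key the seal depends on. The names evaluate, check_once and run_forever are illustrative, not from any existing tool.

```python
import base64
import json
import time
import urllib.request
from typing import List, Optional

WARN_SECONDS = 7 * 24 * 3600  # hypothetical: warn if a renewal yields under a week


def vault_post(addr: str, token: str, path: str, body: dict) -> dict:
    """POST a JSON body to <addr>/v1/<path> and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{addr.rstrip('/')}/v1/{path}",
        data=json.dumps(body).encode(),
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def evaluate(lease_duration: Optional[int], ciphertext: Optional[str],
             warn_seconds: int = WARN_SECONDS) -> List[str]:
    """Map renewal and canary results to alert lines; an empty list means healthy."""
    alerts = []
    if lease_duration is None:
        alerts.append("CRIT: token renewal failed")
    elif lease_duration < warn_seconds:
        # A short TTL right after renewal suggests a max TTL cap is being hit.
        alerts.append(f"WARN: renewed TTL is only {lease_duration}s")
    if not (ciphertext or "").startswith("vault:v"):
        alerts.append("CRIT: transit canary encrypt failed")
    return alerts


def check_once(addr: str, token: str, key: str) -> List[str]:
    """Renew the token with renew-self, then encrypt a throwaway value with the seal key."""
    lease_duration, ciphertext = None, None
    try:
        resp = vault_post(addr, token, "auth/token/renew-self", {})
        lease_duration = int(resp["auth"]["lease_duration"])
    except (OSError, KeyError, TypeError, ValueError):
        pass  # HTTP errors (e.g. 403) and network errors are OSError subclasses
    try:
        plaintext = base64.b64encode(b"autounseal-canary").decode()
        # Assumes the transit engine is mounted at "transit/".
        resp = vault_post(addr, token, f"transit/encrypt/{key}", {"plaintext": plaintext})
        ciphertext = resp["data"]["ciphertext"]
    except (OSError, KeyError, TypeError, ValueError):
        pass
    return evaluate(lease_duration, ciphertext)


def run_forever(addr: str, token: str, key: str, interval: int = 3600) -> None:
    """Sidecar mode: repeat check_once every `interval` seconds and log the result."""
    while True:
        alerts = check_once(addr, token, key)
        print("\n".join(alerts) or "OK: token renewed, canary passed", flush=True)
        time.sleep(interval)
```

As a CronJob, the job would call check_once once and exit non-zero when it returns any alerts; as a sidecar, it would call run_forever with an interval well below the token's period. Either way, renewal only extends a token that is still valid; it cannot bring back one that has already expired, which is where re-authentication (for example Vault Agent auto-auth) behaves differently.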

Goal

I want a setup where:

  • Auto-unseal keeps working across restarts

  • No manual monthly token rotation or Helm value updates

  • Clear alerts well before an expiring token turns into an outage

Would love to hear how others are running this in production.