Vault Enterprise "required index state not present" consistency error

We’re running managed Vault on HashiCorp Cloud Platform on a Standard Small cluster. We’ve recently started seeing our client receive HTTP 412 “required index state not present” errors. These errors appear to be highly clustered: we see none for hours or days, then thousands within a 5-minute period.

Some Googling suggests this is related to Vault’s eventual consistency model: if a request hits a reader node before the relevant Vault state has been replicated from the writer to that reader, Vault returns this error.

Two points of confusion with this for us:

  1. My understanding is that this consistency check only happens when an X-Vault-Index header is sent with a read request that depends on some state being present. The client sets that header to the X-Vault-Index value returned by a previous Vault write, and the 412 means the node serving the read has not yet reached that state. We use node-vault, and from my reading of its code, it does not automatically copy the X-Vault-Index header from a response into subsequent requests, so I’m not sure where that value would be coming from.

  2. We aren’t currently using Vault as a secret store; we’re using the Transit secrets engine as an encryption provider. We rarely mutate Vault state: we simply ask Vault to encrypt some plaintext, or decrypt some ciphertext, using a preexisting key.
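For reference, this is roughly what propagating the header by hand would look like if a client did it. It is a minimal TypeScript sketch: `VaultIndexTracker` is a hypothetical helper, not part of node-vault, and headers are modelled as a plain object with lowercase keys (as Node’s `http` module delivers them) rather than any specific HTTP library’s type. It treats the header value as opaque and only keeps the most recent one.

```typescript
// Hypothetical helper (not a node-vault API): remembers the latest
// X-Vault-Index value seen on a response and returns headers to send
// with the next request.
type HeaderMap = Record<string, string | undefined>;

class VaultIndexTracker {
  private lastIndex: string | undefined;

  // Call with the response headers of a Vault write. Keys are assumed
  // to be lowercase, as Node's http module provides them.
  observe(responseHeaders: HeaderMap): void {
    const value = responseHeaders["x-vault-index"];
    if (value) {
      // The value is opaque; this sketch keeps the latest one instead of
      // trying to compare or merge states.
      this.lastIndex = value;
    }
  }

  // Headers to merge into the next request. A node that has not yet
  // reached the recorded state would answer such a request with a 412.
  requestHeaders(): HeaderMap {
    return this.lastIndex ? { "X-Vault-Index": this.lastIndex } : {};
  }
}
```

If node-vault does not do anything like this, the 412s are probably not coming from this header path.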

I suspect this has something to do with client auth (we’re using AppRole), since the errors are clustered both in time and around specific client hosts. Our theory is that when a client token expires, we refresh that token, and that somehow introduces an inconsistency with our Vault cluster.

I’m wondering what the best way to resolve this is. Should we block all other Vault operations client-side until we can validate that the client token has been properly refreshed?
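Rather than blocking every Vault call client-side, a narrower option is to treat a 412 as retryable for a short, bounded window, on the assumption that the lagging node catches up quickly. The sketch below is a minimal TypeScript version of that bounded retry; `withRetryOn412` and the `getStatus` callback are hypothetical, because how node-vault exposes the HTTP status on a rejected request is not assumed here. The attempt count and delays are placeholders to tune.

```typescript
// Hypothetical wrapper: retries an async Vault call while it fails with
// HTTP 412, using exponential backoff, up to maxAttempts calls in total.
// Reading the status from an error is library-specific, so it is
// supplied by the caller via getStatus (an assumption, not a node-vault API).
const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetryOn412<T>(
  op: () => Promise<T>,
  getStatus: (err: unknown) => number | undefined,
  maxAttempts = 5,
  baseDelayMs = 50,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Only the consistency error is retried; anything else, or running
      // out of attempts, rethrows the original error.
      if (getStatus(err) !== 412 || attempt >= maxAttempts) throw err;
      // With the defaults: 50 ms, 100 ms, 200 ms, 400 ms between attempts.
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

Retrying only hides the lag from callers; it does not explain why the cluster is behind.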

Vault tokens also embed the Raft index that a node must have reached before it can recognise the token. This may explain why you see the error immediately after obtaining a new token, even though you are not sending the X-Vault-Index header.

I suggest your next step should be to instrument your application to measure how long after obtaining a new token these errors occur. That gives you hard data to test the hypothesis, and to take to HashiCorp support if need be, to ask why your cluster is so slow to become consistent.
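A minimal sketch of that instrumentation in TypeScript, assuming you can hook the point where your code completes an AppRole login and the point where it sees a 412. `TokenAgeRecorder` is a hypothetical name; in practice `errorAgesMs` would feed a histogram in whatever metrics system you already use.

```typescript
// Hypothetical instrumentation: records when the current token was
// obtained and, for each 412, how many milliseconds had elapsed since.
class TokenAgeRecorder {
  private tokenObtainedAt: number | undefined;
  private readonly now: () => number;
  readonly errorAgesMs: number[] = [];

  // The clock is injectable so the recorder can be tested deterministically.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Call right after a successful AppRole login or token refresh.
  markTokenObtained(): void {
    this.tokenObtainedAt = this.now();
  }

  // Call whenever a Vault request fails with HTTP 412. Returns the token
  // age in ms, or undefined if no token acquisition has been recorded yet.
  record412(): number | undefined {
    if (this.tokenObtainedAt === undefined) return undefined;
    const ageMs = this.now() - this.tokenObtainedAt;
    this.errorAgesMs.push(ageMs);
    return ageMs;
  }
}
```

If the recorded ages cluster close to zero, that supports the token-index explanation; if they are spread evenly, something else is going on.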

Separately, it may be worth reading the part of the Vault docs that compares service tokens with batch tokens. Because batch tokens are self-contained encrypted blobs that do not need to be looked up in storage to be validated, their validity is not affected by slow eventual consistency.
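On the AppRole side, the token type is chosen per role through its `token_type` parameter (for example `token_type=batch`). After changing it, checking the prefix of the token your login returns is a quick way to confirm which kind you are getting. The TypeScript sketch below relies on Vault’s documented prefixes: `hvs.`/`hvb.` for service/batch tokens in Vault 1.10 and later, `s.`/`b.` in older versions. `tokenKind` is a hypothetical helper name.

```typescript
// Hypothetical helper: classifies a Vault token by its prefix.
// Vault 1.10+ issues "hvs." (service) and "hvb." (batch) tokens;
// earlier versions used "s." and "b.". Anything else is "unknown".
type VaultTokenKind = "service" | "batch" | "unknown";

function tokenKind(token: string): VaultTokenKind {
  if (token.startsWith("hvs.") || token.startsWith("s.")) return "service";
  if (token.startsWith("hvb.") || token.startsWith("b.")) return "batch";
  return "unknown";
}
```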