Sporadic errors when accessing vault

I noticed some random errors on some applications hitting vault in the form of {"errors":["local node not active but active cluster node not found"]}.

I managed to reproduced it locally by running

▶ curl --request POST --data '{"jwt": "<my_jwt_token>", "role": "<my_role>"}' https://my_vault_instance:8813/v1/auth/kubernetes/login

repeatedly.

Out of say 4000 requests I had 4 errors as the above. Although it seems rare, in a production environment environment with multiple entities making requests at vault this can be an issue.

I have a 1 instance vault deployed on GCE using the official module. I have also HA enabled (based on CPU usage). At no point in time is the machine cpu-stressed more than 50%.

What can be causing this?

The error indicates that the single instance is sometimes deeming itself to not be active in the Vault HA management system.

You should post your Vault server configuration file so we can understand what configuration you are using, and review the Vault server logging - possibly turning up the logging level - to see if there are messages explaining why the single instance sometimes believes it can no longer maintain HA active status.

Could the fact that the (single) VM instance is behind a GCP load balancer be related?