I have set up an HA cluster with 3 Consul nodes and 2 Vault nodes: one Vault node is active, the other is a standby, and ACLs are enabled. My requirement is that when the active Vault node goes down, the standby node becomes the active node. Instead, failover fails with "local node not active but active cluster node not found". Can somebody help with this?
Does this issue eventually resolve itself, or does it persist until some action is taken? When the active node goes down suddenly, the HA mechanism waits for a short period before the standby attempts to acquire the lock.
Can you also provide the Vault version you're running?
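One way to answer the first question is to poll each node's `/v1/sys/health` endpoint during a failover and see whether either node ever reports itself as active. The following is a minimal Python sketch, not an official tool: it assumes the nodes are reachable over plain HTTP (no TLS client setup), and the function names `probe_node`, `classify_health_status` and `summarize` are hypothetical. The status-code mapping follows the defaults documented for Vault's sys/health endpoint.

```python
import urllib.error
import urllib.request

# Default HTTP status codes returned by Vault's GET /v1/sys/health.
HEALTH_STATUS = {
    200: "active",
    429: "standby",
    472: "dr-secondary",
    473: "performance-standby",
    501: "not-initialized",
    503: "sealed",
}


def classify_health_status(code: int) -> str:
    """Map a sys/health HTTP status code to a node state."""
    return HEALTH_STATUS.get(code, "unknown")


def probe_node(base_url: str, timeout: float = 2.0) -> str:
    """Query one node's sys/health endpoint and return its state.

    Standby and sealed nodes answer with non-200 codes, which urllib
    raises as HTTPError, so the code is read from the exception.
    """
    url = base_url.rstrip("/") + "/v1/sys/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_health_status(resp.status)
    except urllib.error.HTTPError as err:
        return classify_health_status(err.code)
    except OSError:
        # Connection refused, DNS failure, timeout, etc.
        return "unreachable"


def summarize(states: dict) -> str:
    """Report whether exactly one node is active."""
    active = [node for node, state in states.items() if state == "active"]
    if len(active) == 1:
        return "ok: active node is " + active[0]
    if not active:
        return "no active node"
    return "multiple active nodes: " + ", ".join(sorted(active))
```

For example, calling `summarize({url: probe_node(url) for url in urls})` every few seconds with your own node addresses (something like `http://vault-1:8200`, a placeholder) shows whether "no active node" persists or clears once the standby acquires the lock.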
Thanks for your reply. I fixed this issue by correcting the standby node's configuration. I had made a mistake with the log file permissions; once I corrected that, everything works as expected.
A few weeks ago we started having this issue as well, after running Vault for a long time without any related problems. In the last 20 days it has happened 3 times (usually on a similar weekday and at a similar time).
We run version 1.3.1, and to recover we have to kill one of the Vault pods. Our HA setup has 2 pods with DynamoDB as the storage backend.
What changed in the last few weeks is that we enabled OIDC as an authentication backend, and it's working fine. We also redefined some policies for some users. Neither change seems related, since authentication and authorization both appear to be working fine.
Any suggestions for where to look in our setup?