Vault reverts unsealing when starting before Consul

Hi All,
We have Vault with a Consul backend, installed on Kubernetes using the Helm charts. It is configured to auto-unseal with a key from AWS Secrets Manager.

Recently, we had an incident in our Beta environment where the on-premises hosts were restarted, and Vault and Consul were forcibly restarted along with them.

When they came back, Vault unsealed with the key from AWS, but according to the logs it couldn’t get the metadata from Consul, so it reverted the unseal process and went back to sealed. When I went to troubleshoot, I found Vault still sealed and unable to recover, and Consul up and running with no issues.

So, my assumption is that Consul wasn’t fully up while Vault was trying to unseal, so Vault exhausted its retries, went back to sealed, and never recovered. Consul then finished coming up on its own.

Restarting Vault made the problem go away.

Because of this, I was about to set up an init container in the Vault pods that won’t let the pod start until Consul is up and running. Before doing that, though, I wanted to check whether anybody has seen this before and, if so, what they did to mitigate it.
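In case it’s useful, here is roughly what I had in mind for the init container, as a sketch: the Consul service name, namespace, and port are assumptions based on the Consul Helm chart defaults, so adjust them to your install.

```yaml
# Sketch of an init container that blocks Vault pod startup until
# Consul has elected a leader. Service DNS name and port 8500 are
# assumed from the Consul Helm chart defaults.
initContainers:
  - name: wait-for-consul
    image: curlimages/curl:8.8.0
    command:
      - sh
      - -ec
      - |
        # /v1/status/leader returns an address like "10.0.0.1:8300"
        # once a leader exists, and an empty string otherwise, so we
        # wait until the response contains a host:port.
        until curl -sf http://consul-server.consul.svc:8500/v1/status/leader | grep -q ':'; do
          echo "waiting for Consul leader..."
          sleep 5
        done
```

With the Helm chart this would go under the server pod spec (e.g. via `server.extraInitContainers` in the Vault chart values, if I remember the key correctly).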

Any idea is appreciated, thank you.

This is still biting us.

In a similar scenario, when Consul is OOM-killed or restarted for whatever reason and Vault is unable to connect to it, Vault seals itself and can’t recover after that. Consul then comes back up fine, but Vault doesn’t. We have to manually recreate the Vault pods that are not ready to get things working again.
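One mitigation I’ve been considering, as a sketch only (not yet tested in our setup): a liveness probe on the Vault container that fails while Vault is sealed, so Kubernetes recreates the pod automatically instead of us doing it by hand. `vault status` exits non-zero when the node is sealed, which makes it usable as a probe; the timing values below are guesses.

```yaml
# Hypothetical livenessProbe for the vault container. `vault status`
# exits 0 only when unsealed, so a pod that stays sealed after Consul
# comes back will fail the probe and be restarted, giving auto-unseal
# another chance.
livenessProbe:
  exec:
    command: ["/bin/sh", "-ec", "vault status"]
  initialDelaySeconds: 60
  periodSeconds: 10
  failureThreshold: 6
```

The trade-off is that a pod that is legitimately sealed (e.g. during maintenance) would also get restarted, so this may not suit every setup.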

Any idea what could be happening?

I would suggest opening a GitHub issue on Vault and asking them to remove the behaviour of automatically sealing when storage cannot be contacted for a while.
