We have Vault with a Consul backend, installed using the Helm charts on Kubernetes. It is configured to auto-unseal with a key from AWS Secrets Manager.
Recently, we had an incident where the on-premises hosts in our Beta environment were restarted, and Vault and Consul were forcibly restarted along with them.
When they came back, Vault unsealed with the key from AWS, but according to the logs it couldn't get the metadata from Consul, so it reverted the unsealing process and went back to sealed. When I went to troubleshoot, I found Vault still sealed and unable to recover, and Consul up and running with no issues.
So, my assumption is that Consul wasn't fully up while Vault was trying to unseal, so Vault exhausted its retries and went back to sealed without recovering. Then Consul finally came back up.
What I did was simply restart Vault, and the problem went away.
Because of this, I was about to set up init containers in the Vault pods that would prevent the pod from starting until Consul is up and running. However, before doing that, I wanted to see if anybody has run into this before and, if so, what they did to mitigate it.
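For reference, here is a rough sketch of the init container I had in mind. It polls Consul's `/v1/status/leader` endpoint until a leader has been elected, so Vault only starts once the Consul cluster can actually serve requests. The service address `consul-server.consul.svc`, the port, and the image are assumptions specific to my idea and would need to match your deployment:

```yaml
# Sketch only: added under the vault server pod spec (e.g. via the
# Helm chart's extraInitContainers value). Names and addresses are
# placeholders for whatever your cluster actually uses.
initContainers:
  - name: wait-for-consul
    image: curlimages/curl:8.8.0
    command:
      - sh
      - -c
      - |
        # /v1/status/leader returns an empty string ("") until the
        # Consul cluster has elected a leader; a real leader address
        # contains a host:port, so we wait for the colon.
        until curl -sf http://consul-server.consul.svc:8500/v1/status/leader | grep -q ':'; do
          echo "Consul has no leader yet; retrying in 5s..."
          sleep 5
        done
        echo "Consul leader elected; starting Vault."
```

Checking for a leader (rather than just a TCP connect) seemed safer to me, since Consul can be reachable before it is actually able to answer KV reads.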
Any idea is appreciated, thank you.