HA is not really HA - vault could not elect a new master and failed - barrier init check failed

Vault version: 1.2.2
GKE cluster version 1.14

We had HA vault GKE cluster fail because a new master was not elected. We run private vault cluster with HA settings on GKE with 3 replicas as a statefulset deployment. To restore vault cluster we had to scale it to 0 and back to 3 - adding additional nodes or restarting them did not help.

This are the logs:

a. Error received on our application side:
“local node not active but active cluster node not found”

b. Vault side logs:
{“log”:“xxxxxT15:03:00.111Z [WARN] core: leadership lost, stopping active operation\n”,“stream”:“stderr”}

c. After looking at logs from days prior to vault failing, we saw this logs:

{“log”:“xxxxxxxT15:02:59.960Z [ERROR] core: barrier init check failed: error=“failed to check for initialization: failed to read object: Get https://www.googleapis.com/storage/v1/b//o?alt=json\u0026delimiter=%2F\u0026pageToken=\u0026prefix=core%2F\u0026prettyPrint=false\u0026projection=full\u0026versions=false: read tcp x.x.x.x:43102-\u003ex.x.x.x:443: read: connection reset by peer”\n”,“stream”:“stderr”,“time”:“xxxxxxT15:02:59.964582898Z”}

We were partially following this google’s guide when creating vault:

Few people confirmed our setup was configured OK.

We believe but not 100% sure this is a relevant issue:

Is it really Go language bug and as a result vault cant be 100% HA?
Did someone have similar problem and can share more details?
How to prevent this from happening again?



We also contacted google and they think its Go language problem too