Vault auto-unseal with GCP KMS and Kubernetes

I’m using this example to spin up Vault with Raft storage and GCP KMS auto-unseal.
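For context, the seal and storage stanzas in my server config look roughly like this (the project, region, key ring, key, and credentials path are placeholders rather than my real values):

seal "gcpckms" {
  credentials = "/vault/userconfig/kms-creds/credentials.json"
  project     = "my-gcp-project"
  region      = "global"
  key_ring    = "vault-keyring"
  crypto_key  = "vault-unseal-key"
}

storage "raft" {
  path = "/vault/data"
}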

However, I’m getting the following in the pod logs:

{"@level":"info","@message":"proxy environment","@timestamp":"2022-10-12T09:12:48.050373Z","http_proxy":"","https_proxy":"","no_proxy":""}
{"@level":"info","@message":"Initializing versionTimestamps for core","@module":"core","@timestamp":"2022-10-12T09:12:48.186895Z"}
{"@level":"info","@message":"stored unseal keys supported, attempting fetch","@module":"core","@timestamp":"2022-10-12T09:12:48.187698Z"}
{"@level":"warn","@message":"failed to unseal core","@timestamp":"2022-10-12T09:12:48.187803Z","error":"stored unseal keys are supported, but none were found"}
{"@level":"info","@message":"security barrier not initialized","@module":"core","@timestamp":"2022-10-12T09:12:52.422247Z"}
{"@level":"info","@message":"seal configuration missing, but cannot check old path as core is sealed","@module":"core.autoseal","@timestamp":"2022-10-12T09:12:52.422302Z","seal_type":"recovery"}
{"@level":"info","@message":"stored unseal keys supported, attempting fetch","@module":"core","@timestamp":"2022-10-12T09:12:53.188531Z"}
{"@level":"warn","@message":"failed to unseal core","@timestamp":"2022-10-12T09:12:53.188651Z","error":"stored unseal keys are supported, but none were found"}
{"@level":"info","@message":"security barrier not initialized","@module":"core","@timestamp":"2022-10-12T09:12:57.420214Z"}
{"@level":"info","@message":"seal configuration missing, but cannot check old path as core is sealed","@module":"core.autoseal","@timestamp":"2022-10-12T09:12:57.420279Z","seal_type":"recovery"}
{"@level":"info","@message":"stored unseal keys supported, attempting fetch","@module":"core","@timestamp":"2022-10-12T09:12:58.188778Z"}
{"@level":"warn","@message":"failed to unseal core","@timestamp":"2022-10-12T09:12:58.188885Z","error":"stored unseal keys are supported, but none were found"}
{"@level":"info","@message":"security barrier not initialized","@module":"core","@timestamp":"2022-10-12T09:13:02.426477Z"}
{"@level":"info","@message":"seal configuration missing, but cannot check old path as core is sealed","@module":"core.autoseal","@timestamp":"2022-10-12T09:13:02.426531Z","seal_type":"recovery"}
{"@level":"info","@message":"stored unseal keys supported, attempting fetch","@module":"core","@timestamp":"2022-10-12T09:13:03.189578Z"}
{"@level":"warn","@message":"failed to unseal core","@timestamp":"2022-10-12T09:13:03.189711Z","error":"stored unseal keys are supported, but none were found"}
{"@level":"info","@message":"security barrier not initialized","@module":"core","@timestamp":"2022-10-12T09:13:07.423050Z"}
{"@level":"info","@message":"seal configuration missing, but cannot check old path as core is sealed","@module":"core.autoseal","@timestamp":"2022-10-12T09:13:07.423107Z","seal_type":"recovery"}

Here is the current status:

kubectl exec -ti vault-restore-0 -n vault-restore -- vault status
Key                      Value
---                      -----
Recovery Seal Type       gcpckms
Initialized              false
Sealed                   true
Total Recovery Shares    0
Threshold                0
Unseal Progress          0/0
Unseal Nonce             n/a
Version                  1.10.3
Storage Type             raft
HA Enabled               true

If I run vault operator init, I get the recovery keys, but the Recovery Seal Type changes to shamir:

Key                      Value
---                      -----
Recovery Seal Type       shamir
Initialized              true
Sealed                   false
Total Recovery Shares    5
Threshold                3
Version                  1.10.3
Storage Type             raft
Cluster Name             vault-cluster-6ef10284
Cluster ID               9be1df17-9751-c5e8-ce66-7c7186b81541
HA Enabled               true
HA Cluster               https://vault-restore-0.vault-restore-internal:8201
HA Mode                  active
Active Since             2022-10-12T09:16:27.359629529Z
Raft Committed Index     37
Raft Applied Index       37

Is this normal behavior? Should it be gcpckms instead?

Also, I’m running 3 pods. Should I run vault operator init against all of them, or should initialization happen at the cluster level and be applied to all pods automatically? You can see that the only pod that is ready is the one I ran vault operator init against, while the others are still throwing the errors I shared above:

kubectl get pods -n vault-restore
NAME                        READY   STATUS    RESTARTS       AGE
vault-restore-0             1/1     Running   3 (2m5s ago)   5m53s
vault-restore-1             0/1     Running   4 (57s ago)    5m53s
vault-restore-2             0/1     Running   4 (61s ago)    5m53s

It feels like each pod is an independent Vault cluster.

I ran vault operator diagnose against the pod and got the following:

Results:
[ failure ] Vault Diagnose
  [ success ] Check Operating System
    [ success ] Check Open File Limits: Open file limits are set to 1048576.
    [ success ] Check Disk Usage: /vault/data usage ok.
    [ success ] Check Disk Usage: /vault/config usage ok.
    [ success ] Check Disk Usage: /etc/resolv.conf usage ok.
    [ success ] Check Disk Usage: /home/vault usage ok.
    [ success ] Check Disk Usage: /vault/file usage ok.
    [ success ] Check Disk Usage: /etc/hosts usage ok.
    [ success ] Check Disk Usage: /dev/termination-log usage ok.
    [ success ] Check Disk Usage: /etc/hostname usage ok.
    [ success ] Check Disk Usage: /vault/logs usage ok.
  [ success ] Parse Configuration
  [ warning ] Check Telemetry: Telemetry is using default configuration
    By default only Prometheus and JSON metrics are available.  Ignore this warning if you are using telemetry or are using these metrics and are satisfied with the default retention time and gauge period.
  [ failure ] Check Storage: Diagnose could not initialize storage backend.
    [ failure ] Create Storage Backend: Error initializing storage of type raft: failed to create fsm: failed to open bolt file: timeout
command terminated with exit code 4

Something to bear in mind for future posts: these logs would be a lot more readable if you didn’t configure them to use JSON format.
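If you want plain-text logs instead, the server config accepts a log_format setting (there is also a -log-format CLI flag and a VAULT_LOG_FORMAT environment variable), for example:

log_format = "standard"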

I’m not totally sure, but the Recovery Seal Type showing shamir is at least plausibly OK, as the recovery keys created alongside an auto-unseal setup are Shamir-based.

No: if you run vault operator init against every pod, you end up with 3 single-node Vault clusters.

Once one node of a Vault Integrated Storage cluster has been initialized, the rest need to be joined to it, either by running vault operator raft join or via a retry_join block in the config file. There’s no mention of retry_join being attempted in the logs you posted, so my guess is you either need to run the join commands manually, or else add or fix the retry_join config, as sketched below.
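Roughly, either of these should work, assuming the standby pods are meant to join vault-restore-0 and that the API listener is plain HTTP on port 8200 (retry_join wants the API address, not the :8201 cluster address shown in your status output). Manual join, repeated for each standby pod:

kubectl exec -ti vault-restore-1 -n vault-restore -- \
  vault operator raft join http://vault-restore-0.vault-restore-internal:8200

Or, in each node’s server config:

storage "raft" {
  path = "/vault/data"
  retry_join {
    leader_api_addr = "http://vault-restore-0.vault-restore-internal:8200"
  }
}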

Diagnose’s storage check needs to be run with the Vault server not running, as otherwise the running server holds a lock on the Raft database (hence the “failed to open bolt file: timeout” error).
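With the server process stopped, something along these lines should let the storage check complete (the config filename here is a guess; point it at wherever your config is actually mounted under /vault/config):

vault operator diagnose -config=/vault/config/config.hcl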