Cannot unseal node

Just starting with Vault.
I've set up a 3-VM node cluster with Integrated Storage on a SAN shared disk (ext4) for /opt/vault/data, and manual unseal.
The first two VMs have joined the cluster and node1 is the active one.
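
For reference, this is how I check cluster membership (a minimal sketch; it assumes VAULT_ADDR points at the active node and VAULT_TOKEN holds a valid token):

export VAULT_ADDR=http://x.x.x.1:8200
export VAULT_TOKEN=<valid-token>
# lists each node_id, its cluster address, and whether it is leader or follower
vault operator raft list-peers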

[root@vault1 ~]# vault status
Key                     Value
---                     -----
Seal Type               shamir
Initialized             true
Sealed                  false
Total Shares            5
Threshold               3
Version                 1.6.1
Storage Type            raft
Cluster Name            vault-cluster-31a5700d
Cluster ID              xx
HA Enabled              true
HA Cluster              https://x.x.x.1:8201
HA Mode                 active
Raft Committed Index    12521
Raft Applied Index      12521

[root@vault2 ~]# vault status
Key                     Value
---                     -----
Seal Type               shamir
Initialized             true
Sealed                  false
Total Shares            5
Threshold               3
Version                 1.6.1
Storage Type            raft
Cluster Name            vault-cluster-31a5700d
Cluster ID              xxxxx
HA Enabled              true
HA Cluster              https://x.x.x.1:8201
HA Mode                 standby
Active Node Address     http://x.x.x.1:8200
Raft Committed Index    48
Raft Applied Index      38

The problem is with unsealing the third node. When I enter the last "vault operator unseal" I get:
Error unsealing: Put "http://x.x.x.3:8200/v1/sys/unseal": EOF
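
For clarity, these are the exact commands I run against node 3 (a sketch; the key shares are placeholders):

export VAULT_ADDR=http://x.x.x.3:8200
vault operator unseal <key-share-1>
vault operator unseal <key-share-2>
# this third call, which would cross the threshold of 3, is the one that fails with the EOF above
vault operator unseal <key-share-3>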

and a stack trace in syslog:

Jan 14 17:01:07 vault3 vault: 2021-01-14T17:01:07.771+0200 [DEBUG] core: unseal key supplied: migrate=false
Jan 14 17:01:07 vault3 vault: 2021-01-14T17:01:07.771+0200 [DEBUG] core: starting cluster listeners
Jan 14 17:01:07 vault3 vault: 2021-01-14T17:01:07.771+0200 [INFO]  core.cluster-listener.tcp: starting listener: listener_address=x.x.x.3:8201
Jan 14 17:01:07 vault3 vault: 2021-01-14T17:01:07.771+0200 [INFO]  core.cluster-listener: serving cluster requests: cluster_listen_address=x.x.x.3:8201
Jan 14 17:01:07 vault3 vault: 2021-01-14T17:01:07.774+0200 [ERROR] storage.raft: failed to get log: index=81 error="log not found"
Jan 14 17:01:07 vault3 vault: 2021-01-14T17:01:07.774+0200 [INFO]  http: panic serving x.x.x.3:53150: log not found
Jan 14 17:01:07 vault3 vault: goroutine 38 [running]:
Jan 14 17:01:07 vault3 vault: net/http.(*conn).serve.func1(0xc0001cc0a0)
Jan 14 17:01:07 vault3 vault: /goroot/src/net/http/server.go:1801 +0x147
Jan 14 17:01:07 vault3 vault: panic(0x44ef160, 0xc000629570)
Jan 14 17:01:07 vault3 vault: /goroot/src/runtime/panic.go:975 +0x47a
Jan 14 17:01:07 vault3 vault: github.com/hashicorp/vault/vendor/github.com/hashicorp/raft.NewRaft(0xc0005a3158, 0x5806b40, 0xc000a024c0, 0x5857b40, 0xc000a1c3c0, 0x5843e80, 0xc000a02a20, 0x5803f40, 0xc000a01b90, 0x58779c0, ...)
Jan 14 17:01:07 vault3 vault: /gopath/src/github.com/hashicorp/vault/vendor/github.com/hashicorp/raft/api.go:545 +0x1485
Jan 14 17:01:07 vault3 vault: github.com/hashicorp/vault/physical/raft.(*RaftBackend).SetupCluster(0xc000017420, 0x5843440, 0xc000050058, 0xc000aac780, 0x585bec0, 0xc0004118c0, 0x0, 0x0, 0x0, 0x0)
Jan 14 17:01:07 vault3 vault: /gopath/src/github.com/hashicorp/vault/physical/raft/raft.go:735 +0x73a
Jan 14 17:01:07 vault3 vault: github.com/hashicorp/vault/vault.(*Core).startRaftBackend(0xc000a5c000, 0x5843440, 0xc000050058, 0x0, 0x0)
Jan 14 17:01:07 vault3 vault: /gopath/src/github.com/hashicorp/vault/vault/raft.go:164 +0x398
Jan 14 17:01:07 vault3 vault: github.com/hashicorp/vault/vault.(*Core).unsealInternal(0xc000a5c000, 0x5843440, 0xc000050058, 0xc0000ea3c0, 0x20, 0x21, 0x20, 0x20)
Jan 14 17:01:07 vault3 vault: /gopath/src/github.com/hashicorp/vault/vault/core.go:1455 +0x114
Jan 14 17:01:07 vault3 vault: github.com/hashicorp/vault/vault.(*Core).unsealFragment(0xc000a5c000, 0xc00040e0a0, 0x21, 0x50, 0x0, 0x0, 0x0)
Jan 14 17:01:07 vault3 vault: /gopath/src/github.com/hashicorp/vault/vault/core.go:1132 +0x573
Jan 14 17:01:07 vault3 vault: github.com/hashicorp/vault/vault.(*Core).Unseal(...)
Jan 14 17:01:07 vault3 vault: /gopath/src/github.com/hashicorp/vault/vault/core.go:1027
Jan 14 17:01:07 vault3 vault: github.com/hashicorp/vault/http.handleSysUnseal.func1(0x5818b00, 0xc000017880, 0xc000315700)
Jan 14 17:01:07 vault3 vault: /gopath/src/github.com/hashicorp/vault/http/sys_seal.go:132 +0x5da
Jan 14 17:01:07 vault3 vault: net/http.HandlerFunc.ServeHTTP(0xc000b00420, 0x5818b00, 0xc000017880, 0xc000315700)
Jan 14 17:01:07 vault3 vault: /goroot/src/net/http/server.go:2042 +0x44
Jan 14 17:01:07 vault3 vault: net/http.(*ServeMux).ServeHTTP(0xc000a1d080, 0x5818b00, 0xc000017880, 0xc000315700)
Jan 14 17:01:07 vault3 vault: /goroot/src/net/http/server.go:2417 +0x1ad
Jan 14 17:01:07 vault3 vault: github.com/hashicorp/vault/http.wrapHelpHandler.func1(0x5818b00, 0xc000017880, 0xc000315700)
Jan 14 17:01:07 vault3 vault: /gopath/src/github.com/hashicorp/vault/http/help.go:24 +0x14a
Jan 14 17:01:07 vault3 vault: net/http.HandlerFunc.ServeHTTP(0xc000b02cc0, 0x5818b00, 0xc000017880, 0xc000315700)
Jan 14 17:01:07 vault3 vault: /goroot/src/net/http/server.go:2042 +0x44
Jan 14 17:01:07 vault3 vault: github.com/hashicorp/vault/http.wrapCORSHandler.func1(0x5818b00, 0xc000017880, 0xc000315700)
Jan 14 17:01:07 vault3 vault: /gopath/src/github.com/hashicorp/vault/http/cors.go:29 +0x91e
Jan 14 17:01:07 vault3 vault: net/http.HandlerFunc.ServeHTTP(0xc000b02ce0, 0x5818b00, 0xc000017880, 0xc000315700)
Jan 14 17:01:07 vault3 vault: /goroot/src/net/http/server.go:2042 +0x44
Jan 14 17:01:07 vault3 vault: github.com/hashicorp/vault/http.rateLimitQuotaWrapping.func1(0x5818b00, 0xc000017880, 0xc000315700)
Jan 14 17:01:07 vault3 vault: /gopath/src/github.com/hashicorp/vault/http/util.go:96 +0xa95
Jan 14 17:01:07 vault3 vault: net/http.HandlerFunc.ServeHTTP(0xc000b02d00, 0x5818b00, 0xc000017880, 0xc000315700)
Jan 14 17:01:07 vault3 vault: /goroot/src/net/http/server.go:2042 +0x44
Jan 14 17:01:07 vault3 vault: github.com/hashicorp/vault/http.wrapGenericHandler.func1(0x5818b00, 0xc000017880, 0xc000315500)
Jan 14 17:01:07 vault3 vault: /gopath/src/github.com/hashicorp/vault/http/handler.go:315 +0x4a4
Jan 14 17:01:07 vault3 vault: net/http.HandlerFunc.ServeHTTP(0xc000a5b740, 0x5818b00, 0xc000017880, 0xc000315500)
Jan 14 17:01:07 vault3 vault: /goroot/src/net/http/server.go:2042 +0x44
Jan 14 17:01:07 vault3 vault: github.com/hashicorp/vault/vendor/github.com/hashicorp/go-cleanhttp.PrintablePathCheckHandler.func1(0x5818b00, 0xc000017880, 0xc000315500)
Jan 14 17:01:07 vault3 vault: /gopath/src/github.com/hashicorp/vault/vendor/github.com/hashicorp/go-cleanhttp/handlers.go:42 +0xba
Jan 14 17:01:07 vault3 vault: net/http.HandlerFunc.ServeHTTP(0xc000b02d20, 0x5818b00, 0xc000017880, 0xc000315500)
Jan 14 17:01:07 vault3 vault: /goroot/src/net/http/server.go:2042 +0x44
Jan 14 17:01:07 vault3 vault: net/http.serverHandler.ServeHTTP(0xc000017960, 0x5818b00, 0xc000017880, 0xc000315500)
Jan 14 17:01:07 vault3 vault: /goroot/src/net/http/server.go:2843 +0xa3
Jan 14 17:01:07 vault3 vault: net/http.(*conn).serve(0xc0001cc0a0, 0x5843400, 0xc000aac000)
Jan 14 17:01:07 vault3 vault: /goroot/src/net/http/server.go:1925 +0x8ad
Jan 14 17:01:07 vault3 vault: created by net/http.(*Server).Serve
Jan 14 17:01:07 vault3 vault: /goroot/src/net/http/server.go:2969 +0x36c

And the status remains sealed.

[root@vault3 ~]# vault status
Key                Value
---                -----
Seal Type          shamir
Initialized        true
Sealed             true
Total Shares       5
Threshold          3
Unseal Progress    0/3
Unseal Nonce       n/a
Version            1.6.1
Storage Type       raft
HA Enabled         true

Vault Config

Node #1
disable_mlock = true
storage "raft" {
  path    = "/opt/vault/data"
  node_id = "vault_node_1"
}
listener "tcp" {
  address         = "x.x.x.1:8200"
  cluster_address = "x.x.x.1:8201"
  tls_disable     = true
}
api_addr     = "http://x.x.x.1:8200"
cluster_addr = "http://x.x.x.1:8201"

Node #2
disable_mlock = true
storage "raft" {
  path    = "/opt/vault/data"
  node_id = "vault_node_2"
}
listener "tcp" {
  address         = "x.x.x.2:8200"
  cluster_address = "x.x.x.2:8201"
  tls_disable     = true
}
api_addr     = "http://x.x.x.2:8200"
cluster_addr = "http://x.x.x.2:8201"

Node #3
disable_mlock = true
storage "raft" {
  path    = "/opt/vault/data"
  node_id = "vault_node_3"
}
listener "tcp" {
  address         = "x.x.x.3:8200"
  cluster_address = "x.x.x.3:8201"
  tls_disable     = true
}
api_addr     = "http://x.x.x.3:8200"
cluster_addr = "http://x.x.x.3:8201"

Are the HTTP vs. HTTPS schemes in your *_addr settings accurate? Your status output shows:
HA Cluster    https://x.x.x.1:8201

Thank you mikegreen for your quick reply.

I see https in the 'vault status' output, but there is no https in my vault.hcl. I suppose Vault overrides my configuration?
I also forgot to mention that I see too many DEBUG/WARN entries in my logs, like the following:

Jan 14 19:38:26 vault1 vault: 2021-01-14T19:38:26.826+0200 [DEBUG] core.cluster-listener: performing server cert lookup
Jan 14 19:38:26 vault1 vault: 2021-01-14T19:38:26.836+0200 [DEBUG] core.cluster-listener: error handshaking cluster connection: error="remote error: tls: bad certificate"
Jan 14 19:38:48 vault1 vault: 2021-01-14T19:38:48.799+0200 [WARN]  core.raft: skipping new raft TLS config creation, keys are pending

So if /opt/vault/data is on a SAN, does that mean each node sees the same data at that mount point? If so, that's a misconfiguration: Raft Integrated Storage expects each node to have its own local storage. If you want the SAN to manage replication, you could try using the File storage engine in conjunction with some other HA storage engine (possibly Raft) to handle locking. That sounds more complicated, though; if it were me, I'd take the SAN out of the equation altogether.
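
Roughly, each node's config would look like this instead (a sketch for node 1; the local path is just an example of anything not on the SAN, and nodes 2 and 3 would mirror it with their own node_id and addresses):

disable_mlock = true
storage "raft" {
  # this node's own local disk, not the shared SAN mount
  path    = "/var/lib/vault/data"
  node_id = "vault_node_1"
}
listener "tcp" {
  address         = "x.x.x.1:8200"
  cluster_address = "x.x.x.1:8201"
  tls_disable     = true
}
api_addr     = "http://x.x.x.1:8200"
cluster_addr = "http://x.x.x.1:8201"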

Correct me if I'm wrong: I saw two setups for on-premises deployments without Consul.

HA Cluster with Integrated Storage
Nodes with local disks; the Raft data lives on each node's own plain local filesystem, and Raft storage coordinates the cluster.

Integrated Storage for HA Coordination
Nodes with a shared disk; "file" storage on a plain filesystem holds the data, and Raft storage handles only the HA coordination.

I followed the second option. What I didn't understand from the documentation is how the cluster nodes could write to the same shared filesystem without some locking mechanism. The documentation doesn't mention NFS or a cluster filesystem (e.g., GFS2, OCFS2).
I assumed that since only one node is active, only one is writing, so there should be no problem.

The second option suggests using a config like this:

ha_storage "raft" {
  path    = "$demo_home/ha-raft_1/"
  node_id = "vault_1"
}

storage "file" {
  path = "$demo_home/vault-storage-file/"
}

I assumed that since only one is active, only one is writing so there should be no problem.

How do you ensure only one is active? You need a distributed lock or something like it to ensure that.

Thank you ncabatoff for your reply.

I thought the ha_storage stanza was there to overcome the "file" storage's limitations, and that from version 1.5 on I could use "raft" storage for it since it is "HA ready".
My bad.

Thank you again.