Hi @maxb,
thanks for the confirmation! I was already about to give up.
I followed the migration guide on vaultproject.io for the move from file storage, and it seemed to work fine (“Success! All of the keys have been migrated.”). But now, when I start the Vault server and try to unseal, the `vault operator unseal` command just hangs after I enter the last unseal key (3/3 in my setup).
```
==> Vault server configuration:
Api Address: https://FQDN:12345
Cgo: disabled
Cluster Address: https://FQDN.:12345
Go Version: go1.16.5
Listener 1: tcp (addr: "0.0.0.0:9200", cluster address: "0.0.0.0:9201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "enabled")
Log Level: info
Mlock: supported: true, enabled: false
Recovery Mode: false
Storage: raft (HA available)
Version: Vault v1.8.0
Version Sha: 82a99f14eb6133f99a975e653d4dac21c17505c7
==> Vault server started! Log data will stream in below:
2022-03-25T10:48:58.535Z [INFO] proxy environment: http_proxy="" https_proxy="" no_proxy=""
2022-03-25T10:51:25.661Z [INFO] core.cluster-listener.tcp: starting listener: listener_address=0.0.0.0:9201
2022-03-25T10:51:25.661Z [INFO] core.cluster-listener: serving cluster requests: cluster_listen_address=[::]:9201
2022-03-25T10:51:25.710Z [INFO] storage.raft: creating Raft: config="&raft.Config{ProtocolVersion:3, HeartbeatTimeout:5000000000, ElectionTimeout:5000000000, CommitTimeout:50000000, MaxAppendEntries:64, BatchApplyCh:true, ShutdownOnRemove:true, TrailingLogs:0x2800, SnapshotInterval:120000000000, SnapshotThreshold:0x2000, LeaderLeaseTimeout:2500000000, LocalID:"node_1", NotifyCh:(chan<- bool)(0xc000f4eaf0), LogOutput:io.Writer(nil), LogLevel:"DEBUG", Logger:(*hclog.interceptLogger)(0xc000a3a090), NoSnapshotRestoreOnStart:true, skipStartup:false}"
2022-03-25T10:51:25.913Z [INFO] storage.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:vault_tf_raft_node_1 Address:FQDN:12345}]"
2022-03-25T10:51:25.913Z [INFO] storage.raft: entering follower state: follower="Node at FQDN.:12345 [Follower]" leader=
2022-03-25T10:51:31.398Z [WARN] storage.raft: not part of stable configuration, aborting election
^C==> Vault shutdown triggered
```
```
$ vault operator unseal
Unseal Key (will be hidden):
Error unsealing: context deadline exceeded
$
```
My current start script:
```shell
VAULT_LOCAL_CONFIG=$(cat <<-EOF
{
  "ui": true,
  "cluster_name": "Some great Name",
  "listener": {
    "tcp": {
      "address": "0.0.0.0:9200",
      "tls_disable": 0,
      "tls_min_version": "tls11",
      "tls_disable_client_certs": "true",
      "tls_require_and_verify_client_cert": "false",
      "tls_cert_file": "/vault/certs/vault.crt",
      "tls_key_file": "/vault/certs/vault.key"
    }
  },
  "storage": {
    "raft": {
      "path": "/vault/raft",
      "node_id": "node_1"
    }
  },
  "default_lease_ttl": "168h",
  "max_lease_ttl": "8760h",
  "disable_mlock": "true",
  "api_addr": "https://FQDN:12345",
  "cluster_addr": "https://FQDN.:12345"
}
EOF
)
docker run --name=terraform-vault -v /var/lib/vault:/vault -p "12345:9200" --cap-add=IPC_LOCK -e "VAULT_DEV_LISTEN_ADDRESS=0.0.0.0:9200" -e "VAULT_TOKEN=${VAULT_TOKEN}" -e "VAULT_LOCAL_CONFIG=${VAULT_LOCAL_CONFIG}" vault server
```
In a document about Consul, I saw that you can recover from a similar (?) scenario by creating `raft/peers.json`, which I also tried (just adding one entry for the Vault server with localhost), but it did not change anything:
```json
[
  {
    "id": "vault_tf_raft_node_1",
    "address": "127.0.0.1:8300",
    "non_voter": false
  }
]
```
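One thing I am unsure about: the address above is just Consul's default (`127.0.0.1:8300`). Since the Raft log above shows `Address:FQDN:12345`, maybe the `peers.json` entry has to match that instead? Purely a guess on my side, something like:

```json
[
  {
    "id": "vault_tf_raft_node_1",
    "address": "FQDN:12345",
    "non_voter": false
  }
]
```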
Any suggestions on how to recover from this, or how to set up raft correctly in the first place? E.g., is the FQDN in `cluster_addr` correct for my use case (I think I also tried localhost before, without success)?
I could also increase the log level and see why this hangs?
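For the log level, I assume I could just add this to the JSON in `VAULT_LOCAL_CONFIG` (assuming `log_level` is honored there):

```json
{
  "log_level": "trace"
}
```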
Thanks!
Claus