Vault single instance using raft storage?

Hey all.

Sounds nonsensical, maybe, but:

I want to create proper backups, so I turned away from file storage and am trying Raft (for its snapshot feature). The migration itself worked fine, but now Vault hangs when I complete vault operator unseal, with the last entry in the log stating “[WARN] storage.raft: not part of stable configuration, aborting election”.

I guess I misconfigured Raft. Is there an RTFM out there describing a valid configuration for this use case (no HA, just a single instance)? Any other suggestions?

Thanks!

I believe you need an actual cluster (3+ nodes) for raft to work correctly.

This probably isn’t what you’re after, but there is some useful info in HashiCorp’s raft repo.

The raft architecture tutorial may also be useful.

You can absolutely run a single node raft cluster. Of course, you don’t get the resiliency benefits of multiple nodes that way, but it does run.

The warning “not part of stable configuration, aborting election” means the server doesn’t think it’s listed as a voting member in the Raft state, and so it can’t attempt to elect itself leader.

That suggests something went wrong during the initial setup of the Raft state for this cluster.
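For reference, a minimal single-node Raft config needs little more than a storage "raft" stanza, plus api_addr and cluster_addr values the node can actually reach itself on. Something along these lines (addresses and paths are placeholders, not taken from a real setup):

{
  "listener": {
    "tcp": {
      "address": "0.0.0.0:8200",
      [...]
    }
  },
  "storage": {
    "raft": {
      "path": "/vault/raft",
      "node_id": "node_1"
    }
  },
  "api_addr": "https://127.0.0.1:8200",
  "cluster_addr": "https://127.0.0.1:8201"
}

One thing to watch out for: node_id must match the ID already recorded in the Raft state, if any exists, or the node won't recognize itself as a voter.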

Hi @maxb,

thanks for the confirmation! I was already about to give up.

I followed the guide on vaultproject.io for the migration from file storage, and it seemed to work fine (“Success! All of the keys have been migrated.”). But now, when I start the Vault server and complete the last unseal step (3/3 in my setup), the vault operator unseal command just hangs.
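For completeness, the migration itself was done with vault operator migrate and a config roughly like this (reconstructed from memory; /vault/file is where the old file storage lived):

{
  "storage_source": {
    "file": {
      "path": "/vault/file"
    }
  },
  "storage_destination": {
    "raft": {
      "path": "/vault/raft",
      "node_id": "vault_tf_raft_node_1"
    }
  },
  "cluster_addr": "https://FQDN:12345"
}

$ vault operator migrate -config=migrate.json

This is the server output up to the hang: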

==> Vault server configuration:

             Api Address: https://FQDN:12345
                     Cgo: disabled
         Cluster Address: https://FQDN.:12345
              Go Version: go1.16.5
              Listener 1: tcp (addr: "0.0.0.0:9200", cluster address: "0.0.0.0:9201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "enabled")
               Log Level: info
                   Mlock: supported: true, enabled: false
           Recovery Mode: false
                 Storage: raft (HA available)
                 Version: Vault v1.8.0
             Version Sha: 82a99f14eb6133f99a975e653d4dac21c17505c7

==> Vault server started! Log data will stream in below:

2022-03-25T10:48:58.535Z [INFO]  proxy environment: http_proxy="" https_proxy="" no_proxy=""
2022-03-25T10:51:25.661Z [INFO]  core.cluster-listener.tcp: starting listener: listener_address=0.0.0.0:9201
2022-03-25T10:51:25.661Z [INFO]  core.cluster-listener: serving cluster requests: cluster_listen_address=[::]:9201
2022-03-25T10:51:25.710Z [INFO]  storage.raft: creating Raft: config="&raft.Config{ProtocolVersion:3, HeartbeatTimeout:5000000000, ElectionTimeout:5000000000, CommitTimeout:50000000, MaxAppendEntries:64, BatchApplyCh:true, ShutdownOnRemove:true, TrailingLogs:0x2800, SnapshotInterval:120000000000, SnapshotThreshold:0x2000, LeaderLeaseTimeout:2500000000, LocalID:"node_1", NotifyCh:(chan<- bool)(0xc000f4eaf0), LogOutput:io.Writer(nil), LogLevel:"DEBUG", Logger:(*hclog.interceptLogger)(0xc000a3a090), NoSnapshotRestoreOnStart:true, skipStartup:false}"
2022-03-25T10:51:25.913Z [INFO]  storage.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:vault_tf_raft_node_1 Address:FQDN:12345}]"
2022-03-25T10:51:25.913Z [INFO]  storage.raft: entering follower state: follower="Node at FQDN.:12345 [Follower]" leader=
2022-03-25T10:51:31.398Z [WARN]  storage.raft: not part of stable configuration, aborting election
^C==> Vault shutdown triggered
$ vault operator unseal
  Unseal Key (will be hidden):
  Error unsealing: context deadline exceeded
$ 

My current start script:

VAULT_LOCAL_CONFIG=$(cat <<-EOF
{
	"ui": true,
	"cluster_name": "Some great Name",
	"listener": {
	  "tcp": {
	    "address": "0.0.0.0:9200",
	    "tls_disable": 0,
	    "tls_min_version": "tls11",
	    "tls_disable_client_certs": "true",
	    "tls_require_and_verify_client_cert": "false",
	    "tls_cert_file": "/vault/certs/vault.crt",
	    "tls_key_file": "/vault/certs/vault.key"
	  }
	},
	"storage": {
	  "raft": {
	    "path": "/vault/raft",
	    "node_id": "node_1"
	  }
	},
	"default_lease_ttl": "168h",
	"max_lease_ttl": "8760h",
	"disable_mlock": "true",
	"api_addr": "https://FQDN:12345",
	"cluster_addr": "https://FQDN.:12345"
}
EOF
)
docker run --name=terraform-vault -v /var/lib/vault:/vault -p "12345:9200" --cap-add=IPC_LOCK -e "VAULT_DEV_LISTEN_ADDRESS=0.0.0.0:9200" -e "VAULT_TOKEN=${VAULT_TOKEN}" -e "VAULT_LOCAL_CONFIG=${VAULT_LOCAL_CONFIG}" vault server

In a document about Consul, I saw that you can recover from a similar (?) scenario by creating raft/peers.json. I tried that as well (just adding one entry for the Vault server on localhost), but it did not change anything:

[
  {
    "id": "vault_tf_raft_node_1",
    "address": "127.0.0.1:8300",
    "non_voter": false
  }
]
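The file went into the raft directory before restarting the server; the raft path /vault/raft is a mount of /var/lib/vault/raft on the host (see the docker run above), so:

$ cp peers.json /var/lib/vault/raft/peers.json
$ docker restart terraform-vault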

Any suggestions on how to recover from this, or on how to set up Raft correctly in the first place? E.g. is the FQDN in cluster_addr correct for my use case? (I think I also tried localhost before, without success.)

I could also increase the log level and see why this hangs?

Thanks!

Claus

Writing this up and assembling all the facts, I noticed the error of my ways… The node_id in my server config ("node_1") did not match the ID recorded in the Raft state during the migration ("vault_tf_raft_node_1", visible in the “initial configuration” log line above), so the server never recognized itself as a voter. I updated my start script and the peers.json as suggested by [support.hashicorp.com](https://support.hashicorp.com/hc/en-us/articles/360050756393-How-to-recover-from-permanently-lost-quorum-while-using-Raft-integrated-storage-with-Vault-) and it seems to work fine!

VAULT_LOCAL_CONFIG=$(cat <<-EOF
[...]
  "storage": {
    "raft": {
      "path": "/vault/raft",
      "node_id": "vault_tf_raft_node_1"
    }
  },
[...]
  "api_addr": "https://FQDN:12345",
  "cluster_addr": "http://127.0.0.1:9201"
}
EOF
)
[
  {
    "id": "vault_tf_raft_node_1",
    "address": "https://127.0.0.1:9201",
    "non_voter": false
  }
]
==> Vault server configuration:

             Api Address: https://FQDN:12345
                     Cgo: disabled
         Cluster Address: https://127.0.0.1:9201
              Go Version: go1.16.5
              Listener 1: tcp (addr: "0.0.0.0:9200", cluster address: "0.0.0.0:9201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "enabled")
               Log Level: info
                   Mlock: supported: true, enabled: false
           Recovery Mode: false
                 Storage: raft (HA available)
                 Version: Vault v1.8.0
             Version Sha: 82a99f14eb6133f99a975e653d4dac21c17505c7

==> Vault server started! Log data will stream in below:

2022-03-28T10:47:14.279Z [INFO]  proxy environment: http_proxy="" https_proxy="" no_proxy=""
2022-03-28T10:47:48.827Z [INFO]  core.cluster-listener.tcp: starting listener: listener_address=0.0.0.0:9201
2022-03-28T10:47:48.827Z [INFO]  core.cluster-listener: serving cluster requests: cluster_listen_address=[::]:9201
2022-03-28T10:47:48.875Z [INFO]  storage.raft: creating Raft: config="&raft.Config{ProtocolVersion:3, HeartbeatTimeout:5000000000, ElectionTimeout:5000000000, CommitTimeout:50000000, MaxAppendEntries:64, BatchApplyCh:true, ShutdownOnRemove:true, TrailingLogs:0x2800, SnapshotInterval:120000000000, SnapshotThreshold:0x2000, LeaderLeaseTimeout:2500000000, LocalID:"vault_tf_raft_node_1", NotifyCh:(chan<- bool)(0xc000eefab0), LogOutput:io.Writer(nil), LogLevel:"DEBUG", Logger:(*hclog.interceptLogger)(0xc000d254d0), NoSnapshotRestoreOnStart:true, skipStartup:false}"
2022-03-28T10:47:48.877Z [INFO]  storage.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:vault_tf_raft_node_1 Address:FQDN:12345}]"
2022-03-28T10:47:48.877Z [INFO]  storage.raft: entering follower state: follower="Node at 127.0.0.1:9201 [Follower]" leader=
2022-03-28T10:47:56.397Z [WARN]  storage.raft: heartbeat timeout reached, starting election: last-leader=
2022-03-28T10:47:56.397Z [INFO]  storage.raft: entering candidate state: node="Node at 127.0.0.1:9201 [Candidate]" term=3
2022-03-28T10:47:56.400Z [INFO]  storage.raft: election won: tally=1
2022-03-28T10:47:56.400Z [INFO]  storage.raft: entering leader state: leader="Node at 127.0.0.1:9201 [Leader]"
2022-03-28T10:47:56.400Z [INFO]  core: writing raft TLS keyring to storage
2022-03-28T10:47:56.404Z [INFO]  core: vault is unsealed
2022-03-28T10:47:56.404Z [INFO]  core: entering standby mode
2022-03-28T10:47:56.411Z [INFO]  core: acquired lock, enabling active operation
2022-03-28T10:47:56.460Z [INFO]  core: post-unseal setup starting
2022-03-28T10:47:56.463Z [INFO]  core: loaded wrapping token key
2022-03-28T10:47:56.463Z [INFO]  core: successfully setup plugin catalog: plugin-directory=""
2022-03-28T10:47:56.464Z [INFO]  core: successfully mounted backend: type=system path=sys/
2022-03-28T10:47:56.464Z [INFO]  core: successfully mounted backend: type=identity path=identity/
2022-03-28T10:47:56.464Z [INFO]  core: successfully mounted backend: type=kv path=kv/
2022-03-28T10:47:56.464Z [INFO]  core: successfully mounted backend: type=cubbyhole path=cubbyhole/
2022-03-28T10:47:56.465Z [INFO]  core: successfully enabled credential backend: type=token path=token/
2022-03-28T10:47:56.466Z [INFO]  core: successfully enabled credential backend: type=userpass path=userpass/
2022-03-28T10:47:56.466Z [INFO]  core: restoring leases
2022-03-28T10:47:56.466Z [INFO]  rollback: starting rollback manager
2022-03-28T10:47:56.468Z [INFO]  identity: entities restored
2022-03-28T10:47:56.468Z [INFO]  identity: groups restored
2022-03-28T10:47:56.468Z [INFO]  core: starting raft active node
2022-03-28T10:47:56.468Z [INFO]  storage.raft: starting autopilot: config="&{false 0 10s 24h0m0s 1000 0 10s}" reconcile_interval=0s
2022-03-28T10:47:56.470Z [INFO]  expiration: lease restore complete
2022-03-28T10:47:56.470Z [INFO]  core: usage gauge collection is disabled
2022-03-28T10:47:56.472Z [INFO]  core: post-unseal setup complete
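
And the snapshot feature that started all of this now works, e.g.:

$ vault operator raft snapshot save /tmp/vault-backup.snap

(The target path is just an example; the command writes a full snapshot of the Raft-backed data to the given file, which is exactly the backup I was after.)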

Thanks,

Chris

Man, thank you! I was trying to solve this for a few hours straight.
P.S. Does anybody know whether it’s safe to delete the unnecessary clones of the raft storage and DB that I spawned along the way (I made a few copies while trying to figure out the issue)? rm -rf raft_1 raft_2 ?