I’m trying to set up a 3-node cluster with Vault version 1.11.4 and am running into some issues. I am using AWS KMS to auto-unseal, which works fine on the primary node, and works on the secondary nodes if I have their storage type set to file. When I try to have them retry_join the primary node, the primary node shows them as Raft members, but voting is false.
Additionally, the status output of the secondary nodes is as follows:
Recovery Seal Type awskms
Total Recovery Shares 0
Unseal Progress 0/0
Unseal Nonce n/a
Build Date 2022-09-23T06:01:14Z
Storage Type raft
HA Enabled true
As additional information, I get these two log errors:
Oct 7 14:16:21 vault-3 vault: 2022-10-07T14:16:21.104-0600 [ERROR] core: failed to retry join raft cluster: retry=2s err="failed to send answer to raft leader node: error bootstrapping cluster: cluster already has state"
Oct 7 14:16:22 vault-3 vault: 2022-10-07T14:16:22.192-0600 [INFO] core: stored unseal keys supported, attempting fetch
Oct 7 14:16:22 vault-3 vault: 2022-10-07T14:16:22.192-0600 [WARN] failed to unseal core: error="stored unseal keys are supported, but none were found"
Any ideas what’s going on here? Why would the auto-unseal work fine with a file storage type, then suddenly break when attempting to join a Raft cluster?
This suggests an important misunderstanding: the file storage backend does not support HA, meaning there’s no such thing as a secondary node when using file storage.
If you had them “working”, then what you actually had was 3 entirely separate single-node Vault clusters.
The “cluster already has state” error is saying “You told me to join a Raft cluster, but I already locally have Raft state”. Only completely blank Vault nodes with no data of their own can join an existing Raft cluster.
Are you, possibly, initialising more than 1 node in your intended cluster? Do not do this: initialising is something you do once per cluster, not once per node.
Alternatively, do you possibly have old Raft state left in your nodes’ data directories from a previous attempt? If so, retry your setup after you’ve removed all left-over data directories from previous experimentation.
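If it helps, here is a rough Python sketch for spotting left-over Raft state before a node tries to join. It assumes that Vault's integrated storage leaves a vault.db file and a raft subdirectory inside its data directory; the function name and the example paths are made up for illustration, so substitute every directory that has ever been used as a Raft path on that node.

```python
from pathlib import Path

# Entries that Vault's integrated (Raft) storage is assumed to create
# inside its data directory.
RAFT_STATE_NAMES = ("vault.db", "raft")


def find_leftover_raft_state(data_dirs):
    """Return a dict mapping each directory to the Raft state entries found in it."""
    leftovers = {}
    for directory in data_dirs:
        base = Path(directory)
        found = [name for name in RAFT_STATE_NAMES if (base / name).exists()]
        if found:
            leftovers[str(base)] = found
    return leftovers


if __name__ == "__main__":
    # Hypothetical paths: list every location used in earlier experiments.
    candidates = ["/opt/vault/data", "/var/lib/vault"]
    for directory, names in find_leftover_raft_state(candidates).items():
        print(f"{directory}: leftover state found: {', '.join(names)}")
```

Any directory it reports would need to be emptied (with Vault stopped) before the node can join as a blank member.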
I am not using the file storage backend. I only mentioned it as an example showing that the auto-unseal configuration does work correctly, so I’m relatively certain my issue is with the Raft configuration.
That is what I thought, so two things I tried to combat this were deleting everything under the raft data directory on my server and rejoining. I also tried building a brand new server and joining it, but in both cases they are exhibiting the same behavior.
This has been solved. I was looking at it with a colleague and realized that the old Raft folders from our experimentation were still on the servers. Even though the files were in a different location, and we had changed the directory for production, something was apparently still referencing the old directory. After clearing the old Raft directory and vault.db file, the node joined as expected.
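For anyone hitting the same thing, one way to double-check which directory a node is configured to use is to read the path out of the storage "raft" stanza in its config file. The sketch below is only a rough regex-based helper, not a real HCL parser: the function name is made up, and the sample config values (path, node ID, leader address) are placeholders rather than our actual settings.

```python
import re


def raft_storage_path(config_text):
    """Return the path set in the storage "raft" stanza, or None if not found.

    This is a simple text search, not a full HCL parser: it takes the first
    `path = "..."` line that appears after the stanza opens.
    """
    start = re.search(r'storage\s+"raft"\s*\{', config_text)
    if start is None:
        return None
    match = re.search(
        r'^\s*path\s*=\s*"([^"]*)"', config_text[start.end():], re.MULTILINE
    )
    return match.group(1) if match else None


# Placeholder config for illustration only.
SAMPLE_CONFIG = '''
storage "raft" {
  path    = "/opt/vault/data"
  node_id = "vault-3"
  retry_join {
    leader_api_addr = "https://vault-1.example.com:8200"
  }
}
'''

if __name__ == "__main__":
    print(raft_storage_path(SAMPLE_CONFIG))
```

Comparing that value against the directories that actually contain a vault.db file should show whether an old location is still in play.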