A similar issue was encountered on 1.5.3: moving a Vault backed by S3 with KMS auto-unseal to raft as the HA storage fails to elect an active node. The cluster automatically goes into standby mode.
And for that you will need the root token. I say root token because, in your setup, HA will be busted, keeping you from logging in once you run Vault with the new ha_storage stanza.
There are some more challenges that you will face eventually. For example, if the node_id value is changed, you will need to bootstrap again. I am not sure why, but if you, for example, bootstrapped a single-instance cluster with node_id=theone and later changed the config to node_id=thesecond, then HA will be busted again. Note: I have a similar setup to yours: KMS auto-unseal + S3 storage backend running OSS 1.5.3.
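For reference, a minimal config matching the setup described here (S3 storage, raft as HA-only storage, KMS auto-unseal) might look roughly like this. The bucket name, KMS key, paths, and addresses are placeholders, not values from this thread:

```hcl
# Sketch only: S3 storage backend, raft used purely for HA, AWS KMS auto-unseal.
storage "s3" {
  bucket = "my-vault-bucket"   # placeholder
  region = "us-east-1"
}

ha_storage "raft" {
  path    = "/opt/vault/raft"  # placeholder
  node_id = "theone"           # changing this later forced a re-bootstrap in my testing
}

seal "awskms" {
  region     = "us-east-1"
  kms_key_id = "alias/my-vault-key"  # placeholder
}

# raft HA needs cluster_addr/api_addr set explicitly (placeholders).
cluster_addr = "https://10.0.0.1:8201"
api_addr     = "https://10.0.0.1:8200"
```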
You will need to re-bootstrap, and in order to do that you will need to remove the raft TLS key from the storage backend. I made a backup first to make sure I didn't break anything.
Even if you do not change node_id, you will eventually run into an issue where taking down a standby busts HA on the master, or vice versa. This is the biggest issue right now. Even when I take down the active master, the standby never becomes the new active. If I take down the standby, then the master's HA is busted. It is as if the entire quorum is needed to keep rafting: take one node down and the whole thing collapses, which seems rather the opposite of the HA design. I believe it could be a bug, or just a lack of knowledge on our part about how this is supposed to work.
I would really like someone from HashiCorp to comment on this behavior. So far this experiment has made me think that raft as HA-only storage is probably not a good option right now on Vault 1.5.3. Not sure about other versions.
I spent a lot of time with this yesterday. The problem is that when raft is used only as HA storage, Vault does not bootstrap the raft cluster automatically.
This is correct. If you're using raft as the ha_storage mechanism, the cluster needs to be bootstrapped manually. You will need to call sys/storage/raft/bootstrap on one of the nodes to initiate the bootstrapping process, and sys/storage/raft/join on the rest of the nodes once that's done (no leader address needs to be provided, since Vault can get this from the shared storage backend). Both of these operations need to be done after the nodes are unsealed.
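Concretely, the two calls above can be issued against the sys endpoints with the CLI once each node is unsealed. This is a sketch under the assumptions stated in this thread (shared storage backend, so no leader address argument on join):

```shell
# On the first unsealed node: bootstrap the raft HA cluster.
vault write -f sys/storage/raft/bootstrap

# On each remaining unsealed node: join the cluster. No leader address
# is passed here, since the node can discover it via the shared
# storage backend.
vault write -f sys/storage/raft/join
```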
Thanks for the info. Do you know why stopping the Vault process on any raft peer makes the cluster's HA unhealthy? During my testing, I found that HA was busted whether I stopped the active or the standby peer. Is that by design? Starting the stopped peer back up resulted in HA being alive again. Is this how it usually works? With other HA backends, losing an instance still kept HA alive. I tested with a 2-node cluster here.
Also, with a node_id="node_${ip_address}" setup, bootstrap was needed again, which was a bit weird as well. In a cloud deployment model, a unique node ID would be necessary, and it would not necessarily stay the same over time. I would much appreciate your thoughts on the above two questions as well.
I was able to see HA active, but in my view it wasn't truly HA with raft.
The quorum required by raft to establish leadership is (N/2)+1, where N is the total number of nodes in the cluster. This is, in essence, the minimum number of nodes required for Vault to be operational given N nodes in your cluster. With a 2-node cluster, quorum is (2/2)+1 = 2, so losing either node takes the cluster down, which explains what you observed. You would need at least 3 nodes so that the failure of 1 node still leaves you within the quorum requirement.
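The quorum arithmetic above is easy to tabulate. A small sketch (integer division gives the floor of N/2, matching the formula):

```python
# Raft quorum: a cluster of n nodes needs floor(n/2) + 1 live nodes
# to elect and keep a leader.

def quorum(n: int) -> int:
    """Minimum number of live nodes needed for leadership."""
    return n // 2 + 1

def fault_tolerance(n: int) -> int:
    """How many nodes can fail while the cluster stays operational."""
    return n - quorum(n)

for n in (1, 2, 3, 5):
    print(f"{n} nodes: quorum={quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

Note that a 2-node cluster tolerates zero failures, the same as a single node, which is why 3 is the practical minimum for HA.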
As far as node_id goes, these are intended to be unique identifiers for the node. If the nodes are ephemeral in nature, you would not have to bootstrap again when they are spun up, but you would have to join them to the existing cluster, and probably remove any dangling nodes via remove-peer when they are torn down, to keep the number of nodes in the cluster from growing over time.
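The cleanup step mentioned above can be driven from the CLI; the node ID below is a placeholder:

```shell
# Inspect the current peer set first.
vault operator raft list-peers

# Remove a dangling peer whose instance was torn down
# (placeholder node_id).
vault operator raft remove-peer node_10-0-0-2
```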
@calvn I may be way off here, but it seems to me that if I destroy an existing raft cluster and spin up a new one, I need to re-bootstrap every time? Also, the new bootstrap fails:
```
Error writing data to sys/storage/raft/bootstrap: Error making API request.

URL: PUT https://127.0.0.1:8200/v1/sys/storage/raft/bootstrap
Code: 500. Errors:

* could not generate TLS keyring during bootstrap: TLS keyring already present
```
I need to manually remove the TLS file from core/raft/ for bootstrap to work again. Is the case of an already bootstrapped cluster followed by a complete new rollout covered?
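For an S3 backend, that manual removal might look like the sketch below. The bucket name is a placeholder, and the exact object name under core/raft/ should be confirmed with a listing first rather than guessed:

```shell
# List what lives under the raft prefix so you remove the right object.
aws s3 ls s3://my-vault-bucket/core/raft/

# Back up the TLS keyring object before deleting it
# (<tls-key> is whatever the listing above showed).
aws s3 cp s3://my-vault-bucket/core/raft/<tls-key> ./raft-tls.bak
aws s3 rm s3://my-vault-bucket/core/raft/<tls-key>
```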
You might be re-using the storage backend, since the TLS information is stored in storage and not ha_storage. You will have to do a rolling update to prevent all nodes in the cluster from being destroyed at the same time (or wipe out the storage backend's data and start from scratch, if that's desired). If all cluster nodes are destroyed before a new one joins, then there's no state to indicate that a cluster has already been bootstrapped.
Yup, I guess you can't get around this one, but it makes sense. I was hoping it would somehow let you re-bootstrap if you happened to kill the previous quorum entirely, without fiddling with the storage backend. Anyway, I came to the same conclusion as above. Usually if I go with DynamoDB or something else, I don't need to worry about previous HA state. Thanks!