Vault with Raft storage using AWS Auto Scaling Group and auto-rejoin

Hi,

I’m trying to use an ASG to maintain a 3- or 5-node Vault cluster with Raft as the storage backend. The ASG is useful in the sense that it will automatically spin up a new instance if one of them dies for some reason. I’m also fronting the cluster with an ELB that directs traffic to the leader using the /sys/health check.
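
For context on how the ELB picks the leader: Vault’s /v1/sys/health endpoint returns a different status code per node state, so the health check only marks the active node healthy. A rough sketch of that check in Python (the node address here is just a placeholder):

    # /v1/sys/health default status codes (per the Vault API docs):
    #   200 - initialized, unsealed, active (leader)
    #   429 - unsealed standby
    #   503 - sealed
    #   501 - not initialized
    import requests

    NODE_ADDR = "https://10.0.0.11:8200"  # placeholder backend instance

    resp = requests.get(f"{NODE_ADDR}/v1/sys/health", timeout=2)
    if resp.status_code == 200:
        print("active (leader) node - the ELB marks it healthy")
    elif resp.status_code == 429:
        print("unsealed standby - unhealthy as far as this health check goes")
    else:
        print(f"sealed/uninitialized/unreachable: {resp.status_code}")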

I’ve set up my nodes to auto-rejoin using the retry_join option:

storage "raft" {
      path    = "${vault_storage_path}"
      node_id = "$${HOSTNAME}"
      retry_join {
        leader_api_addr = "https://${vault_elb_addr}:8200"
      }
    }

Things work really well when a node is rebooted: one of the standby nodes gets elected leader and the rebooted node comes back up as a standby. The ELB is also able to redirect traffic to the new leader, so all good.

However, things didn’t behave as expected when one of the nodes was terminated and a brand new node tried to join the cluster. This is the sequence of events I saw:

  1. 3 nodes A, B, C; A is the leader
  2. A gets terminated; B is now the leader, C remains a standby
  3. The ELB now points to the leader, Node B
  4. Node D is spun up and tries to join the cluster at Node B via the ELB
  5. Node B decides to step down (don’t know why), and Nodes B and C become followers
  6. The ELB has no healthy node to point to and loses the redirection to Node B
  7. Node D loses its connection and can’t join

However, if I am able to remove Node A from the peer list before Node D attempts to join the cluster (step 3a), everything works like a charm.
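
For reference, “remove Node A from the peer list” here is just the raft remove-peer call against the current leader (the CLI equivalent is: vault operator raft remove-peer A). A rough Python equivalent against the HTTP API, with placeholder address and token:

    import requests

    LEADER_ADDR = "https://node-b.internal:8200"  # placeholder: current leader
    VAULT_TOKEN = "s.xxxxxxxx"                    # placeholder: token allowed to manage sys/storage/raft

    resp = requests.post(
        f"{LEADER_ADDR}/v1/sys/storage/raft/remove-peer",
        headers={"X-Vault-Token": VAULT_TOKEN},
        json={"server_id": "A"},  # the node_id of the dead peer
        timeout=5,
    )
    resp.raise_for_status()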

Does anyone know why Node B would step down? It looks like Raft tracks nodes by node ID or IP, and there isn’t a straightforward way to give the brand new Node D the same IP as the node that died. Is there a way to make a brand new node join successfully, and perhaps a timeout after which bad nodes are removed from the cluster automatically?

Thanks!

Hi Boon,

Thank you for posting this question.

As Vault 1.4 is the first GA release with Raft Integrated Storage, the functionality necessary to natively support running in an ASG has not yet been implemented. We are aware of it but I don’t yet have any details on when it would be implemented.

In the meantime, if you require an ASG there are a couple of avenues you could consider. One is using Consul as a storage backend; see the Consul Cloud Auto-join documentation.

Another possibility would be using a script to populate the Vault config file at instance launch with discovered node addresses.
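
For illustration, a rough sketch of that launch-time script in Python with boto3. The region, the tag name/value, and the way the output gets spliced into the config are all assumptions, not an official pattern:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

    # Find the other running Vault instances by an assumed "role=vault-server" tag.
    resp = ec2.describe_instances(
        Filters=[
            {"Name": "tag:role", "Values": ["vault-server"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    peer_ips = [
        inst["PrivateIpAddress"]
        for reservation in resp["Reservations"]
        for inst in reservation["Instances"]
    ]

    # Render one retry_join block per discovered peer; the user_data script
    # would splice this into the storage "raft" stanza before starting Vault.
    stanzas = "\n".join(
        f'  retry_join {{\n    leader_api_addr = "https://{ip}:8200"\n  }}'
        for ip in peer_ips
    )
    print(stanzas)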

Sincerely,
Andy

Hi @assareh, @Boon, the following is just a thought (I use this trick elsewhere), so I’m not sure if it would work for Vault …

How about if there are N (3 or 5) separate ASGs, each ASG with a count of 1 (one) for min, max, and desired?

The user_data can have an awscli-based script that runs “describe-instances” against the “vault server” tag and creates the config file before the systemd-based Vault service is started.

Thoughts?

EDIT: The same idea above could be used with a single ASG as well, right? Instead of joining via the ELB, a small “discover by tags” script could generate the config for each node, right?

Hi shantanugadgil, thanks for posting! Yes, this could be doable. However, the challenge you may have is cleaning up dead nodes from the Raft cluster. One approach I’ve seen is using a Lambda function that automatically cleans up dead nodes before new nodes come up and join the cluster.
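
Purely as an illustration, a minimal sketch of such a cleanup function, assuming the node_id of each Vault node is its EC2 private hostname, an assumed role=vault-server tag, and a Vault token with access to the sys/storage/raft endpoints:

    import os
    import boto3
    import requests

    VAULT_ADDR = os.environ["VAULT_ADDR"]    # e.g. the ELB address
    VAULT_TOKEN = os.environ["VAULT_TOKEN"]
    HEADERS = {"X-Vault-Token": VAULT_TOKEN}

    def handler(event, context):
        # Short hostnames of the instances that are actually alive.
        ec2 = boto3.client("ec2")
        resp = ec2.describe_instances(
            Filters=[
                {"Name": "tag:role", "Values": ["vault-server"]},
                {"Name": "instance-state-name", "Values": ["running"]},
            ]
        )
        alive = {
            inst["PrivateDnsName"].split(".")[0]  # e.g. "ip-10-0-0-11"
            for r in resp["Reservations"]
            for inst in r["Instances"]
        }

        # The Raft peer set as Vault currently sees it.
        cfg = requests.get(
            f"{VAULT_ADDR}/v1/sys/storage/raft/configuration",
            headers=HEADERS, timeout=5,
        ).json()

        # Remove any peer whose node_id no longer maps to a running instance.
        for peer in cfg["data"]["config"]["servers"]:
            if peer["node_id"] not in alive:
                requests.post(
                    f"{VAULT_ADDR}/v1/sys/storage/raft/remove-peer",
                    headers=HEADERS,
                    json={"server_id": peer["node_id"]},
                    timeout=5,
                ).raise_for_status()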

Sincerely,
Andy