Stuck Creating a HA Cluster - "local node not active, active cluster node not found"

We are attempting to roll out Vault in our production environment, but in our dev phase we are running into trouble getting a cluster up and running. Currently we have one node running and it is working fine, but we need to account for redundancy with a 3-node HA cluster using Raft storage.

After the primary vault was stood up, I created two matching VMs and adjusted the config file for each to use the retry_join stanza. When I reboot them all and try to unseal the primary vault I get the following error:

Error looking up token: Error making API request.

URL: GET {vault address}/v1/auth/token/lookup-self
Code: 500. Errors:

* local node not active but active cluster node not found

Here is an example of the config, obvious parts redacted.

ui = true
disable_mlock = false

api_addr = "https://{primary vault dns name}:8200"
cluster_addr = "https://{primary vault dns name}:8201"

storage "raft" {
  path    = "/vault/data"
  node_id = "primary_node"
  retry_join {
    leader_api_addr = "https://{secondary1 dns name}:8200"
    leader_ca_cert_file = "/etc/vault/CA.crt"
    leader_client_cert_file = "/vault/cert.crt"
    leader_client_key_file = "/vault/key.key"
  }
  retry_join {
    leader_api_addr = "https://{secondary2 dns name}:8200"
    leader_ca_cert_file = "/vault/CA.crt"
    leader_client_cert_file = "/vault/cert.crt"
    leader_client_key_file = "/vault/key.key"
  }
}

# HTTPS listener
listener "tcp" {
  address     = "0.0.0.0:8200"
  tls_disable = "false"
  tls_cert_file = "/vault/cert.crt"
  tls_key_file = "/vault/key.key"
  telemetry {
    unauthenticated_metrics_access = "true"
  }
}

telemetry {
  prometheus_retention_time = "30s"
  disable_hostname = true
}

I have also tried running the vault operator raft join https://{primary dns name}:8200 command, which returns joined true but does not actually join the cluster.

Further, I even tried joining the secondaries from their respective GUIs. This appeared to work for one of the two secondary nodes, since it would show up on the primary when running vault operator raft list-peers, but the step down command yielded no results.

I’m at a loss for what to try next so any/all responses would be greatly appreciated.

It is difficult to understand the state of this cluster from the limited information shown here, so I can't tell exactly what's wrong, but I will make some possibly useful comments.


1)

In the configuration file, you set api_addr and cluster_addr to {primary vault dns name}.

If this is the same on all three nodes, that would be a problem, as you've effectively told every node to believe its hostname is {primary vault dns name} - that would certainly break some things.


2)

In general I strongly recommend against setting the Raft node ID manually, as it creates opportunities to accidentally give multiple nodes the same ID. Leave it unset and Vault will auto-generate one for you. However, you can't change it in an existing cluster, so don't change this now or it'll just make things worse. Do get rid of it whenever you deploy a new node or cluster.


3)

The leader_client_cert_file and leader_client_key_file settings in your retry_join blocks appear to be incorrect/redundant, since I don't see anything in the listener configuration about requiring client certificate authentication.


4)

A note about joining nodes to a Raft cluster: once the join has been initiated, either via retry_join or vault operator raft join, you still need to unseal the node that is joining to complete the join operation. When the unseal keys are provided, Vault will use them as authentication to “log in” to the cluster and complete the join.
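A rough sketch of that flow, run from the node that is joining (the address is a placeholder for one of your existing cluster members):

# vault operator raft join https://{existing node dns name}:8200
# vault operator unseal
# vault status

Repeat the unseal command until the key threshold is met; once the join completes, vault status should report the node as a standby. If retry_join is configured, the explicit join command isn't needed and unsealing alone is enough.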


5)

You have only listed the secondary nodes in the retry_join blocks. I can't tell whether that's because you have different configs for each node, each specifying the other two nodes from that node's perspective, or not.

In any case, it’s fine to list all three nodes in the retry_join blocks, including the local node, which can make managing the configuration file easier.
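Pulling points 2, 3 and 5 together, here is a minimal sketch of what the raft stanza could look like on a freshly deployed node (the hostnames are placeholders, and I'm reusing the CA path from your config):

storage "raft" {
  path = "/vault/data"
  retry_join {
    leader_api_addr     = "https://{node1 dns name}:8200"
    leader_ca_cert_file = "/etc/vault/CA.crt"
  }
  retry_join {
    leader_api_addr     = "https://{node2 dns name}:8200"
    leader_ca_cert_file = "/etc/vault/CA.crt"
  }
  retry_join {
    leader_api_addr     = "https://{node3 dns name}:8200"
    leader_ca_cert_file = "/etc/vault/CA.crt"
  }
}

No node_id, no client certificate settings, and the same three retry_join blocks on every node - the entry pointing at the local node is simply harmless.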


6)

You are referring to your nodes as primary, secondary1, secondary2. This is inaccurate - a Vault cluster is perfectly symmetrical - no node is permanently designated as primary. One node will become active, but this is handled through automatic election, and can change, so naming the nodes in a way which statically emphasises one of them doesn’t make sense.


In order to better understand the state of your cluster, it would be helpful if you could post, for each of the three nodes, the following:

  • The full Vault configuration file
  • Output of:
    • vault status
    • vault operator members
    • vault operator raft list-peers

Sorry, I should have clarified: the config I shared was just from what I have been calling the primary, since that was the first vault we built. As stated in my original post, we are trying to expand to a 3-node HA cluster. Let me go through and answer your questions.

  1. The secondary nodes have their own dns name for api_addr but the primary's dns name for cluster_addr.

  2. I was not aware of either of these points. So the node_id is both auto-generated and should never be altered once the vault is initialized?

  3. Will remove these and try again.

  4. Correct, however we get the same error when trying to unseal. Further, these should be the same keys from initializing the primary, correct?

  5. No, this is because I only provided the primary's config. So secondary1 has primary and secondary2 in its retry_join blocks; secondary2 has primary and secondary1.

  6. Correct, this is only done for the purpose of the post.

I have already provided the full config file from the first node. The other two have essentially the same config with slightly altered info – their own dns name for api_addr and the adjusted retry_join blocks.

I already restored from a backup since the vault was effectively bricked from trying to join the other two nodes, so vault operator members and vault operator raft list-peers don't really give much more information:

# vault status
Key                     Value
---                     -----
Seal Type               shamir
Initialized             true
Sealed                  false
Total Shares            5
Threshold               3
Version                 1.13.1
Build Date              2023-03-23T12:51:35Z
Storage Type            raft
Cluster Name            vault-cluster-xxxxxx
Cluster ID              xxxxxxxx
HA Enabled              true
HA Cluster              https://{primary IP addr}:8201
HA Mode                 active
Active Since            2023-04-27T21:37:55.541922926Z
Raft Committed Index    8339
Raft Applied Index      8339
# vault operator members
Host Name      API Address                 Cluster Address             Active Node    Version    Upgrade Version    Redundancy Zone    Last Echo
---------      -----------                 ---------------             -----------    -------    ---------------    ---------------    ---------
primary-vault    https://{primary IP addr}:8200    https://{primary IP addr}:8201    true           1.13.1     1.13.1             n/a                n/a
# vault operator raft list-peers
Node     Address             State     Voter
----     -------             -----     -----
node1    {primary IP addr}:8201    leader    true

Ah, that’ll break things, for sure. The meaning of cluster_addr is “where other nodes should connect to this one, to communicate between nodes of the cluster”. So you have your ‘secondary’ nodes informing the ‘primary’ node that it should talk to itself, when it tries to talk to the other nodes.
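Concretely, both addresses should point at the node they are configured on - a sketch for one of your 'secondary' nodes (the hostname is a placeholder):

api_addr     = "https://{this node's own dns name}:8200"
cluster_addr = "https://{this node's own dns name}:8201"

Only the hostname changes between nodes; the ports (8200 for the API, 8201 for cluster traffic) stay the same.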

The Raft node ID is the main way members of the cluster are identified, at the level of the cluster consensus algorithm. It's possible to configure a specific node ID in the configuration file, but I have no idea why so many docs give this as an example. If you leave it unset, Vault will generate a random UUID and store it on disk in the data directory. This is ideal - you don't have to worry about setting them correctly, it just works, and you can view the IDs in vault operator raft list-peers if you ever need to (basically the only time you ever would is if you needed to explicitly use vault operator raft remove-peer to tell the cluster to forget about a node which has been irrecoverably lost).

The only use-case for overriding the node ID in config is if, for some reason, you want the node to carry on pretending to be the same node to the rest of the cluster even if its local storage is completely wiped and it rejoins the cluster with no data - it's not clear to me why this would ever be a useful thing to do.

Right, unseal keys are a cluster-level thing, not node-level.

Awesome, this is a great place to start, thanks!

A couple more questions: would it be possible to start over with a new config and restore the vault from a snapshot, or will that still break? And basically the IP/dns address in api_addr and cluster_addr should essentially match, just with different ports, correct?

You can set up a new cluster, get all the nodes joined, and then restore a Raft snapshot. When you do, Vault loads in the snapshot but ignores the cluster membership information from the snapshot, and carries on using the new cluster's membership information.

But after loading the snapshot, the new cluster's unseal keys have been discarded and you are now using the unseal keys associated with the snapshot, so you have to re-unseal the nodes.

So yes, assembling a new cluster and restoring the snapshot is OK, just keep close track of which unseal keys are in use when and where.
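For reference, the snapshot itself is taken and restored with the built-in Raft subcommands (the filename is just an example):

# vault operator raft snapshot save backup.snap
# vault operator raft snapshot restore backup.snap

Take the snapshot against the old cluster's active node, and restore it against the new cluster's active node once all nodes have joined and been unsealed; after the restore you'll be re-unsealing with the old cluster's keys, as described above.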

Approximately, mostly.

api_addr can be http:// or https:// depending on how you have configured Vault. cluster_addr is always https://, no exceptions.

api_addr should usually point to the individual nodes, but there are cases with Vault behind a loadbalancer where people choose to point it at the loadbalancer address. cluster_addr must always point to the individual nodes, no exceptions.

api_addr is a complicated topic - I even opened a GitHub issue to complain that the one setting is used to do too many things: "`api_addr` is used for multiple purposes with sometimes differing requirements" (hashicorp/vault#15070).

We're planning on putting Vault behind a loadbalancer as well, but that's a different headache for a different time.

It’s worth noting that even if you have Vault behind a loadbalancer, the only time Vault will actually try to HTTP redirect a client to the api_addr is when a request

  • reaches a standby node
  • that believes it knows about an active node
  • yet it still fails somehow to forward the request to the active node internally

This is really rare.

It is frustrating that HashiCorp don’t provide a way to turn off the redirect entirely, as it’s basically never useful in modern Vault.

I made the changes discussed to each node's config file, and unsealed the vault. When I run vault operator raft join https://{vault dns name}:8200 I am greeted with the following error:

Error joining the node to the Raft cluster: Error making API request.

URL: POST https://{secondary vault dns}:8200/v1/sys/storage/raft/join
Code: 500. Errors:

* failed to join raft cluster: failed to get raft challenge

The secondary node has not been initialized; I can't remember where I saw it, but I read that any other vaults should not be initialized if they are to be joined to a cluster. Any thoughts on how I should proceed?

Quick update: I noticed I still had the node_id set on each node. Commenting that out got the vault operator raft join command working, and the two additional nodes appear when I run vault operator raft list-peers.

The issue I’m facing now is unsealing the additional nodes. When running vault operator unseal and providing the keys from the original vault, I get this error:

Error unsealing: Error making API request.

URL: PUT https://{secondary vault dns}:8200/v1/sys/unseal
Code: 500. Errors:

* error bootstrapping cluster: cluster already has state

Also, when I list-peers, the two additional nodes are listed as non-voters.

You have been tripped up by the results of your previous experiments.

You need to (a rough command sketch follows this list):

  • Shut down your additional nodes
  • Delete all the files from the additional nodes’ data directories
  • Use vault operator raft remove-peer <NODE-ID> against the original node, to remove the additional nodes from the Raft peer set
  • If you like, now is a good time to permanently remove the node_id configuration from your additional nodes’ configuration files, whilst they are shut down, not in a cluster, and have no data. (But leave the original node configuration alone.)
  • Start the additional nodes back up again, and let them join. You do not need to run vault operator raft join, since you have configured retry_join in the configuration file. You can run vault status against the additional nodes to check whether they’ve managed to talk to the original node yet - when they first start up, they won’t know the total shares and threshold of the unseal keys, and these will show as zero. Once they’ve started to join, vault status will show the actual non-zero data from your cluster.
  • Now you can unseal the additional nodes, allowing them to finalize their join.
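A rough command sketch of that procedure, in the same order as the list above - the data directory path comes from your config, and the service commands are assumptions (adjust to however you actually run Vault):

On each additional node:

# systemctl stop vault
# rm -rf /vault/data/*

On the original node:

# vault operator raft list-peers
# vault operator raft remove-peer <node-id-of-additional-node>

Back on each additional node:

# systemctl start vault
# vault status
# vault operator unseal

Wait until vault status shows the real (non-zero) total shares and threshold before unsealing, and repeat the unseal command until the threshold is met.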

@maxb, it was a journey but we got there. This is now up and running, thank you so much!!! All three nodes report the others as peers and leadership passes properly when one steps down.