How to recover from "error removing server with duplicate ID"?

I’m running a single-node cluster for both Consul and Nomad 1.3.2, and I’m having an issue where two separate Nomad server instances appear in Consul even though I’m only running one server.

In the screenshot, blue is the actual IP of the server, while red is an IP I don’t recognize. In the Nomad logs I’m also seeing

 nomad: failed to reconcile: error="error removing server with duplicate ID \"f2d46f9f-ddc4-285a-a0d6-b3f77115c037\": need at least one voter in configuration: {[]}"

while checking the raft peer list returns

$ nomad operator raft list-peers
Node       ID                                    Address         State     Voter  RaftProtocol
(unknown)  f2d46f9f-ddc4-285a-a0d6-b3f77115c037  127.0.0.1:8300  follower  true   unknown
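
Interestingly, 127.0.0.1:8300 isn’t even a Nomad port (4647 is Nomad’s default server RPC port; 8300 is Consul’s). For reference, these are the standard commands for comparing what serf, raft, and the host itself report:

nomad server members             # what the serf/gossip layer thinks the servers are
nomad operator raft list-peers   # what raft thinks the peers are
ip -brief addr                   # which addresses this host actually owns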

How do I recover from this and remove the duplicate/unknown server instance? Other Nomad client nodes are also having trouble joining the cluster, which I believe is because they’re trying to join the unknown server. Thanks for your help.
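
For reference, the direct way to drop a stale peer is nomad operator raft remove-peer. A sketch using the stale address from list-peers above, though with that entry being the only voter it would presumably fail with the same "need at least one voter" error:

# attempt to drop the stale raft peer directly; likely rejected here,
# since removing the only voter would leave an empty configuration
nomad operator raft remove-peer -peer-address=127.0.0.1:8300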

I was able to partially solve this by generating a new peers.json file with

# Locate the Nomad data dir and this server's raft ID and RPC address
NOMAD_DATA_DIR=$(nomad agent-info -json | jq -r '.config.DataDir')
LEADER_ADDR=$(nomad agent-info -json | jq -r '.stats.nomad.leader_addr')
NODE_ID=$(cat "$NOMAD_DATA_DIR/server/node-id")

# peers.json is read once at startup to rewrite the raft configuration
# (and is deleted afterwards); this declares our own node as the sole voter
cat <<EOF > "$NOMAD_DATA_DIR/server/raft/peers.json"
[
  {
    "id": "$NODE_ID",
    "address": "$LEADER_ADDR",
    "non_voter": false
  }
]
EOF

and restarting the server. Now the client can join just fine, and nomad operator raft list-peers returns the expected single peer. However, I’m still seeing two Nomad server entries in Consul. How can I debug why it’s showing two when I only have one server?

(screenshot: Consul UI showing two nomad server entries)
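
For anyone debugging the same thing, the Consul catalog API shows exactly what is registered under the nomad service and by which node, which should reveal where the second entry comes from. A sketch, assuming Consul’s default HTTP address and the default nomad service name:

# list every registered instance of the "nomad" service, with the
# node that registered it and the advertised address/port
curl -s http://127.0.0.1:8500/v1/catalog/service/nomad \
  | jq '.[] | {Node, Address, ServiceAddress, ServicePort}'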

Bumping this since I’m still having this issue, thanks

+1

Dec 30 21:58:27 it nomad[278377]:     2022-12-30T21:58:27.365Z [ERROR] nomad: failed to reconcile: error="error removing server with duplicate ID \"f1f2f1f0-e9bf-017b-8f99-a23f68af95b6\": need at least one voter in configuration: {[]}"
Dec 30 21:58:27 it nomad[278377]:     2022-12-30T21:58:27.365Z [ERROR] nomad: failed to reconcile member: member="{it.global 192.168.86.100 4648 map[bootstrap:1 build:1.4.3 dc:dc1 expect:1 id:f1f2f1f0-e9bf-017b-8f99-a23f68af95b6 port:4647 >
Dec 30 21:58:26 it nomad[278377]:     2022-12-30T21:58:26.207Z [ERROR] nomad.autopilot: Failed to reconcile current state with the desired state

+1

(Same kind of error message)

This seems to be very similar to (or the same as) the issue in this thread.

Did anyone find a solution in the meantime (it’s been a year)?

EDIT:
what seems to have worked for now is (on the nomad server node):

systemctl stop nomad
rm -rf /var/lib/nomad/server/{raft,serf}
systemctl restart nomad
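
Be aware that this wipes all server state (jobs, evaluations, allocations), so it is really only an option for a single-node or throwaway cluster; the paths also assume data_dir is /var/lib/nomad. After the restart the server re-bootstraps, and it’s worth confirming it came up as the sole voter:

nomad operator raft list-peers   # should now show exactly one voter
nomad server members             # should show a single alive server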