Containerize Vault and Consul

Hello Team,

We are having an issue in our production setup with Vault and Consul. Here is the setup.

Currently, Consul and Vault run as system processes on Linux machines, with Consul at version 1.4.2.
We need to upgrade to the latest 1.11.x, and here is what we did.

Consul and Vault are running on servers 1, 2 and 3.
We chose server 3 to start with, installed Docker, and deployed 1.11.x.
The containers came up, but then they started failing to connect to the WAN port 8302 on the other servers.

We then decided to restart the other two nodes as well. Once we did (this is where the issue started), server 3 was elected as leader, and it looks like it corrupted the files on the other two servers as well.

So we reverted to the server snapshots. Since then, all the servers have the process running, but they are unable to elect a leader and the process keeps restarting.

It is a complete outage now.

raft protocol 2 is being used.

Can someone please help with this issue?

I am sorry to hear you’re experiencing an outage, and welcome to the community help forum. Do bear in mind that this is a community forum, not a support team; if you have paid support from HashiCorp, you should approach the official support team now.

Unfortunately, I need to start by pointing out a couple of awkward truths:

Your post is lacking a lot of detail; so much so that I debated whether there was any realistic chance of being able to help. Some of the missing details are:

Consul and Vault are two products with entirely separate version numbers, yet you have only mentioned one “from” version and one “to” version.

The mention of Consul and Vault together hints that you might be talking about Consul being used as backend storage for Vault, but you don’t confirm this, and the mention of Consul WAN support then hints that it might be a multi-datacenter cluster instead, in contradiction of that.

Nowhere do you post any error or log messages, the text of the server config files, or commands showing information on the status of the cluster.

Besides the lack of detail, you’ve also chosen to combine a major upgrade with a simultaneous switch to a Docker-based deployment architecture, further increasing the amount of change.

OK, so what now?

Well, you did include one very important detail:

> raft protocol 2 is being used.

Since Consul 1.9 no longer supports Raft protocol 2, this is quite probably the original trigger for the problems.
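For context, the Raft protocol version is controlled by the raft_protocol setting in the Consul server configuration. A minimal sketch, assuming an HCL server config (the file path and surrounding values are illustrative, not taken from this cluster):

    # /etc/consul.d/server.hcl  (illustrative path)
    server           = true
    bootstrap_expect = 3
    raft_protocol    = 3   # servers must already run protocol 3 before upgrading to Consul 1.9+

The documented approach is to set raft_protocol = 3 while still on a release that supports both protocols, restart the servers one at a time, confirm every peer reports protocol 3, and only then move on to 1.9 or later.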

Where to go from here? It is difficult to advise with the current level of detail. You mentioned some kind of snapshot; what exactly is that? Maybe restoring from it, or from another backup, is the easiest recovery path to walk through remotely? Alternatively, there may be some way to coax the existing cluster back to a working state, but we will need a lot more detail to assess that.

Please start by providing as many answers as you can to the things I called out as missing detail.

Additionally, it may be very helpful to see the output of consul operator raft list-peers -stale from each of the three Consul server nodes.
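On a healthy cluster that command prints one row per server, roughly like the sketch below (the exact columns vary a little between Consul versions, and the addresses and IDs here are placeholders):

    $ consul operator raft list-peers -stale
    Node     ID         Address        State     Voter
    server1  <node-id>  10.0.0.1:8300  leader    true
    server2  <node-id>  10.0.0.2:8300  follower  true
    server3  <node-id>  10.0.0.3:8300  follower  true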

Hi,
Thank you for the prompt response; really appreciate it.
Let me share a few things.

Here are the versions of Consul and Vault. And yes, Vault uses Consul as its storage backend.
Consul servers 1, 2, 3: v1.4.2
Vault servers 1, 2, 3: v1.0.3

Upgrade path we are planning:
Consul 1.4.2 → 1.12.3
Vault 1.0.3 → 1.11.1

Also, it is not multi-DC; everything is on the same LAN, so ideally we don’t need this port. Having said that, when we removed the WAN port after deploying server 3 as a container, it kept trying to connect to the WAN port on servers 1 and 2 (not sure why) and threw a "connection refused" error.
To mitigate this, yes, we can open a firewall rule, so this is not the main concern now.
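For what it’s worth, a sketch of such a rule on a host running firewalld (the tooling and defaults are assumptions; adapt to whatever firewall is actually in use):

    sudo firewall-cmd --permanent --add-port=8302/tcp
    sudo firewall-cmd --permanent --add-port=8302/udp
    sudo firewall-cmd --reload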

Yes, we had to move to containerization because we could not find any binary for the latest Vault version, hence we planned this change to containerize. (Do you know if there is a binary distribution for Vault and Consul?)

Now that you mention that Raft protocol 2 is not supported from 1.9 onwards, I will check from this angle (thanks a lot for this).

The snapshot I mentioned is the VMware snapshot of the whole server, so we reverted back to 1.4.2. But even with this revert, leader election still failed.

Output of consul operator raft list-peers -stale:
Error getting peers: Failed to retrieve raft configuration: Unexpected response code: 500 (No cluster leader)

Jan 5 07:06:39 Server2 consul[2248]: 2023/01/05 07:06:39 [ERR] http: Request GET /v1/kv/vault/core/lock?index=14964399&wait=15000ms, error: rpc error making call: EOF from=127.0.0.1:33578
Jan 5 07:06:39 Server2 vault[2252]: 2023-01-05T07:06:39.578Z [ERROR] core: failed to acquire lock: error="failed to read lock: Unexpected response code: 500"
Jan 5 07:06:39 Server2 consul[2248]: 2023/01/05 07:06:39 [ERR] http: Request PUT /v1/session/destroy/78c5eee7-2528-621c-5077-2ff529f7e82c, error: rpc error getting client: failed to get conn: dial tcp Server2:0->Server1:8300: connect: connection refused from=127.0.0.1:46054
Jan 5 07:06:39 Server2 consul[2248]: 2023/01/05 07:06:39 [INFO] serf: EventMemberUpdate: Server1

Jan 5 07:06:44 Server2 consul[2248]: 2023/01/05 07:06:44 [WARN] raft: Heartbeat timeout from "Server1:8300" reached, starting election
Jan 5 07:06:44 Server2 consul[2248]: 2023/01/05 07:06:44 [INFO] raft: Node at Server2:8300 [Candidate] entering Candidate state in term 954
Jan 5 07:06:44 Server2 consul[2248]: 2023/01/05 07:06:44 [ERR] raft: Failed to make RequestVote RPC to {Voter 6ec8afbf-6d81-00f7-b92c-5794e46a875b Server1:8300}: EOF
Jan 5 07:06:47 Server2 consul[2248]: 2023/01/05 07:06:47 [DEBUG] raft-net: Server2:8300 accepted connection from: Server1:54559
Jan 5 07:06:47 Server2 consul[2248]: 2023/01/05 07:06:47 [INFO] raft: Duplicate RequestVote for same term: 954
Jan 5 07:06:51 Server2 consul[2248]: 2023/01/05 07:06:51 [WARN] raft: Election timeout reached, restarting election
Jan 5 07:06:51 Server2 consul[2248]: 2023/01/05 07:06:51 [INFO] raft: Node at Server2:8300 [Candidate] entering Candidate state in term 955
Jan 5 07:06:51 Server2 consul[2248]: 2023/01/05 07:06:51 [INFO] raft: Election won. Tally: 2
Jan 5 07:06:51 Server2 consul[2248]: 2023/01/05 07:06:51 [INFO] raft: Node at Server2:8300 [Leader] entering Leader state
Jan 5 07:06:51 Server2 consul[2248]: 2023/01/05 07:06:51 [INFO] raft: Added peer 6ec8afbf-6d81-00f7-b92c-5794e46a875b, starting replication
Jan 5 07:06:51 Server2 consul[2248]: 2023/01/05 07:06:51 [INFO] consul: cluster leadership acquired
Jan 5 07:06:51 Server2 consul[2248]: 2023/01/05 07:06:51 [INFO] consul: New leader elected: Server2
Jan 5 07:06:52 Server2 consul[2248]: 2023/01/05 07:06:52 [INFO] raft: pipelining replication to peer {Voter 6ec8afbf-6d81-00f7-b92c-5794e46a875b Server1:8300}
Jan 5 07:06:52 Server2 consul[2248]: 2023/01/05 07:06:52 [INFO] raft: Updating configuration with AddNonvoter (101f8d8e-3487-b154-62a5-42f83ec535e3, Server3:8300) to [{Suffrage:Voter ID:3e46405f-1031-1120-fbfb-e2d18710ac64 Address:Server2:8300} {Suffrage:Voter ID:6ec8afbf-6d81-00f7-b92c-5794e46a875b Address:Server1:8300} {Suffrage:Nonvoter ID:101f8d8e-3487-b154-62a5-42f83ec535e3 Address:Server3:8300}]
Jan 5 07:06:52 Server2 consul[2248]: 2023/01/05 07:06:52 [INFO] raft: Added peer 101f8d8e-3487-b154-62a5-42f83ec535e3, starting replication
Jan 5 07:06:52 Server2 consul[2248]: 2023/01/05 07:06:52 [INFO] consul: removing server by ID: "101f8d8e-3487-b154-62a5-42f83ec535e3"
Jan 5 07:07:39 Server2 consul[2248]: 2023/01/05 07:07:39 [INFO] raft: Added peer 101f8d8e-3487-b154-62a5-42f83ec535e3, starting replication
Jan 5 07:07:39 Server2 consul[2248]: 2023/01/05 07:07:39 [ERR] raft: Failed to AppendEntries to {Nonvoter 101f8d8e-3487-b154-62a5-42f83ec535e3 Server3:8300}: EOF
Jan 5 07:07:39 Server2 consul[2248]: 2023/01/05 07:07:39 [INFO] consul: member 'Server3' joined, marking health alive
Jan 5 07:07:39 Server2 consul[2248]: 2023/01/05 07:07:39 [WARN] raft: AppendEntries to {Nonvoter 101f8d8e-3487-b154-62a5-42f83ec535e3 Server3:8300} rejected, sending older logs (next: 14974117)
Jan 5 07:07:39 Server2 consul[2248]: 2023/01/05 07:07:39 [INFO] raft: pipelining replication to peer {Nonvoter 101f8d8e-3487-b154-62a5-42f83ec535e3 Server3:8300}
Jan 5 07:07:40 Server2 consul[2248]: 2023/01/05 07:07:40 [WARN] consul: error getting server health from "Server3": rpc error making call: stream closed
Jan 5 07:07:41 Server2 consul[2248]: 2023/01/05 07:07:41 [WARN] consul: error getting server health from "Server3": context deadline exceeded

Also, we saw that the Consul services kept restarting every 5 seconds or so since they could not find a leader.

Let me know if you need more details, and thanks a lot.

Some more logs:

Jan 5 07:15:05 Server2 consul[24185]: ==> Starting Consul agent...
Jan 5 07:15:05 Server2 consul[24185]: ==> Joining cluster...
Jan 5 07:15:05 Server2 consul[24185]: ==> Consul agent running!
Jan 5 07:15:05 Server2 consul[24185]: Version: 'v1.4.2'
Jan 5 07:15:05 Server2 consul[24185]: Node ID: '3e46405f-1031-1120-fbfb-e2d18710ac64'
Jan 5 07:15:05 Server2 consul[24185]: Node name: 'Server2'
Jan 5 07:15:05 Server2 consul[24185]: Datacenter: 'first-prod' (Segment: '')
Jan 5 07:15:05 Server2 consul[24185]: Server: true (Bootstrap: false)
Jan 5 07:15:05 Server2 consul[24185]: Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
Jan 5 07:15:05 Server2 consul[24185]: Cluster Addr: Server2 (LAN: 8301, WAN: 8302)
Jan 5 07:15:05 Server2 consul[24185]: Encrypt: Gossip: true, TLS-Outgoing: false, TLS-Incoming: false
Jan 5 07:15:05 Server2 consul[24185]: ==> Log data will now stream in as it occurs:
Jan 5 07:15:05 Server2 consul[24185]: 2023/01/05 07:15:05 [INFO] raft: Restored from snapshot 951-14971662-1672882915168
Jan 5 07:15:05 Server2 consul[24185]: 2023/01/05 07:15:05 [INFO] raft: Initial configuration (index=14974218): [{Suffrage:Voter ID:3e46405f-1031-1120-fbfb-e2d18710ac64 Address:Server2:8300} {Suffrage:Voter ID:6ec8afbf-6d81-00f7-b92c-5794e46a875b Address:Server1:8300} {Suffrage:Nonvoter ID:101f8d8e-3487-b154-62a5-42f83ec535e3 Address:Server3:8300}]
Jan 5 07:15:05 Server2 consul[24185]: 2023/01/05 07:15:05 [INFO] raft: Node at Server2:8300 [Follower] entering Follower state (Leader: "")
Jan 5 07:15:05 Server2 consul[24185]: 2023/01/05 07:15:05 [INFO] serf: EventMemberJoin: Server2.first-prod Server2
Jan 5 07:15:05 Server2 consul[24185]: 2023/01/05 07:15:05 [WARN] serf: Failed to re-join any previously known node
Jan 5 07:15:05 Server2 consul[24185]: 2023/01/05 07:15:05 [INFO] serf: EventMemberJoin: Server2 Server2
Jan 5 07:15:05 Server2 consul[24185]: 2023/01/05 07:15:05 [INFO] serf: Attempting re-join to previously known node: f2aebe5ba646: Server3:8301
Jan 5 07:15:05 Server2 consul[24185]: 2023/01/05 07:15:05 [INFO] consul: Adding LAN server Server2 (Addr: tcp/Server2:8300) (DC: first-prod)
Jan 5 07:15:05 Server2 consul[24185]: 2023/01/05 07:15:05 [INFO] consul: Handled member-join event for server "Server2.first-prod" in area "wan"
Jan 5 07:15:05 Server2 consul[24185]: 2023/01/05 07:15:05 [INFO] consul: Raft data found, disabling bootstrap mode
Jan 5 07:15:05 Server2 consul[24185]: 2023/01/05 07:15:05 [INFO] agent: Started DNS server 0.0.0.0:8600 (udp)
Jan 5 07:15:05 Server2 consul[24185]: 2023/01/05 07:15:05 [INFO] agent: Started DNS server 0.0.0.0:8600 (tcp)
Jan 5 07:15:05 Server2 consul[24185]: 2023/01/05 07:15:05 [INFO] agent: Started HTTP server on [::]:8500 (tcp)
Jan 5 07:15:05 Server2 consul[24185]: 2023/01/05 07:15:05 [INFO] serf: EventMemberJoin: f2aebe5ba646 Server3
Jan 5 07:15:05 Server2 consul[24185]: 2023/01/05 07:15:05 [INFO] agent: (LAN) joining: [Server1 Server2 Server3]
Jan 5 07:15:05 Server2 consul[24185]: 2023/01/05 07:15:05 [INFO] consul: Adding LAN server f2aebe5ba646 (Addr: tcp/Server3:8300) (DC: first-prod)
Jan 5 07:15:05 Server2 consul[24185]: 2023/01/05 07:15:05 [INFO] agent: Retry join LAN is supported for: aliyun aws azure digitalocean gce k8s os packet scaleway softlayer triton vsphere
Jan 5 07:15:05 Server2 consul[24185]: 2023/01/05 07:15:05 [INFO] agent: Joining LAN cluster...
Jan 5 07:15:05 Server2 consul[24185]: 2023/01/05 07:15:05 [INFO] agent: (LAN) joining: [Server1 Server2 Server3]
Jan 5 07:15:05 Server2 consul[24185]: 2023/01/05 07:15:05 [WARN] memberlist: Refuting a suspect message (from: Server2)
Jan 5 07:15:05 Server2 consul[24185]: 2023/01/05 07:15:05 [INFO] serf: EventMemberJoin: Server1 Server1
Jan 5 07:15:05 Server2 consul[24185]: 2023/01/05 07:15:05 [INFO] serf: Re-joined to previously known node: f2aebe5ba646: Server3:8301
Jan 5 07:15:05 Server2 consul[24185]: 2023/01/05 07:15:05 [INFO] consul: Adding LAN server Server1 (Addr: tcp/Server1:8300) (DC: first-prod)
Jan 5 07:15:05 Server2 consul[24185]: 2023/01/05 07:15:05 [INFO] agent: (LAN) joined: 3 Err:
Jan 5 07:15:05 Server2 consul[24185]: 2023/01/05 07:15:05 [INFO] agent: started state syncer
Jan 5 07:15:05 Server2 consul[24185]: 2023/01/05 07:15:05 [INFO] agent: Join LAN completed. Synced with 3 initial agents
Jan 5 07:17:32 Server2 consul[24185]: 2023/01/05 07:17:32 [INFO] raft: aborting pipeline replication to peer {Voter 6ec8afbf-6d81-00f7-b92c-5794e46a875b Server1:8300}
Jan 5 07:17:32 Server2 consul[24185]: 2023/01/05 07:17:32 [ERR] raft: Failed to AppendEntries to {Voter 6ec8afbf-6d81-00f7-b92c-5794e46a875b Server1:8300}: EOF
Jan 5 07:17:32 Server2 consul[24185]: 2023/01/05 07:17:32 [ERR] raft: Failed to AppendEntries to {Voter 6ec8afbf-6d81-00f7-b92c-5794e46a875b Server1:8300}: dial tcp Server2:0->Server1:8300: connect: connection refused
Jan 5 07:17:32 Server2 consul[24185]: 2023/01/05 07:17:32 [ERR] raft: Failed to AppendEntries to {Voter 6ec8afbf-6d81-00f7-b92c-5794e46a875b Server1:8300}: dial tcp Server2:0->Server1:8300: connect: connection refused
Jan 5 07:17:32 Server2 consul[24185]: 2023/01/05 07:17:32 [ERR] raft: Failed to AppendEntries to {Voter 6ec8afbf-6d81-00f7-b92c-5794e46a875b Server1:8300}: dial tcp Server2:0->Server1:8300: connect: connection refused
Jan 5 07:17:32 Server2 consul[24185]: 2023/01/05 07:17:32 [WARN] consul: error getting server health from "Server1": rpc error making call: EOF
Jan 5 07:17:32 Server2 consul[24185]: 2023/01/05 07:17:32 [ERR] raft: Failed to AppendEntries to {Voter 6ec8afbf-6d81-00f7-b92c-5794e46a875b Server1:8300}: dial tcp Server2:0->Server1:8300: connect: connection refused
Jan 5 07:17:32 Server2 consul[24185]: 2023/01/05 07:17:32 [WARN] memberlist: Was able to connect to f2aebe5ba646 but other probes failed, network may be misconfigured
Jan 5 07:17:32 Server2 consul[24185]: 2023/01/05 07:17:32 [INFO] serf: EventMemberUpdate: Server1
Jan 5 07:17:33 Server2 consul[24185]: 2023/01/05 07:17:33 [WARN] consul: error getting server health from "Server1": context deadline exceeded
Jan 5 07:17:33 Server2 consul[24185]: 2023/01/05 07:17:33 [WARN] memberlist: Was able to connect to Server1 but other probes failed, network may be misconfigured
Jan 5 07:17:33 Server2 consul[24185]: 2023/01/05 07:17:33 [INFO] raft: pipelining replication to peer {Voter 6ec8afbf-6d81-00f7-b92c-5794e46a875b Server1:8300}
Jan 5 07:21:44 Server2 consul[24185]: 2023/01/05 07:21:44 [WARN] memberlist: Was able to connect to f2aebe5ba646 but other probes failed, network may be misconfigured
Jan 5 07:22:07 Server2 consul[24185]: 2023/01/05 07:22:07 [ERR] consul.rpc: unrecognized RPC byte: 255 from=Server3:36436

Also, we could not restore the Consul snapshot, as the restore looks for a leader and no leader is elected in our case.
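(For reference, the restore being attempted here is Consul's built-in one, which is an API-level operation and therefore needs a functioning cluster with an elected leader; it is not a tool for repairing a leaderless cluster. The file name below is just an example.)

    consul snapshot restore backup.snap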

What is the reason for having selected those particular versions? They are neither the latest releases nor the latest patch releases of an older release series.


I do not understand; binaries for all versions are available from https://releases.hashicorp.com/.
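For example, the zip packages for the versions you mentioned can be pulled straight from that site (adjust version and architecture as needed; the install path below is just a common choice):

    curl -fsSLO https://releases.hashicorp.com/consul/1.12.3/consul_1.12.3_linux_amd64.zip
    curl -fsSLO https://releases.hashicorp.com/vault/1.11.1/vault_1.11.1_linux_amd64.zip
    sudo unzip -o consul_1.12.3_linux_amd64.zip -d /usr/local/bin/
    sudo unzip -o vault_1.11.1_linux_amd64.zip -d /usr/local/bin/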


This seems peculiar to me, as the whole point of the -stale flag is to allow replies without a cluster leader.

Unless this version of Consul is so old that it didn’t work back then? I’m not sure.

Since you’ve been unable to view the current Raft membership this way, you can instead get it from the "raft: Initial configuration (index=...)" log line that Consul prints when it starts up.

Please post the value of this log line from all three servers.
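Something along these lines should pull it out, assuming the servers log via systemd and/or syslog as your snippets suggest (the unit name and log path are guesses):

    # if Consul runs under systemd
    journalctl -u consul | grep "raft: Initial configuration"
    # or, if it only lands in syslog
    grep "raft: Initial configuration" /var/log/messages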


Er… no, Consul does not restart just because it can’t find a leader.

Restarts must be for another reason.

Please investigate why your Consul processes are being restarted.
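A few places to start looking, assuming systemd manages the service (the unit name is a guess):

    systemctl status consul                                 # last exit code or signal
    journalctl -u consul -n 200 --no-pager                  # recent unit output around the restarts
    grep -iE "panic|signal|killed|oom" /var/log/messages    # crashes and OOM kills often show up here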

Our scans showed that they are outdated and that we need to upgrade to the latest versions.

Thanks for the link. I am checking with my teammate whether we missed this link. Thanks a lot again.

The leader shown by the consul monitor command kept changing between servers 1, 2 and 3. I will get all the logs from the three servers.

Frankly, I checked the entire logs and could not see any other reason. I will investigate more.

Thank you all for the help.
We containerized the setup on all three nodes after shutting the cluster down completely.

Now it works fine.

A few things we did: opened the network connection for TCP and UDP port 8302.
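For anyone hitting the same problem later, a rough sketch of the kind of container invocation involved; running the Consul server container with host networking sidesteps most of the 8300/8301/8302 port-mapping pitfalls (the image tag, paths and flags here are illustrative, not the exact command used in this cluster):

    # host networking: the container uses the host's ports 8300/8301/8302 directly
    docker run -d --name consul-server \
      --network host \
      -v /opt/consul/data:/consul/data \
      -v /etc/consul.d:/consul/config \
      hashicorp/consul:1.12.3 \
      agent -server -config-dir=/consul/config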