Homelab: Consul Raft failure with "log not found", clarifying the log location

I’m working on a homelab setup using 3 Raspberry Pis.
I’ve had this problem on Consul versions 1.5 all the way through 1.9, and I’m trying to understand it.

The Consul cluster will randomly crash (it could be after days or months; I’m assuming this is some sort of sync problem), and then none of the 3 server nodes will start, each one giving the same error:

agent.server.raft: failed to get log: index=49194 error="log not found"

(index value obviously changes)

From reading the docs, I believe I understand what’s happening here: it’s trying to do a log replay to return to a known state across the nodes, but it can’t find the log containing the index to replay from.

I can work around this, as it’s only a homelab play cluster, by deleting the Raft state; the cluster then reforms and comes back to life.

What I’m not clear on is:

a) the default name and location of the log file that contains the data the cluster needs to reform from its state
b) which Consul parameter sets the location/file that contains the Raft state

I think I understand why this is happening in my homelab, but I need to know where and what it’s looking for to confirm it, and I can’t find documentation that explains which file/location is involved in the process.

thanks

Hi @ikonia,

The details you are looking for belong to the internals of the Raft protocol. Here are high-level answers to your questions.

  1. Where is the log file?

In this case, the log file is raft.db, inside each server agent’s Consul data directory. Every write request (state change) to the cluster is handled by the leader: it is first written to the leader’s raft.db and then replicated to the followers over RPC. Each follower writes the entry to its own raft.db, and once the entry is committed, it is applied to an in-memory database (MemDB).

This is what the following passage from the Consul documentation describes; the "durable storage" it refers to is raft.db:

Once a cluster has a leader, it is able to accept new log entries. A client can request that a leader append a new log entry (from Raft’s perspective, a log entry is an opaque binary blob). The leader then writes the entry to durable storage and attempts to replicate to a quorum of followers. Once the log entry is considered committed, it can be applied to a finite state machine. The finite state machine is application specific; in Consul’s case, we use MemDB to maintain cluster state. Consul’s writes block until it is both committed and applied. This achieves read after write semantics when used with the consistent mode for queries.
Ref: Consensus Protocol (Raft), Consul documentation, HashiCorp Developer
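To make the write path above concrete, here is a minimal Python sketch. It is not Consul’s or hashicorp/raft’s actual code: `Node`, `write` and `apply_up_to` are hypothetical names, `log` stands in for raft.db and `fsm` for MemDB, and it assumes followers learn the commit index immediately (real Raft passes it along on later RPCs).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy server: `log` stands in for raft.db, `fsm` for the in-memory MemDB."""
    name: str
    up: bool = True
    log: list = field(default_factory=list)  # durable entries: (index, (key, value))
    fsm: dict = field(default_factory=dict)  # state built from applied entries
    applied_index: int = 0

    def append(self, index, command):
        """Persist an entry to this node's log; a down node cannot acknowledge."""
        if not self.up:
            return False
        self.log.append((index, command))
        return True

    def apply_up_to(self, commit_index):
        """Apply committed entries, in log order, to the state machine."""
        for index, (key, value) in self.log:
            if self.applied_index < index <= commit_index:
                self.fsm[key] = value
                self.applied_index = index


def write(leader, followers, index, command):
    """Leader write path: store locally, replicate, commit on a majority, apply."""
    leader.log.append((index, command))  # 1. leader's durable storage first
    acks = 1 + sum(f.append(index, command) for f in followers)  # 2. replicate
    if acks > (1 + len(followers)) // 2:  # 3. committed once a majority has it
        # Simplification: every reachable node applies as soon as it commits.
        for node in [leader, *followers]:
            if node.up:
                node.apply_up_to(index)  # 4. applied to the FSM
        return True
    return False  # not committed: the entry sits in the log but is never applied
```

With one of three Raspberry Pis down, `write` still returns True because 2 of 3 servers hold the entry; with two down, the leader keeps the entry in its log but never applies it.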

  2. The configuration that sets the path

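The path to raft.db is hardcoded as <data-dir>/raft/raft.db and can’t be changed on its own. The only part you control is the data directory itself, set with the data_dir configuration option (or the -data-dir command-line flag).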
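If you want to check what is on disk, a small hypothetical helper like the following (Python, standard library only) derives both locations from your data_dir and lists the index each snapshot covers, so you can compare it with the index in the error. It assumes snapshot directories are named `<term>-<index>-<timestamp>`, which is my understanding of hashicorp/raft’s file snapshot store; verify this against your own `raft/snapshots/` directory before relying on it.

```python
import tempfile
from pathlib import Path


def raft_paths(data_dir):
    """Return the Raft locations under a server's data_dir.

    Only data_dir itself is configurable; the raft/ layout below it is fixed.
    """
    raft_dir = Path(data_dir) / "raft"
    return {"raft_db": raft_dir / "raft.db", "snapshots": raft_dir / "snapshots"}


def snapshot_indexes(data_dir):
    """List the Raft index covered by each completed snapshot, highest first.

    ASSUMPTION: snapshot directories are named "<term>-<index>-<timestamp>";
    in-progress ones (e.g. with a ".tmp" suffix) fail the digit check and are skipped.
    """
    snap_dir = raft_paths(data_dir)["snapshots"]
    if not snap_dir.is_dir():
        return []
    indexes = []
    for entry in snap_dir.iterdir():
        parts = entry.name.split("-")
        if entry.is_dir() and len(parts) == 3 and all(p.isdigit() for p in parts):
            indexes.append(int(parts[1]))
    return sorted(indexes, reverse=True)


if __name__ == "__main__":
    # Demo on a throwaway directory; point these calls at your real data_dir instead.
    demo_dir = tempfile.mkdtemp()
    (raft_paths(demo_dir)["snapshots"] / "2-49000-1600000000000").mkdir(parents=True)
    print(raft_paths(demo_dir)["raft_db"])
    print(snapshot_indexes(demo_dir))
```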

  3. Why would a log entry go missing?

The short answer is that raft.db is compacted by removing older entries. Compaction by itself won’t cause the "log not found" error, though.

Obviously, it would be undesirable to allow a replicated log to grow in an unbounded fashion. Raft provides a mechanism by which the current state is snapshotted and the log is compacted. Because of the FSM abstraction, restoring the state of the FSM must result in the same state as a replay of old logs. This allows Raft to capture the FSM state at a point in time and then remove all the logs that were used to reach that state. This is performed automatically without user intervention and prevents unbounded disk usage while also minimizing time spent replaying logs. One of the advantages of using MemDB is that it allows Consul to continue accepting new transactions even while old state is being snapshotted, preventing any availability issues.
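Compaction can be sketched with a toy Python model; this is not the real on-disk store. `LogStore`, `snapshot_and_compact` and the `trailing` parameter are hypothetical: `trailing` only mimics the idea behind hashicorp/raft’s `TrailingLogs` setting of keeping some recent entries after a snapshot. Note that `get_log` fails with "log not found" for any index older than the first one left.

```python
class LogStore:
    """Toy stand-in for raft.db: maps index -> command."""

    def __init__(self):
        self.entries = {}

    def store(self, index, command):
        self.entries[index] = command

    def first_index(self):
        return min(self.entries, default=0)

    def last_index(self):
        return max(self.entries, default=0)

    def get_log(self, index):
        # Mirrors the "log not found" condition from the error message.
        if index not in self.entries:
            raise LookupError(f'index={index} error="log not found"')
        return self.entries[index]

    def delete_range(self, lo, hi):
        """Remove entries lo..hi inclusive (compaction)."""
        for i in range(lo, hi + 1):
            self.entries.pop(i, None)


def snapshot_and_compact(fsm, log, applied_index, trailing=0):
    """Snapshot the FSM at applied_index, then drop the logs it already covers.

    Keeps the last `trailing` entries, up to and including applied_index
    (hypothetical knob, loosely modelled on hashicorp/raft's TrailingLogs).
    """
    snapshot = {"index": applied_index, "state": dict(fsm)}  # point-in-time copy
    cutoff = applied_index - trailing
    if cutoff >= log.first_index():
        log.delete_range(log.first_index(), cutoff)
    return snapshot
```

Storing 200 entries and then snapshotting at index 200 with `trailing=51` leaves indexes 150 to 200, the same log the leader holds in the scenario that follows.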

Here is one scenario in which you would see the "log not found" error.

Let’s say you have 3 server agents, with data constantly replicated between them. Now say one node dies for some reason. The cluster continues to function, but data is no longer replicated to this failed follower.

Let’s assume that before the node died, all the agents had replicated up to index 100. Writes keep coming in, and the index reaches 200. Suppose compaction happens on the leader at this point, so the leader’s raft.db now holds only indexes 150 to 200.

Now let’s assume the failed node comes back. It rejoins the cluster, and the leader sends it index 201. But this node’s last index is 100, so it needs everything from index 101 onward. The leader would try to replay logs 101 to 201 from its raft.db, but because raft.db has already been compacted, the leader would log the "log not found" error.

This is still OK: the leader then sends its latest snapshot of the MemDB state (stored in <data-dir>/raft/snapshots/). The recovered node restores this snapshot and is then ready to receive log entries starting right after the last index the snapshot covers.
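The scenario can be simulated end to end with the toy model below (Python, hypothetical names). It simplifies Raft heavily: the leader tries to send everything after the follower’s last index, and if any of those entries were compacted away, it installs its latest snapshot and then sends only what comes after it. It illustrates the recovery path described above, not hashicorp/raft’s replication code.

```python
class Server:
    """Toy server: `log` ~ raft.db, `snapshot` ~ raft/snapshots/, `fsm` ~ MemDB."""

    def __init__(self):
        self.log = {}         # index -> (key, value)
        self.fsm = {}
        self.last_index = 0
        self.snapshot = None  # (index, copy of fsm) once a snapshot is taken

    def append(self, index, key, value):
        self.log[index] = (key, value)
        self.fsm[key] = value
        self.last_index = index


def replicate(leader, follower):
    """Catch a follower up; returns "logs" or "snapshot" depending on the path used."""
    try:
        # Gather every entry the follower is missing from the leader's log.
        entries = [(i, leader.log[i])
                   for i in range(follower.last_index + 1, leader.last_index + 1)]
        method = "logs"
    except KeyError:
        # An entry was compacted away ("log not found"): install the snapshot.
        snap_index, snap_state = leader.snapshot
        follower.log = {}
        follower.fsm = dict(snap_state)
        follower.last_index = snap_index
        entries = [(i, leader.log[i])
                   for i in range(snap_index + 1, leader.last_index + 1)]
        method = "snapshot"
    for i, (key, value) in entries:
        follower.append(i, key, value)
    return method


def run_scenario():
    """Indexes from the example: both at 100, leader at 200, compacted to 150..200."""
    leader, follower = Server(), Server()
    for i in range(1, 101):
        leader.append(i, "key", i)
        follower.append(i, "key", i)
    for i in range(101, 201):          # follower is down during these writes
        leader.append(i, "key", i)
    leader.snapshot = (200, dict(leader.fsm))
    for i in range(1, 150):            # compaction leaves only 150..200
        del leader.log[i]
    leader.append(201, "key", 201)     # the new write sent to the recovered node
    return replicate(leader, follower), leader, follower
```

Calling `run_scenario()` takes the snapshot path and leaves the follower at index 201 with the same state as the leader; without compaction, `replicate` would simply have sent the missing logs.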

I hope this gives you some direction to explore. There could be other such scenarios, but in general Raft is able to recover on its own. I am not sure what happened in your cluster that made the servers refuse to start.

The following content might help you to understand this better.


This is a fantastically clear and helpful post, and I really appreciate it.

I had wondered if it was referring to raft.db, but the wording of the messages I’d read didn’t suggest that, so having it stated clearly is superb.

It has also dispelled my theory of what was going wrong, so I won’t waste time looking at the wrong thing, and at the same time you’ve given me some other pointers on what could be happening.

Superb post, really appreciated.


I would also recommend reading these two sections in the official documentation (I didn’t know this had been added to the official docs). They explain a similar scenario involving a snapshot install loop.
