A node doesn’t update its database

Hello,

I’m currently working on a 3-node infrastructure with Raft storage.

I just noticed something weird: one of my nodes stopped updating its database:

Here is the database file on the main node and on one node in standby mode:

-rw------- 1 vault_1 vault_1  12M Jan 18 10:40 vault.db
-rw------- 1 vault_2 vault_2 14M Jan 18 10:40 vault.db

And here is the node that no longer updates:

-rw------- 1 vault_3 vault_3 9.7M Jan 5 16:28 vault.db

Does anyone have an idea what could cause this?

Another odd detail: why don’t the databases on the first two nodes have the same size?

Check the status with ‘vault operator members’ and ‘vault operator raft list-peers’.

Well, I can see that my node is in the cluster:

{
  "request_id": "e6fbf969-3165-f225-0e4e-f08ee1747992",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": {
    "config": {
      "servers": [
        {
          "node_id": "node2",
          "address": "localhost:8201",
          "leader": false,
          "protocol_version": "3",
          "voter": true
        },
        {
          "node_id": "node3",
          "address": "localhost:8201",
          "leader": false,
          "protocol_version": "3",
          "voter": true
        },
        {
          "node_id": "node1",
          "address": "localhost:8201",
          "leader": true,
          "protocol_version": "3",
          "voter": true
        }
      ],
      "index": 0
    }
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null
}
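The output above looks like what ‘vault operator raft list-peers -format=json’ prints. As a minimal sketch (the function names are hypothetical), the leader and voters can be pulled out of it in Python. Note that this only confirms what the output already shows, namely that node3 is a voter in the Raft configuration; it says nothing about whether that node is actually keeping up with the log, which is exactly the situation in this thread.

```python
import json
import subprocess
from typing import Dict, List, Optional


def fetch_peers_json() -> str:
    """Run the Vault CLI and return its JSON output.

    Assumes the `vault` binary is on PATH and VAULT_ADDR/VAULT_TOKEN are set.
    """
    result = subprocess.run(
        ["vault", "operator", "raft", "list-peers", "-format=json"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def summarize_peers(raw: str) -> Dict[str, object]:
    """Extract the leader, voters and non-voters from the list-peers JSON."""
    servers: List[dict] = json.loads(raw)["data"]["config"]["servers"]
    # At most one server should be flagged as leader; None if no leader is listed.
    leader: Optional[str] = next(
        (s["node_id"] for s in servers if s.get("leader")), None
    )
    return {
        "leader": leader,
        "voters": sorted(s["node_id"] for s in servers if s.get("voter")),
        "non_voters": sorted(s["node_id"] for s in servers if not s.get("voter")),
    }
```

Usage would be something like summarize_peers(fetch_peers_json()); for the output above it reports node1 as leader and node1, node2, node3 as voters.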

That’s why I don’t understand why this node stopped updating its database. Maybe I should stop and restart it?

Sorry, I don’t know why. Perhaps you could open an issue on GitHub.
Is there anything in the logs? Try restarting your node and testing TCP connectivity.
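A quick TCP connectivity check between nodes can be sketched in Python with only the standard library. The function name and script name are hypothetical; port 8201 is the cluster port shown in the list-peers output above.

```python
import socket
import sys


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_port.py <host> <port>
    target_host, target_port = sys.argv[1], int(sys.argv[2])
    state = "reachable" if can_connect(target_host, target_port) else "UNREACHABLE"
    print(f"{target_host}:{target_port} {state}")
```

For example, run it from node3 against node1 and node2 on port 8201. A successful connection only shows that the port is open; it does not check TLS or Raft-level health.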

I have a 3-node cluster too, and I don’t see this behaviour.
But I think I will add Prometheus/Grafana monitoring for this.

Okay, “good news”:

After restarting my node, I saw some strange error messages in the log, and this article explains how to solve it:

It seems the database on my node was corrupted.
After following the article’s instructions, my issue is solved.

Is there a way to “know” when this happens?
In my case I noticed it somewhat by chance, after checking whether the databases on the nodes were in sync, but maybe there is a better way?

Were there no log entries about the problem before the restart?

As I said, I will add monitoring on the database file and check its modification time. If there is a big gap between the nodes, it will send an alert.
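A minimal sketch of that mtime check in Python, assuming the vault.db modification times have already been collected from each node (for example over SSH or by a monitoring agent; how they are gathered is left out). The function names, node names and the one-hour threshold are hypothetical. It flags any node whose file lags the most recently written one by more than the threshold:

```python
import os
import time
from datetime import timedelta
from typing import Dict, List


def db_mtime(path: str) -> float:
    """Return the last-modification time (epoch seconds) of a local vault.db file."""
    return os.stat(path).st_mtime


def stale_nodes(mtimes: Dict[str, float], max_gap: timedelta) -> List[str]:
    """Return the nodes whose vault.db lags the freshest one by more than max_gap."""
    if not mtimes:
        return []
    newest = max(mtimes.values())
    limit = max_gap.total_seconds()
    return sorted(node for node, mtime in mtimes.items() if newest - mtime > limit)


if __name__ == "__main__":
    # Hypothetical sample modelled on the ls output in this thread:
    # node3's file was last written 12 days 18 h 12 min before the others.
    now = time.time()
    sample = {
        "node1": now,
        "node2": now,
        "node3": now - timedelta(days=12, hours=18, minutes=12).total_seconds(),
    }
    for node in stale_nodes(sample, max_gap=timedelta(hours=1)):
        print(f"ALERT: vault.db on {node} looks stale")
```

Comparing relative gaps rather than absolute ages means an idle cluster, where no node writes for a while, does not trigger alerts. Comparing mtimes across hosts does assume their clocks are synchronised (e.g. via NTP).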


I didn’t check them before… (not the smartest move on my part; I only thought about it when restarting the node).

Okay, I think I will do the same. Thanks for your answers, and have a good week 🙂