Hanska
January 18, 2023, 10:44am
1
Hello,
I’m currently working on a 3-node infrastructure with Raft storage.
I just noticed something weird: one of my nodes stopped updating its database.
Here are the DB files of the active node and of one node that is in standby mode:
-rw------- 1 vault_1 vault_1 12M Jan 18 10:40 vault.db
-rw------- 1 vault_2 vault_2 14M Jan 18 10:40 vault.db
And here is the one from the node that no longer updates:
-rw------- 1 vault_3 vault_3 9.7M Jan 5 16:28 vault.db
Maybe someone has an idea about the cause of this issue.
Another weird detail: why don’t the DB files on the first two nodes have the same size?
Check the status with ‘vault operator members’ and ‘vault operator raft list-peers’.
Hanska
January 23, 2023, 7:31am
3
Well, actually I can see that my node is in the cluster:
{
  "request_id": "e6fbf969-3165-f225-0e4e-f08ee1747992",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": {
    "config": {
      "servers": [
        {
          "node_id": "node2",
          "address": "localhost:8201",
          "leader": false,
          "protocol_version": "3",
          "voter": true
        },
        {
          "node_id": "node3",
          "address": "localhost:8201",
          "leader": false,
          "protocol_version": "3",
          "voter": true
        },
        {
          "node_id": "node1",
          "address": "localhost:8201",
          "leader": true,
          "protocol_version": "3",
          "voter": true
        }
      ],
      "index": 0
    }
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null
}
That’s why I don’t understand why this node stopped updating its database. Maybe I should stop it and restart it?
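For anyone who wants to check that peer list from a script, here is a minimal sketch. It assumes the JSON shape shown above (`data.config.servers` with `node_id`, `leader` and `voter` fields); the function name `summarize_peers` is made up for illustration. Keep in mind that membership alone did not reveal the problem here: all three nodes were listed as voters while one of them had stopped writing to its database.

```python
import json


def summarize_peers(raw_json: str) -> dict:
    """Summarize a raft peer list in the JSON shape shown in this thread.

    Returns the node ids flagged as leader, the voters and the non-voters.
    """
    servers = json.loads(raw_json)["data"]["config"]["servers"]
    return {
        "leaders": [s["node_id"] for s in servers if s["leader"]],
        "voters": sorted(s["node_id"] for s in servers if s["voter"]),
        "non_voters": sorted(s["node_id"] for s in servers if not s["voter"]),
    }


if __name__ == "__main__":
    # Trimmed copy of the payload from the post above.
    sample = json.dumps({
        "data": {"config": {"servers": [
            {"node_id": "node2", "leader": False, "voter": True},
            {"node_id": "node3", "leader": False, "voter": True},
            {"node_id": "node1", "leader": True, "voter": True},
        ]}}
    })
    summary = summarize_peers(sample)
    # A healthy cluster should report exactly one leader.
    print(summary, "single leader:", len(summary["leaders"]) == 1)
```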
I’m sorry, I don’t know why. Perhaps you can open an issue on GitHub.
Nothing in the logs? Try restarting your node and testing TCP connectivity.
I have a 3-node cluster too, and I don’t see this behaviour.
But I think I will add Prometheus/Grafana monitoring for this.
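If it helps, the TCP connectivity test can be done with a few lines of Python. This is only a sketch: the helper name `can_connect` is made up, and the targets are whatever `host:port` pairs you pass on the command line (8201 is the port shown in the peer list above).

```python
import socket
import sys


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


if __name__ == "__main__":
    # Each argument is expected as host:port.
    for target in sys.argv[1:]:
        host, _, port = target.rpartition(":")
        status = "reachable" if can_connect(host, int(port)) else "unreachable"
        print(f"{target}: {status}")
```

Run it from each node against the other two, for example with `node-b.example:8201 node-c.example:8201` as arguments (hypothetical hostnames).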
Hanska
January 23, 2023, 9:50am
5
Okay, “good news”.
After restarting my node I saw weird error messages in the logs, and this article explains how to solve it:
It seems the DB of my node was corrupted.
After following the explanation in the article, my issue is solved.
Is there a way to “know” when this happens?
In my case I noticed it more or less by chance while checking whether the DBs on the nodes were in sync, but maybe there is a better way?
Joffrey
January 23, 2023, 11:08am
6
You had no logs about the problem before the restart?
As I said, I will add monitoring on the DB file and check its modification time. If there is a big gap between nodes, it will send an alert.
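A rough sketch of that check, assuming the modification time of vault.db is collected from every node somehow (how it gets shipped to the monitoring side is left out). The function names, node names and the one-hour threshold are all hypothetical.

```python
import os


def db_mtime(path: str) -> float:
    """Modification time of the local DB file, in seconds since the epoch."""
    return os.path.getmtime(path)


def lagging_nodes(mtimes: dict, max_gap_seconds: float) -> list:
    """Return the nodes whose DB file is older than the newest one by more than the gap.

    `mtimes` maps a node name to the modification time of its vault.db.
    """
    if not mtimes:
        return []
    newest = max(mtimes.values())
    return sorted(
        node for node, mtime in mtimes.items()
        if newest - mtime > max_gap_seconds
    )


if __name__ == "__main__":
    from datetime import datetime

    # Timestamps taken from the listings in the first post (node names are made up).
    mtimes = {
        "vault_1": datetime(2023, 1, 18, 10, 40).timestamp(),
        "vault_2": datetime(2023, 1, 18, 10, 40).timestamp(),
        "vault_3": datetime(2023, 1, 5, 16, 28).timestamp(),
    }
    # Alert on anything more than one hour behind the newest file.
    print("lagging:", lagging_nodes(mtimes, max_gap_seconds=3600))
```

Comparing against the newest file rather than the current time avoids false alerts on a quiet cluster where no node is writing.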
Hanska
January 23, 2023, 11:32am
7
I didn’t check them before… (not the smartest move on my part; I only thought about it when restarting the node.)
Okay, I think I will do the same. Thanks for your answers, and have a good week!