Raft.db maintenance guidance (v1.8.4)

Hi Community!

We’re looking for general guidance on raft.db maintenance. We have two environments, one NPE and one PRD, both running v1.8.4 across the fleet.

In NPE (20+ clients, ~400 jobs [mostly unscheduled batch + service], and 50 allocs), raft.db currently ranges from 152 MiB to 189 MiB across the 3 server members in our cluster.

In PRD (120+ clients [and counting], ~400 jobs [mostly scheduled batch + service], and 165 allocs), raft.db currently ranges from 670 MiB to 1.77 GiB across the 3 server members in our cluster.
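
To put those spreads side by side, here is a quick back-of-the-envelope calculation. It is only a sketch: the helper name `spread_ratio` is made up for illustration, and the inputs are just the largest and smallest sizes quoted above, with GiB converted to MiB.

```python
# Largest/smallest raft.db sizes quoted above, expressed in MiB.
MIB_PER_GIB = 1024


def spread_ratio(largest_mib: float, smallest_mib: float) -> float:
    """Return how many times larger the largest raft.db is than the smallest."""
    return largest_mib / smallest_mib


npe_ratio = spread_ratio(189, 152)                 # NPE: 189 MiB vs 152 MiB
prd_ratio = spread_ratio(1.77 * MIB_PER_GIB, 670)  # PRD: 1.77 GiB vs 670 MiB

print(f"NPE spread: {npe_ratio:.2f}x")  # NPE spread: 1.24x
print(f"PRD spread: {prd_ratio:.2f}x")  # PRD spread: 2.71x
```

So the NPE servers are within roughly 25% of each other, while the largest PRD raft.db is about 2.7x the smallest.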

Both environments run their servers in Docker containers, and we currently do not see any performance degradation across the control plane.

The only raft.db “guidance” we’ve seen is undocumented and comes from community posts or Git issues about past versions. It suggests reducing/resyncing raft.db by either (1) restarting the leader to force a new quorum with compaction, or (2) on the FOLLOWER(S): STOP nomad, DEL raft.db from the data dir, then START nomad to resync with the current leader.

  1. Are the above steps the only recommended methods?
  2. Are there any additional maintenance steps that are less risky in production?
  3. Should we be concerned (PRD) that raft.db on two of the server nodes is 2-3x larger than on the smallest?

Please advise - TIA!