How to calculate node drift in Serf

Dear All,

I am currently working purely with Serf (without Consul) and Vivaldi network coordinates. My coordinate configuration values are as follows:

func DefaultConfig() *Config {
	return &Config{
		Dimensionality:       2,
		VivaldiErrorMax:      1.5,
		VivaldiCE:            0.25,
		VivaldiCC:            0.25,
		AdjustmentWindowSize: 20,
		HeightMin:            10.0e-6,
		LatencyFilterSize:    3,
		GravityRho:           150.0,
	}
}

I need to test whether the nodes drift away from the origin indefinitely, as described in the Vivaldi paper. Serf introduces “GravityRho” as a way to stop this drift. I have the following code to check whether it is working:

package main

import (
	"log"
	"os"
	"time"

	"github.com/hashicorp/serf/client"
	"github.com/hashicorp/serf/coordinate"
)

const (
	serfRPCAddr  = "127.0.0.1:7373"
	samplePeriod = 1 * time.Minute
)

func main() {
	// 1. Initialize logging
	nodeLogFile, err := os.OpenFile("serf_node_drift.log", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
	if err != nil {
		log.Fatal(err)
	}
	defer nodeLogFile.Close()
	nodeLogger := log.New(nodeLogFile, "", log.LstdFlags)

	systemLogFile, err := os.OpenFile("serf_system_drift.log", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
	if err != nil {
		log.Fatal(err)
	}
	defer systemLogFile.Close()
	systemLogger := log.New(systemLogFile, "", log.LstdFlags)

	// 2. Connect to Serf
	serfClient, err := client.ClientFromConfig(&client.Config{Addr: serfRPCAddr})
	if err != nil {
		log.Fatal(err)
	}
	defer serfClient.Close()

	// 3. Create origin coordinate EXACTLY as Serf does internally
	config := coordinate.DefaultConfig()
	origin := coordinate.NewCoordinate(config)
	for i := range origin.Vec {
		origin.Vec[i] = 0.0 // Maintain Serf's origin Height (config.HeightMin)
	}
	origin.Adjustment = 0.0

	// 4. Main monitoring loop
	ticker := time.NewTicker(samplePeriod)
	defer ticker.Stop()

	for range ticker.C {
		members, err := serfClient.Members()
		if err != nil {
			log.Printf("Member error: %v", err)
			continue
		}

		var (
			maxDrift    float64
			totalDrift  float64
			activeNodes int
			vecSum      = make([]float64, config.Dimensionality)
			heightSum   float64
			adjustSum   float64
		)

		for _, member := range members {
			if member.Status != "alive" {
				continue
			}

			coord, err := serfClient.GetCoordinate(member.Name)
			if err != nil || coord == nil {
				continue
			}

			// 5. Calculate drift with Serf's DistanceTo, which adds both Heights and Adjustments to the Euclidean distance
			drift := coord.DistanceTo(origin).Seconds() * 1000 // ms

			// Log individual node drift
			nodeLogger.Printf("NODE_DRIFT node=%s drift_ms=%.2f vec=%v height=%.6f adj=%.6f",
				member.Name, drift, coord.Vec, coord.Height, coord.Adjustment)

			// Update metrics
			if drift > maxDrift {
				maxDrift = drift
			}
			totalDrift += drift
			activeNodes++

			// Accumulate components for true centroid calculation
			for i := range coord.Vec {
				vecSum[i] += coord.Vec[i]
			}
			heightSum += coord.Height
			adjustSum += coord.Adjustment
		}

		// 6. Calculate system metrics
		if activeNodes > 0 {
			n := float64(activeNodes)
			avgDrift := totalDrift / n

			// Calculate TRUE centroid including all components
			centroidVec := make([]float64, config.Dimensionality)
			for i := range vecSum {
				centroidVec[i] = vecSum[i] / n
			}
			centroidHeight := heightSum / n
			centroidAdjust := adjustSum / n

			// Construct centroid coordinate EXACTLY like real nodes
			centroidCoord := &coordinate.Coordinate{
				Vec:        centroidVec,
				Height:     centroidHeight,
				Adjustment: centroidAdjust,
				Error:      config.VivaldiErrorMax, // Not used in drift calc
			}

			// Calculate centroid drift using Serf's actual distance method
			centroidDrift := centroidCoord.DistanceTo(origin).Seconds() * 1000

			// Log metrics
			systemLogger.Printf("SYSTEM_DRIFT nodes=%d max_ms=%.2f avg_ms=%.2f centroid_ms=%.2f",
				activeNodes, maxDrift, avgDrift, centroidDrift)
		}
	}
}

This code basically does the following:

  • Create an “origin” coordinate (0,0 position) just like Serf internally defines it.
  • Every minute:
    • Get the list of alive Serf members.
    • For each alive node:
      • Get its current Vivaldi coordinate.
      • Calculate the distance (drift) from the origin.
      • Log its drift.
    • After checking all nodes:
      • Calculate max drift, average drift, and the centroid drift.
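To tell the whole cluster moving apart from individual nodes spreading out, the same three metrics can also be computed from the Vec components alone. The sketch below is only an illustration under that assumption: `norm` and `driftStats` are hypothetical helper names, the sample vectors are made up, and Height and Adjustment are deliberately ignored, so the numbers will not match what `DistanceTo` reports.

```go
package main

import (
	"fmt"
	"math"
)

// norm returns the Euclidean length of v, i.e. its distance from the zero vector.
func norm(v []float64) float64 {
	var sum float64
	for _, x := range v {
		sum += x * x
	}
	return math.Sqrt(sum)
}

// driftStats (hypothetical helper) returns the largest norm, the mean norm,
// and the norm of the centroid for a set of coordinate vectors.
// Vectors with fewer than dim components are skipped.
func driftStats(vecs [][]float64, dim int) (maxDrift, avgDrift, centroidDrift float64) {
	centroid := make([]float64, dim)
	var total float64
	var n int
	for _, v := range vecs {
		if len(v) < dim {
			continue
		}
		d := norm(v[:dim])
		if d > maxDrift {
			maxDrift = d
		}
		total += d
		for i := 0; i < dim; i++ {
			centroid[i] += v[i]
		}
		n++
	}
	if n == 0 {
		return 0, 0, 0
	}
	for i := range centroid {
		centroid[i] /= float64(n)
	}
	return maxDrift, total / float64(n), norm(centroid)
}

func main() {
	// Made-up 2-D vectors in seconds, for illustration only.
	vecs := [][]float64{{0.003, 0.004}, {-0.003, -0.004}, {0.006, 0.008}}
	maxD, avgD, cenD := driftStats(vecs, 2)
	fmt.Printf("max_ms=%.2f avg_ms=%.2f centroid_ms=%.2f\n", maxD*1000, avgD*1000, cenD*1000)
}
```

If the centroid norm grows while the average distance of nodes from the centroid stays flat, that would point to the cluster translating as a whole rather than nodes spreading apart.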

I see the following results after running this for about 3 days:

2025/04/24 13:39:05 SYSTEM_DRIFT nodes=162 max_ms=58.00 avg_ms=21.38 centroid_ms=14.28
2025/04/24 13:40:05 SYSTEM_DRIFT nodes=162 max_ms=58.00 avg_ms=21.35 centroid_ms=14.33
2025/04/24 13:41:05 SYSTEM_DRIFT nodes=162 max_ms=58.00 avg_ms=21.12 centroid_ms=14.16
2025/04/24 13:42:05 SYSTEM_DRIFT nodes=162 max_ms=58.00 avg_ms=21.11 centroid_ms=14.18
2025/04/24 13:43:05 SYSTEM_DRIFT nodes=162 max_ms=58.00 avg_ms=21.16 centroid_ms=14.19
2025/04/24 13:44:05 SYSTEM_DRIFT nodes=162 max_ms=58.00 avg_ms=21.19 centroid_ms=14.22
2025/04/24 13:45:05 SYSTEM_DRIFT nodes=162 max_ms=58.00 avg_ms=21.22 centroid_ms=14.32
2025/04/24 13:46:05 SYSTEM_DRIFT nodes=162 max_ms=58.00 avg_ms=21.38 centroid_ms=14.41
2025/04/24 13:47:05 SYSTEM_DRIFT nodes=162 max_ms=58.00 avg_ms=21.42 centroid_ms=14.45
2025/04/24 13:48:05 SYSTEM_DRIFT nodes=162 max_ms=58.00 avg_ms=21.40 centroid_ms=14.26
             <some entries have been removed to save space>
2025/04/28 08:29:05 SYSTEM_DRIFT nodes=162 max_ms=131.76 avg_ms=96.10 centroid_ms=95.47
2025/04/28 08:30:05 SYSTEM_DRIFT nodes=162 max_ms=131.76 avg_ms=96.05 centroid_ms=95.41
2025/04/28 08:31:05 SYSTEM_DRIFT nodes=162 max_ms=131.76 avg_ms=96.05 centroid_ms=95.43
2025/04/28 08:32:05 SYSTEM_DRIFT nodes=162 max_ms=131.76 avg_ms=96.17 centroid_ms=95.55
2025/04/28 08:33:05 SYSTEM_DRIFT nodes=162 max_ms=131.76 avg_ms=96.05 centroid_ms=95.43
2025/04/28 08:34:05 SYSTEM_DRIFT nodes=162 max_ms=130.67 avg_ms=96.05 centroid_ms=95.41
2025/04/28 08:35:05 SYSTEM_DRIFT nodes=162 max_ms=130.67 avg_ms=96.08 centroid_ms=95.44
2025/04/28 08:36:05 SYSTEM_DRIFT nodes=162 max_ms=130.67 avg_ms=96.06 centroid_ms=95.43
2025/04/28 08:37:05 SYSTEM_DRIFT nodes=162 max_ms=130.67 avg_ms=95.85 centroid_ms=95.21
2025/04/28 08:38:05 SYSTEM_DRIFT nodes=162 max_ms=130.67 avg_ms=95.86 centroid_ms=95.22

According to these results, max_ms, avg_ms and centroid_ms all keep increasing steadily. Is this normal behavior, or is drift happening here? I would also like to know whether I am tracking it correctly with this code.
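For scale, here is how I understand the gravity term to behave at these distances. This is a sketch based on my reading of Serf's coordinate client, where the pull toward the origin appears to be -(dist/GravityRho)^2 with dist in seconds. That formula is an assumption on my part, and `gravityForce` is a hypothetical name, not a Serf function.

```go
package main

import (
	"fmt"
	"math"
)

// gravityForce (hypothetical helper) returns the pull toward the origin, in
// seconds, for a coordinate that is distSeconds away from the origin.
// The formula -(dist/rho)^2 is an assumption based on my reading of Serf.
func gravityForce(distSeconds, rho float64) float64 {
	return -1.0 * math.Pow(distSeconds/rho, 2.0)
}

func main() {
	const rho = 150.0 // GravityRho from the configuration above
	for _, ms := range []float64{20, 100, 1000} {
		f := gravityForce(ms/1000.0, rho)
		fmt.Printf("dist=%.0fms force=%.3e s\n", ms, f)
	}
}
```

If this reading is right, the pull at around 100 ms from the origin is well under a microsecond per update, so gravity may only become noticeable at much larger distances.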

Thank you for any advice and help!