Number of nodes as seen by scheduler

Hi!

I would like to monitor the number of nodes as seen by the Nomad scheduler (both other schedulers and agents) to detect network-level connectivity issues between schedulers and agents. Is there a metric I can use for that?

Looking at https://developer.hashicorp.com/nomad/docs/operations/metrics-reference#client-metrics it is quite easy to get the number of nodes which report metrics, but I can’t find anything about the state of the cluster from the point of view of the scheduler.

Help?

Hi @stswidwinski,

I believe the best metric to use for this would be nomad.heartbeat.active, which describes how many active client heartbeat trackers are running. Each client within the Nomad cluster has a heartbeat tracker for health tracking, so changes to this number indicate when clients become healthy or unhealthy from the Nomad servers' perspective.
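
If it helps, here is a minimal sketch of reading that gauge from the agent's /v1/metrics HTTP endpoint. It assumes the default JSON output with a top-level "Gauges" list of Name/Value entries, and the address http://127.0.0.1:4646 is only a placeholder for one of your servers. The helper names fetch_metrics and heartbeat_active are made up for illustration. The gauge is matched by suffix because the emitted name may carry an extra prefix depending on your telemetry configuration.

```python
import json
import urllib.request


def fetch_metrics(addr="http://127.0.0.1:4646"):
    """Fetch the JSON metrics payload from a Nomad agent.

    The address is a placeholder; a cluster with ACLs enabled would
    also need a token header, which this sketch leaves out.
    """
    with urllib.request.urlopen(f"{addr}/v1/metrics", timeout=5) as resp:
        return json.load(resp)


def heartbeat_active(payload):
    """Return the heartbeat gauge value, or None if it is absent."""
    # "Gauges" can be missing or null when nothing has been recorded yet.
    for gauge in payload.get("Gauges") or []:
        # Suffix match tolerates a prefixed name (assumption, see above).
        if gauge.get("Name", "").endswith("heartbeat.active"):
            return gauge.get("Value")
    return None
```

Calling heartbeat_active(fetch_metrics()) on a schedule and alerting when the value drops below the expected client count would be one way to use it. A None result means the agent you queried is not reporting the gauge, in which case try pointing it at another server.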

Thanks,
jrasell and the Nomad team

Great, I think this checks all the boxes. Thank you!