PromQL queries with telemetry

Hi everyone,
I’ve been trying to set up a Grafana dashboard to show telemetry data for Nomad client nodes, but I am not sure my PromQL queries are correct. I’ve followed the metrics reference documentation and a few other references, but the CPU/memory stats shown in the Nomad UI differ from the values in my Grafana panels.

Nomad version:

Nomad v1.8.2
BuildDate 2024-07-16T08:50:09Z
Revision 7f0822c1e4f25907d9f60e2d595411950dd1bd28

Nomad-agent telemetry block:

telemetry {
  collection_interval        = "1s"
  disable_hostname           = true
  prometheus_metrics         = true
  publish_allocation_metrics = true
  publish_node_metrics       = true
}

I am using VictoriaMetrics to store the Prometheus-formatted data pushed by Nomad.

Can someone help me verify whether these queries are correct?

For reference, these are the dashboard variables:

  • cluster: has only one option, local
  • instance: specifies the client node
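
To try any of the queries below outside Grafana, the variables they reference (`$cluster` and `$client_id`) have to be filled in by hand. A minimal Python sketch of that, assuming a single-node VictoriaMetrics at the hypothetical address `http://localhost:8428` and its Prometheus-compatible `/api/v1/query` endpoint; the helper names and the sample variable values are illustrative only:

```python
from string import Template
from urllib.parse import urlencode

# Assumption: single-node VictoriaMetrics on its default port; adjust as needed.
BASE_URL = "http://localhost:8428"


def render_query(query: str, variables: dict[str, str]) -> str:
    """Replace Grafana-style $name placeholders with concrete values."""
    return Template(query).substitute(variables)


def instant_query_url(query: str, base_url: str = BASE_URL) -> str:
    """Build an instant-query URL for the Prometheus-compatible HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': query})}"


# Hypothetical values for the dashboard variables.
variables = {"cluster": "local", "client_id": "client-1"}
memory_query = (
    'sum by (cluster, instance) (nomad_client_host_memory_used'
    '{cluster="$cluster", instance="$client_id"})'
)
rendered = render_query(memory_query, variables)
print(rendered)
print(instant_query_url(rendered))
```

Fetching the printed URL (for example with `curl`) returns the raw value behind a panel, which makes it easier to tell a query problem from a Grafana display problem.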

Queries

1. Current CPU utilization of client node

sum by (cluster,instance) (nomad_client_host_cpu_total{cluster="$cluster",instance="$client_id"}) / (sum by (cluster, instance) (nomad_client_host_cpu_total{cluster="$cluster",instance="$client_id"}) + sum by (cluster, instance) (nomad_client_host_cpu_idle{cluster="$cluster",instance="$client_id"}))

2. Current memory utilization of client node

sum by (cluster, instance) (nomad_client_host_memory_used{cluster="$cluster", instance="$client_id"}) / (sum by (cluster, instance) (nomad_client_host_memory_total{cluster="$cluster", instance="$client_id"}))

3. Current disk utilization of client node

sum by (cluster, instance) (nomad_client_host_disk_used{cluster="$cluster", instance="$client_id"}) / (sum by (cluster, instance) (nomad_client_host_disk_size{cluster="$cluster", instance="$client_id"}))

4. % CPU shares allocated
This specifies how much CPU is allocated to all the jobs combined, as a fraction of the node's total.

sum by (cluster, instance) (nomad_client_allocated_cpu{cluster="$cluster", instance="$client_id"}) / (sum by (cluster, instance) (nomad_client_allocated_cpu{cluster="$cluster", instance="$client_id"}) + sum by (cluster, instance) (nomad_client_unallocated_cpu{cluster="$cluster", instance="$client_id"}))

5. % CPU utilization
This specifies how much of the CPU allocated to all the jobs combined is actually being used.

sum by (cluster,instance) (nomad_client_allocs_cpu_total_ticks{cluster="$cluster",instance="$client_id"}) / sum by (cluster, instance) (nomad_client_allocs_cpu_allocated{cluster="$cluster",instance="$client_id"})

A few references: ref1, ref2

6. CPU allocated (MHz)
This is the CPU allocated to all the jobs combined.

sum by (cluster, instance) (nomad_client_allocated_cpu{cluster="$cluster", instance="$client_id"})

7. CPU utilization (MHz)
This is the CPU being used by all the jobs combined.

sum by (cluster,instance) (nomad_client_allocs_cpu_total_ticks{cluster="$cluster",instance="$client_id"})

8. % Memory allocated

sum by (cluster, instance) (nomad_client_allocated_memory{cluster="$cluster", instance="$client_id"}) / (sum by (cluster, instance) (nomad_client_allocated_memory{cluster="$cluster", instance="$client_id"}) + sum by (cluster, instance) (nomad_client_unallocated_memory{cluster="$cluster", instance="$client_id"}))

9. % Memory utilization

sum by (cluster, instance) (nomad_client_allocs_memory_usage{cluster="$cluster", instance="$client_id"}) / sum by (cluster, instance) (nomad_client_allocs_memory_allocated{cluster="$cluster", instance="$client_id"})

10. Memory allocated (bytes)
This is the memory allocated to all the jobs combined.

sum by (cluster, instance) (nomad_client_allocs_memory_allocated{cluster="$cluster", instance="$client_id"})

11. Memory utilization (bytes)
This is the memory being used by all the jobs combined.

sum by (cluster, instance) (nomad_client_allocs_memory_usage{cluster="$cluster", instance="$client_id"})

I am seeing a discrepancy in the 1st and 2nd queries: their results do not match what the Nomad UI shows.
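
One way to narrow this down is to recompute the two ratios from the agent's own host stats and compare them with the panels. A rough Python sketch, assuming the JSON returned by the client's `/v1/client/stats` endpoint contains a `CPU` list with per-core `Total` and `Idle` percentages and a `Memory` object with `Used` and `Total` in bytes; the sample numbers are made up:

```python
def cpu_utilization(host_stats: dict) -> float:
    """Mirror query 1: sum(total) / (sum(total) + sum(idle)) across cores."""
    total = sum(core["Total"] for core in host_stats["CPU"])
    idle = sum(core["Idle"] for core in host_stats["CPU"])
    return total / (total + idle)


def memory_utilization(host_stats: dict) -> float:
    """Mirror query 2: used / total."""
    memory = host_stats["Memory"]
    return memory["Used"] / memory["Total"]


# Made-up sample shaped like the assumed /v1/client/stats response.
sample = {
    "CPU": [
        {"CPU": "cpu0", "Total": 30.0, "Idle": 70.0},
        {"CPU": "cpu1", "Total": 10.0, "Idle": 90.0},
    ],
    "Memory": {"Used": 4 * 1024**3, "Total": 16 * 1024**3},
}

print(f"cpu: {cpu_utilization(sample):.2%}")
print(f"memory: {memory_utilization(sample):.2%}")
```

Both functions return a fraction between 0 and 1, as queries 1 and 2 do, so the panels need a percent (0.0-1.0) unit to read as percentages.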