hi!
what is the difference between “system space” and “user space”, specifically metric
The difference is the same as on Linux. See, for example, "User CPU time vs System CPU time?" on Stack Overflow. Research Linux CPU usage metrics.
but all other allocs.cpu metrics are in Percentage
What about nomad.client.allocs.cpu.total_ticks?
How can I build an alarm that triggers when allocation CPU usage crosses available allocation CPU?
I use Prometheus and alert when the following expression is greater than 100%:
nomad_client_allocs_cpu_total_ticks{namespace=~"$namespace",instance=~"$client",exported_job=~${job:doublequote},task_group=~"$group",task=~"$task",alloc_id=~"$alloc_id"} * 100
/
nomad_client_allocs_cpu_allocated{namespace=~"$namespace",instance=~"$client",exported_job=~${job:doublequote},task_group=~"$group",task=~"$task",alloc_id=~"$alloc_id"}
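To make the arithmetic of the expression above concrete, here is a minimal Python sketch of the same ratio. It assumes that total_ticks and allocated are reported in the same unit for the same allocation (MHz, as far as I understand the Nomad metrics); the function names and the sample numbers are made up for illustration.

```python
def cpu_usage_percent(total_ticks: float, allocated: float) -> float:
    """Same arithmetic as the PromQL expression: total_ticks * 100 / allocated."""
    if allocated <= 0:
        raise ValueError("allocated CPU must be positive")
    return total_ticks * 100 / allocated


def should_alert(total_ticks: float, allocated: float,
                 threshold_percent: float = 100.0) -> bool:
    """True when usage is above the threshold (100% means 'more than allocated')."""
    return cpu_usage_percent(total_ticks, allocated) > threshold_percent


if __name__ == "__main__":
    # Hypothetical samples: 250 used of 500 allocated, then 600 used of 500 allocated.
    print(cpu_usage_percent(250.0, 500.0))  # 50.0
    print(should_alert(600.0, 500.0))       # True
```

A task can go above 100% here because the allocated value is a reservation and not necessarily a hard cap, so the alert fires on "using more than was asked for".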
Why is it like that? What am I missing?
See Linux CPU usage metrics. This is nothing specific to Nomad. See man proc and the /proc/stat documentation.
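As a rough illustration of where user and system time come from on Linux, here is a small Python sketch that parses the aggregate "cpu" line of /proc/stat. The field order follows man proc; the sample line and the helper names are made up, and the values are cumulative counters in USER_HZ units since boot.

```python
# Field order of the "cpu" line in /proc/stat, as described in man proc.
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")


def parse_cpu_line(line: str) -> dict:
    """Parse the aggregate 'cpu ...' line into a {field: ticks} dict."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected a line starting with 'cpu'")
    return dict(zip(FIELDS, (int(v) for v in parts[1:])))


def user_system_share(times: dict) -> tuple:
    """Return (user %, system %) of all accounted CPU time.

    Guest time is already counted inside user time, so it is left out of the total.
    """
    counted = FIELDS[:8]
    total = sum(times.get(name, 0) for name in counted)
    if total == 0:
        raise ValueError("no CPU time recorded")
    return (times.get("user", 0) * 100 / total,
            times.get("system", 0) * 100 / total)


if __name__ == "__main__":
    sample = "cpu  4000 0 1000 5000 0 0 0 0 0 0"  # hypothetical values
    print(user_system_share(parse_cpu_line(sample)))  # (40.0, 10.0)
```

Because the counters only ever grow, a "current" percentage needs two samples taken some seconds apart and the same calculation on the difference between them.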
it also doesn’t add up really.
What about the kernel? What about I/O device buffers? Consider researching Linux memory.
What metric can I use to see nomad internal client/host processes CPU/memory usage?
I do not understand the question. What are “internal client” and “internal host” processes, and how do they differ from “external” ones? You might be interested in Zabbix, Prometheus, or Nagios.
To monitor the “internal” Go runtime metrics of the Nomad process itself (i.e. Go runtime metrics; see the runtime/metrics package on Go Packages), I use nomad_runtime_alloc_bytes and nomad_runtime_heap_objects.
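If you want to look at these two values outside of a dashboard, a minimal Python sketch like the one below can pull them out of Prometheus text-format output. The sample text, its label, and its numbers are made up; the parser assumes one sample per line with no trailing timestamp.

```python
def read_gauges(text: str, names: set) -> dict:
    """Return {metric name: value} for the requested metrics in Prometheus text output."""
    found = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_and_labels, _, value = line.rpartition(" ")
        name = name_and_labels.split("{", 1)[0]
        if name in names:
            found[name] = float(value)
    return found


if __name__ == "__main__":
    # Hypothetical scrape output, for illustration only.
    sample = """
# TYPE nomad_runtime_alloc_bytes gauge
nomad_runtime_alloc_bytes{host="client-1"} 1.5e+07
# TYPE nomad_runtime_heap_objects gauge
nomad_runtime_heap_objects{host="client-1"} 42000
"""
    wanted = {"nomad_runtime_alloc_bytes", "nomad_runtime_heap_objects"}
    print(read_gauges(sample, wanted))
```

Note that these describe the Nomad agent process only, not the tasks it runs.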