I am looking into metrics, and so far there is a lot of confusion.
Here are some questions I have:
- What is the difference between “system space” and “user space”, specifically for the metrics `nomad.client.allocs.cpu.user` and `nomad.client.allocs.cpu.system`? How do I know whether my job runs in system or in user space? Do I need to sum them to get the CPU consumption of an allocation?
- `nomad.client.allocs.cpu.allocated` is in MHz, but all other `allocs.cpu` metrics are percentages, according to the documentation. How can I build an alarm that triggers when an allocation's CPU usage exceeds the CPU allocated to it? This would indicate that a task consumes more CPU than it was given.
- I was expecting that `nomad.client.host.cpu.total_percent = nomad.client.host.cpu.system + nomad.client.host.cpu.user`, but on my single-core CPU this is never the case. Why is that? What am I missing?
- I expected that the sum of `nomad_client_allocs_memory_usage` over all allocations would equal `nomad_client_host_memory_used`, but this is also never the case. Even if I add the allocated memory per allocation to the calculation, it still doesn't add up.
- What metric can I use to see the CPU/memory usage of Nomad's internal client/host processes?
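
To make the two equalities I expected above concrete, here is a minimal sketch of the comparisons I am doing. The helper names (`cpu_gap`, `memory_gap`) and all sample values are made up for illustration; they are not real readings from my cluster, and the code only assumes the metrics have already been scraped into plain Python values.

```python
from typing import Dict, List


def cpu_gap(sample: Dict[str, float]) -> float:
    """Return total_percent minus (system + user) for one host sample.

    I expected this to be 0, but it never is.
    """
    return sample["nomad.client.host.cpu.total_percent"] - (
        sample["nomad.client.host.cpu.system"]
        + sample["nomad.client.host.cpu.user"]
    )


def memory_gap(host_used: int, alloc_usages: List[int]) -> int:
    """Return host memory used minus the sum of per-allocation memory usage.

    host_used    -- value of nomad_client_host_memory_used
    alloc_usages -- one nomad_client_allocs_memory_usage value per allocation
    """
    return host_used - sum(alloc_usages)


# Hypothetical sample values, NOT real measurements.
sample = {
    "nomad.client.host.cpu.total_percent": 42.0,
    "nomad.client.host.cpu.system": 10.0,
    "nomad.client.host.cpu.user": 25.0,
}
print(cpu_gap(sample))
print(memory_gap(2_000_000_000, [300_000_000, 450_000_000]))
```

In my setup both gaps are consistently non-zero, which is what I am trying to understand.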