We need to have something that allows us to monitor each namespace and job and estimate their costs based on our criteria.
Is there a way to do this? I know there are projects like OpenCost that may be suitable for this. Maybe we would have to change a little bit of its code to make it usable with Nomad.
Thanks in advance
I think you mean “resource allocation utilization”: you want to monitor resource utilization over time, broken down by group (namespace, job).
From the Prometheus metrics that Nomad clients expose, you can graph per-allocation resource usage. Take the nomad_client_allocs_cpu_allocated metric, plot it in Grafana, and you’re done. There are examples online, too: Using Prometheus to Monitor Nomad Metrics | Nomad | HashiCorp Developer, Monitoring Nomad | Nomad | HashiCorp Developer, Dashboards | Grafana Labs
I took one job and drew this for you:
Since it seems you can set the criteria yourself, all you need to do is enable Prometheus metrics and apply your criteria to the metric in question.
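If you would rather pull the numbers out of Prometheus than read them off a graph, something like the following could work. It is only a rough sketch: the Prometheus address is made up, and the label names (namespace, exported_job) are assumptions that depend on your scrape config, so check what your setup actually exposes.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical Prometheus address; replace with your own.
PROMETHEUS_URL = "http://localhost:9090"
# Assumed label names. Nomad's own "job" label often shows up as
# "exported_job" because it collides with Prometheus' scrape "job" label.
GROUP_LABELS = ("namespace", "exported_job")


def build_query(metric, labels=GROUP_LABELS):
    """Return a PromQL expression that sums `metric` per label group."""
    return f"sum by ({', '.join(labels)}) ({metric})"


def parse_vector(payload, labels=GROUP_LABELS):
    """Turn an instant-query response into {(label values...): float}."""
    totals = {}
    for sample in payload["data"]["result"]:
        key = tuple(sample["metric"].get(label, "") for label in labels)
        # Each sample value is [unix_timestamp, "value-as-string"].
        totals[key] = float(sample["value"][1])
    return totals


def fetch_usage(metric, base_url=PROMETHEUS_URL):
    """Run an instant query against /api/v1/query and group the result."""
    params = urllib.parse.urlencode({"query": build_query(metric)})
    url = f"{base_url}/api/v1/query?{params}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_vector(json.load(resp))
```

Calling fetch_usage with the allocation metric you care about returns one number per (namespace, job) pair, which you can then multiply by whatever rate your criteria define.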
This is pretty much what we do at $work, but we only look at memory. We basically calculate “cost of a physical node divided by the GB of memory that node has”, which gives us a cost per GB for that particular node. We keep a rolling “average” of this cost per GB across our various nodes and add a (small) margin on top; this is then used to calculate the cost of a job via said Prometheus metrics. It’s not 100% accurate (although it could be with some judicious use of node meta info, Prometheus metrics, and a few glue scripts), but it suffices for our needs.
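To make the arithmetic concrete, here is a minimal sketch of that memory-only calculation. The function names, the plain mean (standing in for the rolling average), and the 5% default margin are all illustrative assumptions, not what anyone actually runs.

```python
def cost_per_gb(node_cost, node_memory_gb):
    """Cost of one GB of memory on a single node."""
    return node_cost / node_memory_gb


def average_cost_per_gb(nodes):
    """Plain mean of the per-node rates.

    `nodes` is a list of (node_cost, node_memory_gb) pairs. A real setup
    might keep a rolling average instead; this is a simplification.
    """
    rates = [cost_per_gb(cost, mem_gb) for cost, mem_gb in nodes]
    return sum(rates) / len(rates)


def job_cost(job_memory_gb, rate_per_gb, margin=0.05):
    """Cost of a job from its memory footprint, with a margin on top.

    The 5% default margin is a hypothetical value.
    """
    return job_memory_gb * rate_per_gb * (1 + margin)


# Example with made-up numbers: two nodes, one job using 8 GB.
rate = average_cost_per_gb([(300.0, 64), (600.0, 128)])
print(job_cost(8, rate))
```

The job's memory figure would come from the Prometheus metrics; everything else is just a couple of divisions and a multiplication.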
For CPU you could do something similar, since the node info panel shows the number of compute slices a node provides to the cluster. But then you have to decide whether to split a node’s cost 50/50 between CPU and memory or weight it differently. Hence we look at only one metric (the one that matters most to us, at any rate) to keep things simple.
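If you do want to price both resources, one possible way to handle the split is a single weight that decides how much of a node's cost goes to CPU, with the rest going to memory. This is just a sketch under assumptions: the function names and the 50/50 default are illustrative, and CPU is taken in MHz because that is the unit Nomad uses for CPU resources.

```python
def node_unit_rates(node_cost, node_cpu_mhz, node_memory_gb, cpu_weight=0.5):
    """Split a node's cost between CPU and memory.

    Returns (cost per MHz, cost per GB). `cpu_weight` is the share of the
    node cost attributed to CPU; 0.5 is the 50/50 split.
    """
    if not 0.0 <= cpu_weight <= 1.0:
        raise ValueError("cpu_weight must be between 0 and 1")
    cpu_rate = node_cost * cpu_weight / node_cpu_mhz
    mem_rate = node_cost * (1.0 - cpu_weight) / node_memory_gb
    return cpu_rate, mem_rate


def weighted_job_cost(job_cpu_mhz, job_memory_gb, rates):
    """Cost of a job given its allocations and the (cpu, memory) rates."""
    cpu_rate, mem_rate = rates
    return job_cpu_mhz * cpu_rate + job_memory_gb * mem_rate


# Example with made-up numbers: a node with 10000 MHz and 50 GB.
rates = node_unit_rates(100.0, 10000, 50, cpu_weight=0.5)
print(weighted_job_cost(2000, 10, rates))
```

Setting cpu_weight to 0 collapses this back to the memory-only approach, so you can start there and shift weight towards CPU only if it turns out to matter.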