We’re trying to build some dashboards around Nomad telemetry data, specifically interested in allocation stats (cpu/memory allocated, etc) and we’ve run into a bit of a muddle.
First of all, the documentation states that telemetry has to be explicitly enabled. We haven’t done so, yet telemetry seems to be available via the /v1/metrics API endpoint. We also haven’t specifically enabled Prometheus-format metrics, which are listed as false by default, yet those are also available via /v1/metrics?format=prometheus.
Finally, and most confusingly, /v1/metrics and /v1/metrics?format=prometheus don’t output the same information. Whereas /v1/metrics exposes a load of different gauges related to CPU/memory allocation, for example, /v1/metrics?format=prometheus only exposes very basic metrics related to the main Nomad Go process itself, mostly heap metrics.
How is all of the above possible? Can someone confirm whether we’re being particularly daft about something and missing the very obvious?
Nomad version is 0.10.4
Thanks!
Alex E / Altmetric
Hi @thisisjaid! There’s definitely a mix of odd behavior and documentation issues going on here. I’ve opened https://github.com/hashicorp/nomad/issues/7866 to dig into this some more.
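For anyone who wants to check the discrepancy on their own cluster, here is a rough Python sketch that lists metric names present in the JSON output of /v1/metrics but absent from the Prometheus output. It rests on a few assumptions that are not confirmed anywhere in this thread: that the agent is reachable at http://127.0.0.1:4646, that the JSON payload groups metrics under "Gauges", "Counters" and "Samples" with a "Name" field on each entry, and that Prometheus names are the JSON names with non-alphanumeric characters replaced by underscores. The function names are made up for illustration.

```python
import json
import re
import urllib.request

# Assumption: the agent's HTTP API is on the default local address.
NOMAD_ADDR = "http://127.0.0.1:4646"


def fetch(path):
    """Return the body of a GET request against the Nomad agent as text."""
    with urllib.request.urlopen(NOMAD_ADDR + path, timeout=5) as resp:
        return resp.read().decode("utf-8")


def json_metric_names(body):
    """Collect metric names from the JSON form of /v1/metrics.

    Assumes the payload has "Gauges", "Counters" and "Samples" lists
    whose entries carry a "Name" field; missing or null lists are skipped.
    """
    payload = json.loads(body)
    names = set()
    for kind in ("Gauges", "Counters", "Samples"):
        for item in payload.get(kind) or []:
            names.add(item["Name"])
    return names


def prometheus_metric_names(body):
    """Collect metric names from Prometheus text exposition output."""
    names = set()
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = re.match(r"[A-Za-z_:][A-Za-z0-9_:]*", line)
        if match:
            names.add(match.group(0))
    return names


def normalise(name):
    """Assumed mapping from a JSON metric name to its Prometheus form."""
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


def missing_from_prometheus(json_body, prom_body):
    """Return JSON metric names with no counterpart in the Prometheus body."""
    prom = prometheus_metric_names(prom_body)
    return sorted(n for n in json_metric_names(json_body) if normalise(n) not in prom)
```

Against a live agent this would be called as missing_from_prometheus(fetch("/v1/metrics"), fetch("/v1/metrics?format=prometheus")). If the allocation gauges show up in that list, the two formats really are diverging, which is the behavior described above.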
Update 10th May 2020:
I managed to get the allocation metrics (CPU / memory) with proper labels (as job names) from Nomad to Prometheus (to Grafana) by following the tutorial. I had put in the wrong IP.
My second question still stands: why Fabio is needed to enable Prometheus.
Original Query:
We have recently installed Nomad+Consul in production and are in the process of setting up Prometheus. I followed the tutorial exactly as written at https://learn.hashicorp.com/nomad/operating-nomad/prometheus-metrics
Question: Can all metrics per allocation (as displayed in the Nomad UI for each task) be exported to Prometheus? Right now, that does not seem to be happening. We are using Nomad 0.11 and Consul 1.7.2, and following the tutorial exports only Nomad’s own metrics (how the Nomad cluster is doing), not metrics for the tasks running on Nomad.
I changed the consul_sd_configs section of the Prometheus config to include all services in Consul, but that does not help. Ideally, I am looking for Nomad to simply expose the metrics it displays in the UI to Prometheus as well, for each job/group/task/allocation.
Also, I am not sure why Fabio is needed to enable Prometheus. The tutorial suggests running Fabio on each node without explaining what it accomplishes.
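On the per-allocation question, one way to see what Nomad itself reports before Prometheus is involved is to group the allocation gauges from the JSON output of /v1/metrics by job, group, task and allocation. The Python sketch below does that on an already-fetched response body. It assumes, without confirmation from this thread, that allocation gauges are named with the prefix nomad.client.allocs. and carry job, task_group, task and alloc_id labels; the function name is hypothetical.

```python
import json
from collections import defaultdict

# Assumption: allocation gauges share this name prefix.
ALLOC_PREFIX = "nomad.client.allocs."


def allocation_gauges(body):
    """Group allocation gauges from a /v1/metrics JSON body.

    Returns a dict keyed by (job, task_group, task, alloc_id), each value
    mapping the gauge name (prefix stripped) to its reported value.
    The label names used for the key are assumptions.
    """
    payload = json.loads(body)
    grouped = defaultdict(dict)
    for gauge in payload.get("Gauges") or []:
        name = gauge.get("Name", "")
        if not name.startswith(ALLOC_PREFIX):
            continue  # not an allocation metric
        labels = gauge.get("Labels") or {}
        key = (
            labels.get("job", "?"),
            labels.get("task_group", "?"),
            labels.get("task", "?"),
            labels.get("alloc_id", "?"),
        )
        grouped[key][name[len(ALLOC_PREFIX):]] = gauge.get("Value")
    return dict(grouped)
```

If this returns an empty dict on a client node that is running tasks, the agent is probably not publishing allocation metrics at all, and no Prometheus scrape configuration will surface them.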