Monitoring tasks/service with prometheus

lopz · May 2, 2023, 11:06pm

Good evening,
I am looking for a way to monitor tasks. What I really want is to raise an alarm when a container restarts. My stack is Prometheus + Alertmanager and everything is integrated, but I can't find the metrics/labels that tell me when a service was restarted. Previously I monitored the services using systemctl, but now everything is containerized and orchestrated by Nomad. Any ideas?
Similar topic without response (How to monitor restarts with nomad_client_alloc_restarts)

hector.medina.cabane · May 3, 2023, 7:56am

This is interesting, I'm also interested in this use case. The point is that if the Docker container restarts on its own, I don't think that creates a new allocation, so from Nomad's viewpoint nothing has changed (if I'm not wrong).

The only way to get this information would be to extract it from Docker. This might be possible with a cAdvisor job, using the container_start_time_seconds metric to fire an alert when the container has been up for less than, say, 1 minute.

github.com

google/cadvisor/blob/master/docs/storage/prometheus.md

# Monitoring cAdvisor with Prometheus

cAdvisor exposes container and hardware statistics as [Prometheus](https://prometheus.io) metrics out of the box. By default, these metrics are served under the `/metrics` HTTP endpoint. This endpoint may be customized by setting the `-prometheus_endpoint` and `-disable_metrics` or `-enable_metrics` command-line flags.

To collect some of metrics it is required to build cAdvisor with additional flags, for details see [build instructions](../development/build.md), additional flags are indicated in "additional build flag" column in table below.

To monitor cAdvisor with Prometheus, simply configure one or more jobs in Prometheus which scrape the relevant cAdvisor processes at that metrics endpoint. For details, see Prometheus's [Configuration](https://prometheus.io/docs/operating/configuration/) documentation, as well as the [Getting started](https://prometheus.io/docs/introduction/getting_started/) guide.

# Examples

* [CenturyLink Labs](https://labs.ctl.io/) did an excellent write up on [Monitoring Docker services with Prometheus +cAdvisor](https://www.ctl.io/developers/blog/post/monitoring-docker-services-with-prometheus/), while it is great to get a better overview of cAdvisor integration with Prometheus, the PromDash GUI part is outdated as it has been deprecated for Grafana.

* [vegasbrianc](https://github.com/vegasbrianc) provides a [starter project](https://github.com/vegasbrianc/prometheus) for cAdvisor and Prometheus monitoring, alongside a ready-to-use [Grafana dashboard](https://github.com/vegasbrianc/grafana_dashboard).

## Prometheus container metrics

The table below lists the Prometheus container metrics exposed by cAdvisor (in alphabetical order by metric name) and corresponding `-disable_metrics` / `-enable_metrics` option parameter:

Metric name | Type | Description | Unit (where applicable) | option parameter | additional build flag |
:-----------|:-----|:------------|:------------------------|:---------------------------|:----------------------

This file has been truncated.

I don't know if there is a better way to achieve this, but I think this would do the trick. I will test it myself when I have some time.

hector.medina.cabane · May 4, 2023, 10:31am

By the way, I have just set up cAdvisor as a system job and I am successfully sending metrics to Grafana Cloud to monitor when the Docker containers are restarted. So this would be a suitable solution.

If anyone has a better solution, please let me know. I might open an issue asking to export these metrics in the next version of Nomad's native telemetry.

I stopped the apache2 container (not the job), so Nomad saw that the apache2 job was supposed to be running but no container actually was, and it launched another container. That's why it shows 0 days for apache2; the previous value was 6 days, the same as the apache job. (This is my home lab with a Raspberry Pi.)

I just wanted to point out that the container_start_time_seconds metric stores the time the container started as a Unix epoch timestamp.
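
Because the metric is an epoch timestamp, the "recently restarted" check comes down to subtracting it from the current time. A minimal Python sketch of that arithmetic follows; the function name, the 60-second threshold, and the sample timestamps are assumptions for illustration and are not part of cAdvisor or Nomad.

```python
import time


def restarted_recently(start_time_seconds, now=None, threshold_seconds=60.0):
    """Return True if the container started less than threshold_seconds ago.

    start_time_seconds is a value of container_start_time_seconds,
    i.e. a Unix epoch timestamp. The default threshold is an assumption.
    """
    if now is None:
        now = time.time()
    age_seconds = now - start_time_seconds
    return age_seconds < threshold_seconds


# Hypothetical sample values, not real measurements.
now = 1_683_196_260.0
print(restarted_recently(now - 30, now=now))         # container started 30 s ago
print(restarted_recently(now - 6 * 86400, now=now))  # container started 6 days ago
```

The same comparison can probably be expressed directly in PromQL as time() - container_start_time_seconds < 60, which might be a reasonable starting point for an alert rule.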

This is the dashboard I'm using: Cadvisor exporter | Grafana Labs

lopz · May 22, 2023, 12:59pm

Hey,

I want to share how I managed to send an alert when a container restarts. I used the following PromQL: last_over_time(nomad_client_allocs_restart[5m]) > 0
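
For anyone who wants to check the result of that query outside of Alertmanager, here is a rough Python sketch that reads the JSON body returned by a Prometheus instant query (the /api/v1/query endpoint) and keeps the series whose value is above zero. The function name is hypothetical, and the sample response, including its label names and values, is made up for illustration; real Nomad series will carry different labels.

```python
import json


def restarted_series(response_text):
    """Return (labels, value) pairs for series whose sample value is above zero.

    Expects the JSON body of a Prometheus instant query with
    resultType "vector".
    """
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError("query failed: %s" % payload.get("error", "unknown error"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected a vector result, got %s" % data["resultType"])
    hits = []
    for series in data["result"]:
        # Each instant-vector sample is [timestamp, "value as string"].
        _timestamp, raw_value = series["value"]
        value = float(raw_value)
        if value > 0:
            hits.append((series["metric"], value))
    return hits


# Hypothetical response body; labels and values are invented for this example.
sample = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"task": "apache2"}, "value": [1684760340.0, "2"]},
            {"metric": {"task": "nginx"}, "value": [1684760340.0, "0"]},
        ],
    },
})

for labels, value in restarted_series(sample):
    print(labels, value)
```

The filter on value > 0 mirrors the comparison in the query, so the function behaves the same whether or not the > 0 is already part of the expression sent to Prometheus.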
