Prometheus / Consul autodiscovery / short-lived containers -> searching for examples

Hello,

I’m just getting started with Prometheus and taking my first steps; before this I used InfluxDB / Telegraf. I’ve set up a test environment with a Nomad cluster and autodiscovery via Consul, and I’m trying to get metrics from Nginx via the Nginx Prometheus exporter.

I’ve created a job where I can scrape metrics from the Nginx job(s). Prometheus uses consul_sd and tags to find the metrics. It works so far:

- job_name: nginx_metrics
  consul_sd_configs:
  - server: https://fra-test-prom-01.example.local:8501
    token: ''
    datacenter: fra
    tags:
      - nginx_metrics
  relabel_configs:
  - source_labels:
    - __meta_consul_service
    action: replace
    target_label: "app"
  scheme: http
  metrics_path: "/metrics"
  scrape_interval: 10s
  scrape_timeout: 10s

and I can see the metrics on the Nginx dashboard in Grafana, but there is an issue, which you can see in the picture below:

So every failed job is also listed (issues in the repository), because it found its way into the Consul service, which was then scraped by Prometheus … but while writing these lines, I realized the reason: the Nginx exporter does not run as a sidecar, but as a regular task. This had / has the side effect that the metrics container was up (because the image comes from Docker Hub), while the custom image from the private registry had an issue. /todo → change the config to a sidecar

Anyway … I’m pretty interested in how you configure your jobs / Prometheus configs to get metrics and drop data from already replaced jobs / containers. I mean, a container only lives a few days / weeks until it gets replaced with a new version, so “instance <Nomad host / Nomad port>” becomes unimportant.
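For example, would it make sense to relabel instance to something more stable, like the Consul service name, so dashboards don’t fragment per host / port? Just a rough, untested sketch of what I have in mind:

  relabel_configs:
  - source_labels:
    - __meta_consul_service
    action: replace
    target_label: instance   # stable value across container replacements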

Update:

I’ve added this to my metrics task:

      lifecycle {
        hook = "poststart"
        sidecar = true
      }

to avoid the issue of scraping metrics from a container that isn’t working.
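For reference, a rough sketch of how such an exporter task could look as a post-start sidecar next to the main Nginx task in the same group (task name, image and scrape URI are placeholders, not my exact config):

  task "nginx-exporter" {
    driver = "docker"

    lifecycle {
      hook    = "poststart"   # start only after the main task is up
      sidecar = true          # lives and dies with the main task
    }

    config {
      image = "nginx/nginx-prometheus-exporter:latest"
      args  = ["-nginx.scrape-uri=http://127.0.0.1:8080/stub_status"]   # placeholder stub_status endpoint
    }
  }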

cu denny

Hi,

you should use the “services” option to find the prom endpoints in Consul.

I have something like this in my prometheus.yaml to fetch the metrics from my Traefik instance:

  - job_name: 'traefik'
    consul_sd_configs:
    - server: 'consul.service.consul:8500'
      services: ['traefik-api']
    relabel_configs:
    - source_labels: [__address__, __meta_consul_service_metadata_metrics_port]
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: ${1}:${2}
      target_label: __address__

… and in the Traefik job file:

network {
  port "metrics" { to = 8080 } # Prometheus metrics via API port
}

service {
  meta {
    metrics_port = "${NOMAD_HOST_PORT_metrics}"
  }
}
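To illustrate how the relabeling resolves (the values below are made up): Consul advertises the service address in __address__ and the host port in the service metadata, and the regex stitches the two together:

  # __address__                                  = "10.0.0.5:8080"   (example)
  # __meta_consul_service_metadata_metrics_port  = "24567"            (example)
  # joined by the default ";" separator this gives "10.0.0.5:8080;24567",
  # and the replacement ${1}:${2} rewrites __address__ to "10.0.0.5:24567"

So Prometheus ends up scraping the dynamically assigned host port instead of the port the service is registered on.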

hi @matthias

is there any difference between using tags and services? I had a reason why I used tags before … but I can also use services.

Update
I remember now why I chose tags instead of services: with services, I would have to list every service that exposes the Nginx metrics, which requires a change to the Puppet config to update the Prometheus config. But if I use tags, I just need to keep the Nomad job files correct.
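So on the Nomad side the service stanza just carries the tag that Prometheus filters on, roughly like this (service name and port label are made up):

  service {
    name = "my-nginx"          # placeholder name
    port = "metrics"
    tags = ["nginx_metrics"]   # matched by the consul_sd tags filter above
  }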

But aside from the “how do I find my metrics” question, is there anything else that is good to know?

cu denny

ps. Thanks for the reply :slight_smile:

Agreed, if you want to scrape different services with the same scrape config, tags are the way to go. Just as a note, “services” is an array, so you can list more than one service here.
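e.g. (service names other than traefik-api are just placeholders):

    consul_sd_configs:
    - server: 'consul.service.consul:8500'
      services: ['traefik-api', 'service-a', 'service-b']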

In my experience, every service has its own quirks, which is why I usually use one scrape config per service.

Not using Puppet, sorry.
