Prometheus / Consul autodiscovery / short-lived containers -> searching for examples

Hello,

I’m just getting started with Prometheus and taking my first steps; before this I used InfluxDB / Telegraf. I’ve set up a test environment with a Nomad cluster and autodiscovery via Consul, and I’m trying to get metrics from Nginx via the Nginx Prometheus exporter.

I’ve created a job where I can scrape metrics from the Nginx job(s). Prometheus uses consul_sd and tags to find the metrics. It works so far:

- job_name: nginx_metrics
  consul_sd_configs:
  - server: https://fra-test-prom-01.example.local:8501
    token: ''
    datacenter: fra
    tags:
      - nginx_metrics
  relabel_configs:
  - source_labels:
    - __meta_consul_service
    action: replace
    target_label: "app"
  scheme: http
  metrics_path: "/metrics"
  scrape_interval: 10s
  scrape_timeout: 10s

and I can see the metrics on the Nginx dashboard in Grafana, but there is an issue, which you can see in the picture below:

So every failed job is also listed (issues in the repository), because it found its way into the Consul service, which was then scraped by Prometheus … but while writing these lines, I realized the reason: the Nginx exporter does not run as a sidecar, but as a regular task. This had / has the side effect that the metrics container was up (because the image comes from Docker Hub), while the custom image from the private registry had an issue. /todo → change the config to a sidecar

Anyway … I’m pretty interested in how you configure your jobs / Prometheus configs to get metrics and drop data from already replaced jobs / containers. I mean, a container only lives a few days / weeks until it gets replaced with a new version, so “instance <Nomad host / Nomad port>” becomes unimportant.
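For example, would it make sense to relabel instance to something more stable, like the Consul service name, so dashboards don’t fragment per host / port? Just a rough, untested sketch of what I have in mind:

  relabel_configs:
  - source_labels:
    - __meta_consul_service
    action: replace
    target_label: instance   # stable value across container replacements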

Update:

I’ve added this to my metrics task:

      lifecycle {
        hook = "poststart"
        sidecar = true
      }

to avoid the issue of scraping metrics from a container that isn’t working.
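For reference, a rough sketch of how such an exporter task could look as a post-start sidecar next to the main Nginx task in the same group (task name, image and scrape URI are placeholders, not my exact config):

  task "nginx-exporter" {
    driver = "docker"

    lifecycle {
      hook    = "poststart"   # start only after the main task is up
      sidecar = true          # lives and dies with the main task
    }

    config {
      image = "nginx/nginx-prometheus-exporter:latest"
      args  = ["-nginx.scrape-uri=http://127.0.0.1:8080/stub_status"]   # placeholder stub_status endpoint
    }
  }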

cu denny

Hi,

you should use the “services” option to find the prom endpoints in Consul.

I have something like this in my prometheus.yaml to fetch the metrics from my Traefik instance:

  - job_name: 'traefik'
    consul_sd_configs:
    - server: 'consul.service.consul:8500'
      services: ['traefik-api']
    relabel_configs:
    - source_labels: [__address__, __meta_consul_service_metadata_metrics_port]
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: ${1}:${2}
      target_label: __address__

… and in the Traefik job file:

network {
  port "metrics" { to = 8080 } # Prometheus metrics via API port
}

service {
  meta {
    metrics_port = "${NOMAD_HOST_PORT_metrics}"
  }
}
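To illustrate how the relabeling resolves (the values below are made up): Consul advertises the service address in __address__ and the host port in the service metadata, and the regex stitches the two together:

  # __address__                                  = "10.0.0.5:8080"   (example)
  # __meta_consul_service_metadata_metrics_port  = "24567"            (example)
  # joined by the default ";" separator this gives "10.0.0.5:8080;24567",
  # and the replacement ${1}:${2} rewrites __address__ to "10.0.0.5:24567"

So Prometheus ends up scraping the dynamically assigned host port instead of the port the service is registered on.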

hi @matthias

is there any difference between using tags and services? I had a reason why I used tags before … but I can also use services.

Update
I remember now why I chose tags instead of services: with services, I would have to list every service that exposes the Nginx metrics, which requires a change to the Puppet config to update the Prometheus config. But if I use tags, I just need to keep the Nomad job files correct.
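So on the Nomad side the service stanza just carries the tag that Prometheus filters on, roughly like this (service name and port label are made up):

  service {
    name = "my-nginx"          # placeholder name
    port = "metrics"
    tags = ["nginx_metrics"]   # matched by the consul_sd tags filter above
  }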

But aside from the “how do I find my metrics” question, is there anything else that is good to know?

cu denny

ps. Thanks for the reply :slight_smile:

Agreed, if you want to scrape different services with the same scrape config, tags are the way to go. Just as a note, “services” is an array, so you can list more than one service here.
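e.g. (service names other than traefik-api are just placeholders):

    consul_sd_configs:
    - server: 'consul.service.consul:8500'
      services: ['traefik-api', 'service-a', 'service-b']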

In my experience, every service has its own quirks, which is why I usually use one scrape config per service.

Not using Puppet, sorry.
