Hello,
I’m just getting started with Prometheus and taking my first steps; before this I used InfluxDB / Telegraf. I’ve created a test environment with a Nomad cluster, autodiscovery via Consul, and I’m trying to get metrics from Nginx via the nginx-prometheus-exporter.
I’ve created a scrape job that lets me collect metrics from the Nginx job(s). Prometheus uses consul_sd and tags to find the targets. It works so far:
- job_name: nginx_metrics
  consul_sd_configs:
    - server: https://fra-test-prom-01.example.local:8501
      token: ''
      datacenter: fra
      tags:
        - nginx_metrics
  relabel_configs:
    - source_labels:
        - __meta_consul_service
      action: replace
      target_label: "app"
  scheme: http
  metrics_path: "/metrics"
  scrape_interval: 10s
  scrape_timeout: 10s
and I can see the metrics on the Nginx dashboard in Grafana. But there is an issue, which you can see in the picture below:
So every failed job is also listed (the issues on the repository), because it found its way into the Consul service catalog, which was then scraped by Prometheus … but while writing these lines I realized the reason: the Nginx exporter does not run as a sidecar, but as a regular task. This had / has the side effect that the metrics container was up (because its image comes from Docker Hub), while the custom image from the private registry had an issue. /todo → change the config to a sidecar
Anyway … I’m pretty interested in how you configure your jobs / Prometheus configs to collect metrics and drop data from jobs / containers that have already been replaced. A container only lives for a few days / weeks until it gets replaced by a new version, so “instance=<Nomad host>:<Nomad port>” quickly becomes unimportant.
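One idea I’m experimenting with (just a sketch, not yet tested in my setup): if I read the consul_sd docs right, each target also carries its Consul health status as __meta_consul_health, so targets whose check is no longer passing could be dropped at relabeling time. Added to the relabel_configs above, roughly:

    # sketch, assumes __meta_consul_health is available in your Prometheus version:
    # only keep targets whose Consul health check is currently passing, so
    # crashed / replaced allocations drop out of the target list
    - source_labels:
        - __meta_consul_health
      regex: passing
      action: keep

Series from targets that disappear should then just go stale and age out, instead of showing up as failed scrapes.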
Update:
I’ve added the following to my metrics task:
lifecycle {
  hook    = "poststart"
  sidecar = true
}
to avoid the issue of scraping metrics from a container that isn’t actually working.
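For context, this is roughly how the exporter task now sits next to the Nginx task in the group. It’s a trimmed sketch of my test setup, so the image names, ports, service name and scrape URI are just placeholders you’d adjust:

    group "nginx" {
      network {
        port "http"    { to = 80 }
        port "metrics" { to = 9113 }
      }

      # main workload, image comes from the private registry
      task "nginx" {
        driver = "docker"
        config {
          image = "registry.example.local/nginx-custom:latest"
          ports = ["http"]
        }
      }

      # exporter runs as a poststart sidecar: it only starts once the nginx
      # task is up and is stopped together with it, so Consul no longer
      # advertises a metrics endpoint for a dead nginx container
      task "nginx-exporter" {
        driver = "docker"

        lifecycle {
          hook    = "poststart"
          sidecar = true
        }

        config {
          image = "nginx/nginx-prometheus-exporter:latest"
          ports = ["metrics"]
          # flag name depends on the exporter version, adjust as needed
          args  = ["--nginx.scrape-uri=http://localhost:80/stub_status"]
        }

        service {
          name = "nginx-exporter"
          port = "metrics"
          tags = ["nginx_metrics"]
        }
      }
    }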
cu denny