Monitoring Nomad clusters by region


We are in the process of migrating to Nomad and for the most part it’s been really smooth and great. We are currently running into an issue with building out our monitoring of Nomad though.

Our setup is as follows:

  • 3 regions
    • Each region is a cloud region
    • Datacenters are cloud availability zones
  • Collecting metrics via the Prometheus telemetry endpoint
telemetry {
  collection_interval = "1s"
  disable_hostname = true
  prometheus_metrics = true
  publish_allocation_metrics = true
  publish_node_metrics = true
}

Once the metrics are in Prometheus, several of them have the datacenter label, but none of them has a label for the region.

So we can build dashboards around Nomad datacenters, but not regions.

Are we missing something? Being able to group metrics by region feels very important to us, and we don’t currently see a way to do that. We do have a bit of a hack using Grafana variables and regex, but we would much prefer the metrics to actually carry a region label.

Hi @regner,

I believe the expectation here is that, in the case of Prometheus, the region identifier is left to the scraper. This could be done via the job_name or by adding the region identifier within a labels config option.
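As a rough sketch of the labels option, assuming a plain Prometheus-style scrape config with a static target, the region can be attached per target group. The job name, hostname, and region value below are placeholders, not taken from any real setup:

```yaml
scrape_configs:
  - job_name: "nomad"
    metrics_path: "/v1/metrics"
    params:
      format: ["prometheus"]
    static_configs:
      # Hypothetical agent address; 4646 is Nomad's default HTTP port.
      - targets: ["nomad-agent.example.internal:4646"]
        labels:
          # Placeholder region name, added to every series from this target group.
          region: "us-east-1"
```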

Are you able to share a snippet of your scrape config or details of how you’re discovering the agents to scrape?

jrasell and the Nomad team

Hey @jrasell,

Thank you very much for responding and any help you’re able to offer, it’s greatly appreciated.

We are using the Grafana Agent binary (grafana/agent on GitHub), which is essentially a slimmed-down fork of Prometheus. We deploy the agent to all nodes; it scrapes locally and then does a remote write to Grafana Cloud hosted Prometheus.

We manage our infrastructure with Ansible, which includes deploying and configuring Grafana Agent. Our scrape config for Nomad essentially looks like this:

        - job_name: "integrations/nomad"
          metrics_path: "/v1/metrics"
          params:
            format:
              - "prometheus"
          static_configs:
            - targets:
              - "{{ ansible_host }}:4646"
          relabel_configs:
            - replacement: "{{ ansible_host }}"
              target_label: "instance"

Ansible replaces {{ ansible_host }} with the local hostname, and only adds that config to agents that are running the Nomad service.

We could easily add an extra label containing the region in which the instance is running. It just seemed odd that Nomad metrics wouldn’t already include that information.
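For illustration, one extra relabel rule in the same style as our existing instance rule would probably be enough. This is only a sketch: nomad_region is a hypothetical Ansible variable that we would have to define per host or group ourselves; it is not something Ansible or Nomad provides.

```yaml
relabel_configs:
  - replacement: "{{ ansible_host }}"
    target_label: "instance"
  # Hypothetical: nomad_region would be defined in our Ansible inventory.
  - replacement: "{{ nomad_region }}"
    target_label: "region"
```

With no source_labels set, the default replace action writes the replacement value to the target label for every series scraped from that target, so dashboards could then group by region directly.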

Hi @regner,

I talked about this internally to make sure my context was correct, and Nomad doesn’t export region as a label because the library we use doesn’t support adding default labels to every metric. Without this, we would need to plumb through the region identifier to every corner of the code which exports metrics.

Apologies for the inconvenience, however, I hope this makes sense.

jrasell and the Nomad team

Thank you very much for raising it internally and updating with why it isn’t done. Much appreciated.
