I can't scrape Nomad cluster metrics and forward them to a Prometheus server

Hi there,

I’ve spent more time than I expected trying to forward our exposed metrics to a Prometheus server. We use Nomad autodiscovery along with Traefik, and Cloudflare for DNS management. Our cluster currently has the telemetry options turned on, and I can see the metrics exposed at cluster.example.com/v1/metrics?format=prometheus
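
The check described above can also be reproduced outside a browser. The following is a minimal Python sketch, assuming the endpoint is reachable over HTTPS and that an ACL token, if one is required, is accepted in the X-Nomad-Token header. The helper names (build_metrics_url, count_samples, fetch_metrics) are made up for illustration and are not part of any Nomad client library.

```python
import urllib.parse
import urllib.request


def build_metrics_url(base_url: str, fmt: str = "prometheus") -> str:
    """Return the Nomad metrics URL with the format query parameter set."""
    query = urllib.parse.urlencode({"format": fmt})
    return f"{base_url.rstrip('/')}/v1/metrics?{query}"


def count_samples(exposition_text: str) -> int:
    """Count sample lines in Prometheus text output, skipping comments and blanks."""
    return sum(
        1
        for line in exposition_text.splitlines()
        if line.strip() and not line.startswith("#")
    )


def fetch_metrics(base_url: str, token: str = "") -> str:
    """Fetch the metrics body; the token, if given, is sent as X-Nomad-Token."""
    request = urllib.request.Request(build_metrics_url(base_url))
    if token:
        request.add_header("X-Nomad-Token", token)
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.read().decode("utf-8")
```

Calling count_samples(fetch_metrics("https://cluster.example.com")) would then give a rough idea of how many series the endpoint exposes, where cluster.example.com is the placeholder host used in this post.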

As I understand it, we need to discover the metrics endpoint of each client and server and then forward the metrics to Prometheus. I think deploying a standalone Prometheus server might be overkill, since we just need to forward the metrics.

I’ve successfully deployed a Vector job by following this community post: Nomad host logs and metrics using vector, Loki (Grafana cloud)

However, as far as I can tell, some metrics are missing because Vector cannot discover the metrics exposed by the Nomad cluster.

As I understand it, if I want to use an agent, I can go with either the VictoriaMetrics agent (vmagent) or the Grafana Agent.

So I’ve also tried deploying the VictoriaMetrics agent as a job with the Nomad autodiscovery configuration turned on. When the job is deployed, the agent does discover the members of the cluster, but when it tries to scrape the metrics it fails with a 404 error.

I am pulling my hair out because I’ve been trying different configurations for the past 2 days and I can’t seem to find a solution. It seems that the agent is trying to scrape the metrics from raw IP addresses, which are not accessible.
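
As a rough illustration of where the raw IPs come from: Prometheus-compatible scrapers generally build each request from the discovered target address plus the job's scheme, metrics_path and params, not from the server URL that was used for discovery. A minimal sketch follows, with a made-up address and a hypothetical scrape_url helper.

```python
from urllib.parse import urlencode


def scrape_url(address, metrics_path="/metrics", scheme="http", params=None):
    """Assemble the URL a Prometheus-style scraper requests for one target."""
    url = f"{scheme}://{address}{metrics_path}"
    if params:
        # doseq=True expands list values, matching the YAML form format: ["prometheus"].
        url += "?" + urlencode(params, doseq=True)
    return url


# Hypothetical discovered target; real addresses come from service discovery.
discovered = "10.0.0.5:4646"

# With params commented out, as in the template in this post:
print(scrape_url(discovered, metrics_path="/v1/metrics"))
# With params enabled:
print(scrape_url(discovered, metrics_path="/v1/metrics",
                 params={"format": ["prometheus"]}))
```

If the discovered address is not something that actually serves /v1/metrics, or is not routable from the agent, the request fails no matter what the discovery server URL was.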

Here is the job definition, for reference:

job "vmagent" {
  datacenters = ["dc1"]
  type        = "service"

  group "vmagent" {
    network {
      port "api" {
        to = 8429
      }
    }

    ephemeral_disk {
      size   = 500 # 500 MB
      sticky = true
    }

    update {
      auto_revert = true
    }

    task "vmagent" {
      driver = "docker"
      config {
        image = "victoriametrics/vmagent:latest"
        ports = ["api"]
        args = [
          # (arguments not shown)
        ]
      }

      template {
        data        = file(abspath("./prometheus.tpl.yml"))
        destination = "local/vm-confing.yml"
        change_mode = "restart"
      }

      service {
        provider = "nomad"
        port     = "api"
      }

      resources {
        cpu    = 256
        memory = 100
      }
    }
  }
}


and the template file:

global:
  scrape_interval: 2s
  evaluation_interval: 2s

scrape_configs:
  - job_name: "nomad_test"
    static_configs:
      - labels: { "cluster": "foo" }
    nomad_sd_configs:
      - server: "https://my-nomad-dashboard.com"
        authorization:
          credentials: "<nomad-token>"
        follow_redirects: true
        refresh_interval: 1m
        tls_config:
          insecure_skip_verify: true
    metrics_path: /v1/metrics
    # params:
    #   format: ["prometheus"]
    scrape_interval: 15s
    scrape_timeout: 5s
    relabel_configs:
      - source_labels: [__address__]
        target_label: environment
        replacement: "staging"
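
For readers unfamiliar with relabeling, the source_labels / target_label / replacement entry at the end of the template can be emulated in a few lines. This is a simplified Python sketch that assumes the documented Prometheus defaults for a relabel rule (action replace, regex (.*), separator ;). The function name apply_replace_rule is hypothetical, and edge cases such as empty results are ignored.

```python
import re


def apply_replace_rule(labels, source_labels, target_label,
                       replacement="$1", regex="(.*)", separator=";"):
    """Emulate a relabel rule that uses the default 'replace' action."""
    # Source label values are joined with the separator before matching.
    value = separator.join(labels.get(name, "") for name in source_labels)
    # Prometheus anchors the regex at both ends, so fullmatch is the analogue.
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)  # no match: labels are left unchanged
    # Translate $1-style references into Python's \g<1> form before expanding.
    template = re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result


# Hypothetical target labels, using a made-up address.
target = {"__address__": "10.0.0.5:4646"}
print(apply_replace_rule(target, ["__address__"], "environment",
                         replacement="staging"))
```

Because the default regex matches any address and the replacement is a constant, the rule attaches environment="staging" to every target. It does not change which address is scraped.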