Metadata
- Nomad v1.7.2
- OS: Windows
- Scheduler: service
- Driver: exec / raw_exec
Architecture
I have 10 Windows servers running Nomad client nodes.
I want to run 3 instances of a stateful service using Nomad. Each instance (01, 02, 03) writes checkpoints and data to the filesystem (e.g. /tmp/instance01, /tmp/instance02, /tmp/instance03). When an instance restarts, it continues from its latest checkpoint. Each instance can be allocated to any host; however, each instance must be configured to use the same directory as the previously failed instance.
So basically:
- 01 ↔ /tmp/instance01
- 02 ↔ /tmp/instance02
- 03 ↔ /tmp/instance03
For simplicity, assume these 3 directories are created on a NAS, and the NAS is mounted on all servers running a Nomad client node. Also assume all groups / tasks have RW access to these 3 directories.
Issue
There are a few ways I can configure the directory that the service uses to read/write state data:
- Command Line Argument
- Configuration File, via environment variable
- Template block, via Nomad template interpolation
How can I pass a different value to the same task running in different instances of a group?
i.e. How do I give each instance a unique and consistent tag, so it can be reliably identified?
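To make the problem concrete, here is roughly what the template-based approach looks like. This is a sketch (the `working_directory` key is whatever the app's config file expects); the point is that the rendered output is identical for every allocation in the group:

```hcl
# Sketch: a template block renders the same output for every
# allocation in the group, so all 3 instances end up pointing
# at the same directory.
template {
  destination = "local/app/config.yaml"
  data        = <<-EOT
    working_directory: /tmp/instance01  # identical for all 3 allocations
  EOT
}
```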
Use Case:
- Grafana Loki read and backend nodes write checkpoints and data to the filesystem before pushing downstream. Each node maintains its own checkpoint.
- OpenTelemetry writes checkpoints and data to the filesystem when memory is full and exporting downstream has failed. Each node maintains its own checkpoint.
I suspect many other applications store instance state data on disk, where the state store is unique to each instance and is used to recover that instance's work.
What I’ve considered
- Parameterized Block → does not work for service jobs
- Template Block → every task instance will receive the same data
- Env Var → every task instance will receive the same data
- Meta Block → every task instance will receive the same data
- Variable Block → every task instance will receive the same data
- Dynamic Block → possible solution, but this is essentially repeating the group block
- Repeating Group Block → trying to avoid this
- Multiple Job Specs → trying to avoid this
It seems like it's possible to loop within a Nomad task, but not across tasks.
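For reference, the dynamic-block version I'd rather avoid looks roughly like this (a sketch; the `instances` list and volume names are illustrative, and it assumes Nomad's HCL2 `locals` and `dynamic` support):

```hcl
# Sketch of the dynamic-block workaround: one group per instance,
# generated from a list. It works, but it is just the repeated
# group block written as a loop.
locals {
  instances = ["app01", "app02", "app03"]
}

job "app-write" {
  dynamic "group" {
    for_each = local.instances
    labels   = [group.value]

    content {
      count = 1

      volume "state" {
        type      = "host"
        read_only = false
        source    = "tmp${group.value}" # e.g. tmpapp01
      }

      # ...task block mounting "state" and passing
      # -working-directory=/tmp/${group.value} goes here...
    }
  }
}
```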
Any solution appreciated, even hacky ones. TIA!
job "app-write" {
  datacenters = ["dc1"]
  type        = "service"
  node_pool   = "default"

  # Write
  group "app-write" {
    count = 3

    # /tmp/instance01
    volume "app01" {
      type      = "host"
      read_only = false
      source    = "tmpapp01"
    }

    # /tmp/instance02
    volume "app02" {
      type      = "host"
      read_only = false
      source    = "tmpapp02"
    }

    # /tmp/instance03
    volume "app03" {
      type      = "host"
      read_only = false
      source    = "tmpapp03"
    }

    network {
      port "http" {}   // 3100
      port "grpc" {}   // 9095
      port "gossip" {} // 7946
      # port "lb" { static = 8080 }
    }

    service {
      name         = "app-write"
      address_mode = "host"
      port         = "http"
      tags         = ["http"]
      provider     = "nomad"
    }

    service {
      name         = "app-write"
      address_mode = "host"
      port         = "grpc"
      tags         = ["grpc"]
      provider     = "nomad"
    }

    service {
      name         = "app-write"
      address_mode = "host"
      port         = "gossip"
      tags         = ["gossip"]
      provider     = "nomad"
    }

    task "app-write" {
      driver = "exec" # or "raw_exec"

      volume_mount {
        volume      = "app01"
        destination = "/tmp/app01"
        read_only   = false
      }

      volume_mount {
        volume      = "app02"
        destination = "/tmp/app02"
        read_only   = false
      }

      volume_mount {
        volume      = "app03"
        destination = "/tmp/app03"
        read_only   = false
      }

      config {
        command = "/usr/bin/app"
        args = [
          "-config.file=local/app/config.yaml",
          "-working-directory=/tmp/app01", # <-- Need this to change for each instance
        ]
      }

      resources {
        cpu    = 100
        memory = 128
      }

      # Can change this for each instance too
      template {
        source      = "/etc/app/config.yaml.tpl"
        destination = "local/app/config.yaml"
        change_mode = "restart"
      }
    }
  }
}
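One hack I've come across but haven't verified: interpolating the `NOMAD_ALLOC_INDEX` runtime variable into the args. It yields 0/1/2 rather than 01/02/03, and I'm not sure the index is guaranteed to map back to the same "instance" after a reschedule, which is exactly the consistency I need:

```hcl
# Sketch (untested): NOMAD_ALLOC_INDEX runs from 0 to count-1 and
# is interpolated per allocation, so each instance would get a
# distinct directory (/tmp/app0, /tmp/app1, /tmp/app2).
config {
  command = "/usr/bin/app"
  args = [
    "-config.file=local/app/config.yaml",
    "-working-directory=/tmp/app${NOMAD_ALLOC_INDEX}",
  ]
}
```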