Best way to get peer node information in a job specification

Hello everyone,

I am setting up a Kafka cluster using Nomad, and I need to specify the peer node information for the Kafka configuration, which is passed in as a Docker environment variable. I would like to retrieve this information dynamically within the job specification rather than hardcoding it, if that’s possible.

Can anyone recommend the best way to get the peer node information (hostnames or IP addresses) of the other nodes forming the cluster when writing a job specification in Nomad? I’m using Nomad together with Consul for service discovery, if that’s relevant.

Thank you in advance for your help!

Hi there!

This seems to be a common challenge – maybe even common enough to be called a “pattern”!

It was recently discussed in

and references therein.

Perhaps the moderators could merge these threads into a single megathread? It might also be a good idea to link it from the tutorial page (see the links in the posted thread) that deals with this topic.

It seems I found a solution (after many intense hours…).

Use a template block with a Consul service lookup and with env set to true (check the docs on that).

E.g.

template {
  # env = true injects the rendered variables into the task environment.
  env         = true
  # noop: re-render when the Consul service list changes, without
  # restarting the task.
  change_mode = "noop"
  destination = "docker_env"
  data        = <<EOF
KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=
  {{- range $i, $e := service "kafka-broker" -}}
    {{- if eq $i 0 -}}
      {{$i}}@{{ .Address }}:{{ .Port }}
    {{- else -}}
      ,{{$i}}@{{ .Address }}:{{ .Port }}
    {{- end -}}
  {{- end}}
EOF
}
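
With three brokers registered in Consul, the rendered line should end up looking something like this (made-up addresses and dynamically allocated host ports):

KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=0@10.0.0.10:26512,1@10.0.0.11:29043,2@10.0.0.12:21877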

Posting my entire job specification for reference (this creates a Kafka cluster with three nodes):

job "kafka" {
  datacenters = ["dc1"]

  group "kafka" {
    count = 3

    network {
      port "kafka" {
        to = 9092
      }
      port "kafka-broker" {
        to = 9093
      }
    }

    ephemeral_disk {
      size    = 5000
    }

    service {
      name = "kafka"
      provider = "consul"
      address = "${attr.unique.network.ip-address}"
      port = "kafka"
    }

    service {
      name = "kafka-broker"
      provider = "consul"
      address = "${attr.unique.network.ip-address}"
      port = "kafka-broker"
    }

    task "kafka" {
      driver = "docker"

      config {
        image = "bitnami/kafka:latest"
        ports = ["kafka", "kafka-broker"]
      }

      env {
        KAFKA_CFG_ADVERTISED_LISTENERS = "PLAINTEXT://${attr.unique.network.ip-address}:${NOMAD_HOST_PORT_kafka}"
        # These two values are rendered by the template block below
        # (env = true), so re-exporting them here may be redundant.
        KAFKA_CFG_CONTROLLER_QUORUM_VOTERS = "${KAFKA_CFG_CONTROLLER_QUORUM_VOTERS}"
        KAFKA_BROKER_ID = "${KAFKA_BROKER_ID}"
        BITNAMI_DEBUG = "yes"
        KAFKA_ENABLE_KRAFT = "yes"
        KAFKA_KRAFT_CLUSTER_ID = "NjZiYzNkYzViZjI2NDM1NT"
        KAFKA_CFG_PROCESS_ROLES = "broker,controller"
        KAFKA_CFG_CONTROLLER_LISTENER_NAMES = "CONTROLLER"
        KAFKA_CFG_LISTENERS = "PLAINTEXT://:9092,CONTROLLER://:9093"
        KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP = "CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT"
        ALLOW_PLAINTEXT_LISTENER = "yes"
      }

      template {
        env = true
        # change_mode is not set here, so it defaults to "restart";
        # see the follow-up below about the restart loop this causes.
        destination = "docker_env"
        data = <<EOF
KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=
  {{- range $i, $e := service "kafka-broker" -}}
    {{- if eq $i 0 -}}
      {{$i}}@{{ .Address }}:{{ .Port }}
    {{- else -}}
      ,{{$i}}@{{ .Address }}:{{ .Port }}
    {{- end -}}
  {{- end}}

KAFKA_BROKER_ID={{- range $i, $e := service "kafka-broker" -}}
  {{- if and (eq .Address (env "attr.unique.network.ip-address")) (eq .Port (env "NOMAD_HOST_PORT_kafka_broker" | parseInt)) -}}
    {{$i}}
  {{- end -}}
{{- end}}
EOF
      }
      resources {
        memory = 1024
        memory_max = 4096
      }
    }
  }
}
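
Assuming the spec is saved as kafka.nomad.hcl (the filename is arbitrary), it can be deployed and inspected with:

nomad job run kafka.nomad.hcl
nomad job status kafka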

This is my first working version; improvement suggestions are welcome.

I would love to see this covered in a learn guide (or whatever we’re calling them these days). Wanting to run services as an HA cluster in Nomad is, I would think, super common, or at least something we would all want to be super common. The HAProxy guide uses a somewhat bespoke pattern that leans on passing a Consul DNS SRV host into HAProxy’s internal configuration file syntax, rather than on the more broadly useful details of what Nomad can output about service internals. It’s super cool, and good to know that’s a tool in the belt, but probably not as widely applicable as just iterating over the allocated service IPs, as this example does.

What would be folks’ opinion on running Kafka (ZooKeeper, Pulsar, and the like) on EC2 using raw_exec?

For peer discovery, how about seeding nodes with Consul node_meta and using that in the template?
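
A sketch of what that could look like (untested; the node-meta key kafka_id and its values are hypothetical). Seed each Consul agent with a stable ID at start:

consul agent -node-meta kafka_id:0 ...

Then read it back through the service lookup’s NodeMeta field, reusing the address-matching trick from the job spec above:

template {
  env         = true
  change_mode = "noop"
  destination = "docker_env"
  data        = <<EOF
KAFKA_BROKER_ID={{ range service "kafka-broker" -}}
  {{- if eq .Address (env "attr.unique.network.ip-address") -}}
    {{- index .NodeMeta "kafka_id" -}}
  {{- end -}}
{{- end }}
EOF
}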

This “pattern” allows me to treat individual EC2 instances as “unique” entities backed by their own storage.

Replying to my own solution: this leads to absolute chaos when trying to do a rolling restart, since the restart triggers a change in the service registration, which means the template changes, which causes an infinite feedback loop of job restarts.
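
One possible way to break the loop (a sketch, not verified in this setup) is to render the voters list with change_mode = "noop", as in the first snippet above, so that a re-registration re-renders the file without bouncing the task. The trade-off is that a running task only picks up the new values at its next restart:

template {
  env         = true
  # Re-render on service changes, but neither restart nor signal the task.
  change_mode = "noop"
  destination = "docker_env"
  data        = <<EOF
KAFKA_CFG_CONTROLLER_QUORUM_VOTERS={{ range $i, $e := service "kafka-broker" }}{{ if ne $i 0 }},{{ end }}{{ $i }}@{{ .Address }}:{{ .Port }}{{ end }}
EOF
}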
