Hi all,
I want to run an Alertmanager cluster on Nomad. For my single-instance setup I have this job definition (a Levant template):
job "alertmanager" {
  type        = "service"
  datacenters = ["dc1"]

  constraint {
    attribute = "${node.class}"
    value     = "app"
  }

  spread {
    attribute = "${node.unique.id}"
    weight    = 100
  }

  group "alertmanager" {
    network {
      port "http" {
        static       = 80
        to           = 9093
        host_network = "internal"
      }
    }

    task "alertmanager" {
      driver = "docker"

      config {
        image = "prom/alertmanager:v0.23.0"
        args = [
          "--config.file=/local/config.yml"
        ]
        force_pull = true
        ports = [
          "http",
        ]
      }

      vault {
        policies = ["alertmanager"]
      }

      resources {
        memory = 1024
        cpu    = 1000
      }

      template {
        data = <<EOF
[[ fileContents "alertmanager/config.yml" ]]
EOF
        destination   = "local/config.yml"
        change_mode   = "signal"
        change_signal = "SIGHUP"
      }
    }

    count = 1

    service {
      port = "http"
      name = "alertmanager"

      check {
        type     = "http"
        protocol = "http"
        port     = "http"
        path     = "/-/healthy"
        interval = "10s"
        timeout  = "3s"
      }
    }
  }
}
To run Alertmanager as a three-instance cluster, my plan is to increase the group count and modify the Docker container args as explained in the Alertmanager README (GitHub: prometheus/alertmanager).
The problem is that I can't get the IP/port of the other instances to pass them as arguments. I assume this is also very hard to support, since allocations are "separate" and their IPs/ports change.
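For reference, the change I have in mind would look roughly like the fragment below. It is only a sketch: the "cluster" port label, the use of port 9094 for gossip, and the peer placeholder are my assumptions and not part of the working single-instance job. The `--cluster.peer` value is exactly the part I don't know how to fill in.

```hcl
group "alertmanager" {
  count = 3

  network {
    port "http" {
      static       = 80
      to           = 9093
      host_network = "internal"
    }

    # Assumed extra port for cluster gossip; the label "cluster" is hypothetical.
    port "cluster" {
      to           = 9094
      host_network = "internal"
    }
  }

  task "alertmanager" {
    driver = "docker"

    config {
      image = "prom/alertmanager:v0.23.0"
      args = [
        "--config.file=/local/config.yml",
        "--cluster.listen-address=0.0.0.0:9094",
        # Placeholder: address of another instance, unknown when the job is written.
        "--cluster.peer=<ip-of-other-instance>:<cluster-port>",
      ]
      ports = ["http", "cluster"]
    }
  }
}
```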
To solve that, my backup plan was to use the Consul DNS address alertmanager.service.consul for the --cluster.peer parameter. The problem with this solution is that a starting task is unable to resolve alertmanager.service.consul and fails, and so do all the others.
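The args for this DNS-based attempt would look something like the following. Again, this is a sketch: the port 9094 in the peer address is an assumption on my side, since the job above only registers the "http" port in Consul.

```hcl
args = [
  "--config.file=/local/config.yml",
  "--cluster.listen-address=0.0.0.0:9094",
  # Consul DNS name of the service registered by this same job.
  # Nothing is registered under this name until at least one instance
  # is up and passing its health check, so the first task cannot resolve it.
  "--cluster.peer=alertmanager.service.consul:9094",
]
```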
Maybe someone has an idea how to solve this chicken-and-egg issue?
Thanks!