Hi there.
I’m trying to understand why some clients in a cluster, which have host_networks defined, are not having their host networks recognised when I try to schedule jobs on them.
My situation
I have a small cluster of machines, which are all connected on a private network, as well as having public IP addresses.
My network stanza
Every machine has this network stanza, in /etc/nomad.d/config/networks.hcl:
client {
  host_network "hetzner-1" {
    interface = "ens10"
    cidr      = "10.0.0.0/24"
  }
}
This is merged into a config file at /etc/nomad.d/nomad.hcl. It looks like this:
# Full configuration options can be found at https://www.nomadproject.io/docs/configuration
data_dir  = "/opt/nomad/data"
bind_addr = "10.0.0.4"
advertise {
  http = "10.0.0.4"
  rpc  = "10.0.0.4"
  serf = "10.0.0.4"
}
client {
  enabled = true
  # https://www.nomadproject.io/docs/configuration/server_join
  server_join {
    retry_join = ["10.0.0.2"]
  }
}
telemetry {
  publish_allocation_metrics = true
  publish_node_metrics       = true
  prometheus_metrics         = true
}
log_level = "INFO"
I would expect the merging to mean that the host network is picked up, but only one machine seems to have the network recognised.
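To make that expectation concrete, this is roughly the effective client block I would expect on each node after merging. It is only a sketch: it assumes Nomad combines the client blocks from both files, and I have not confirmed that networks.hcl is actually being loaded on every node.

```hcl
# Sketch of the merged result I expect; not taken from a running agent.
client {
  enabled = true

  server_join {
    retry_join = ["10.0.0.2"]
  }

  # From /etc/nomad.d/config/networks.hcl (assumed to be merged in).
  host_network "hetzner-1" {
    interface = "ens10"
    cidr      = "10.0.0.0/24"
  }
}
```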
I have a system job which I want to run on every node, but most of my nodes are being filtered out because they’re assumed not to have the host network defined in the network stanza above.
My job file looks like this:
job "node_exporter" {
  datacenters = ["dc1"]
  type        = "system"
  group "node_exporter" {
    count = 1
    network {
      mode = "host"
      port "node_exporter" {
        static       = 9100
        host_network = "hetzner-1"
      }
    }
    task "node_exporter" {
      driver = "docker"
      config {
        image = "prom/node-exporter:v1.4.0"
        ports = ["node_exporter"]
        args = [
          "--web.listen-address", ":${NOMAD_PORT_node_exporter}",
        ]
      }
      resources {
        cpu    = 100 # 100 MHz
        memory = 64  # 64MB
      }
    }
  }
}
Because every client node has the host_network stanza, I would expect this system job to run a node_exporter on every client.
However, I get this response:
nomad plan ./nomad/02-node-exporter.hcl
Job: "node_exporter"
Task Group: "node_exporter" (1 in-place update)
  Task: "node_exporter"
Scheduler dry-run:
- WARNING: Failed to place allocations on all nodes.
  Task Group "node_exporter" (failed to place 1 allocation):
    * Class "worker": 1 nodes excluded by filter
    * Constraint "missing host network \"hetzner-1\" for port \"node_exporter\"": 3 nodes excluded by filter
Job Modify Index: 72860
Every node apart from the server node is being filtered out, because the host network is assumed to be missing.
How do I get my host network recognised, so nodes are not filtered out unnecessarily?