How can I get other host networks recognised on existing nodes?

Adding this for future me.

In the end the problem wasn’t really Nomad.

With Hetzner Cloud, when you set up a private network, you need to pay attention to the interface name you use when declaring a host network, along these lines:

client {
  host_network "hetzner" {
    interface      = "my-interface-name"
    cidr           = "10.0.0.0/24"
    reserved_ports = "22,80"
  }
}

This is because the interface name isn’t consistent across all machine types.

For the older VMs it was ens10, but on the newer machines the private network interface has a different name:

Network                         CX, CCX*1  CPX, CAX, CCX2, CCX3
First attached network          ens10      enp7s0
Additional interfaces (second)  ens11      enp8s0
Additional interfaces (third)   ens12      enp9s0

This is documented on the corresponding page in Hetzner’s docs.
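If you'd rather not rely on the table, you can ask the machine itself which interface carries the private address. This is only a rough sketch: the function name find_iface_by_prefix is made up for illustration, it assumes the iproute2 `ip` tool is installed, and it matches on a plain address prefix rather than doing real CIDR arithmetic.

```shell
# Hypothetical helper: print the name of every interface whose IPv4 address
# starts with the given prefix. Expects `ip -o -4 addr show` output on stdin.
find_iface_by_prefix() {
  prefix="$1"
  # In `ip -o` output, field 2 is the interface name, field 3 is "inet"
  # and field 4 is the address with its prefix length, e.g. 10.0.0.2/32.
  awk -v p="$prefix" '$3 == "inet" && index($4, p) == 1 { print $2 }'
}

# Usage on a live host (the prefix matches the 10.0.0.0/24 example above):
# ip -o -4 addr show | find_iface_by_prefix "10.0.0."
```

Whatever name this prints is the one that should go into the interface field of the host_network block.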

Looking back: what would have saved me lots of heartache

I’ve learned that Nomad lets you set up a host network with a named interface that doesn’t exist on the host machine, and it gives you no indication that you’re pointing at a non-existent network interface.

So this:

client {
  host_network "hetzner" {
    interface      = "ens12345"
    cidr           = "10.0.0.0/24"
    reserved_ports = "22,80"
  }
}

Will show up in the node info output when you call:

nomad node status -verbose <node_id>

And in that output you’ll see a section like this:

Host Networks
Name     CIDR         Interface  ReservedPorts
hetzner  10.0.0.0/24  ens12345   22,80

Which can give the impression that everything is working, except when you try to place a job, you’ll see output like this:

Scheduler dry-run:
- WARNING: Failed to place all allocations.
  Task Group "<MY_JOB_NAME>" (failed to place 1 allocation):
    * Constraint "missing host network \"hetzner\" for port \"http\"": 1 nodes excluded by filter

The host network isn’t missing - it’s clearly there! I think it’s more a case that the IP address allocated to the node can’t be determined, and as a result, the node fails the check against the constraint.
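Since Nomad won't flag this itself, a small pre-flight check before restarting the client could catch it. The sketch below is my own assumption of how to do that, not anything Nomad provides: the function name missing_ifaces is hypothetical, it only understands simple interface = "name" lines, and it relies on Linux listing interfaces under /sys/class/net.

```shell
# Hypothetical helper: print every interface = "..." value in a Nomad client
# config that has no matching entry in the interface directory.
# $1: path to the config file
# $2: directory that lists interfaces (defaults to /sys/class/net on Linux)
missing_ifaces() {
  config="$1"
  netdir="${2:-/sys/class/net}"
  # Pull out the quoted value of each line that starts with `interface =`.
  sed -n 's/^[[:space:]]*interface[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$config" |
  while read -r iface; do
    # Only report names with no matching entry on this machine.
    [ -e "$netdir/$iface" ] || echo "$iface"
  done
}

# Usage (the config path is an assumption, adjust it to your setup):
# missing_ifaces /etc/nomad.d/client.hcl
```

With the ens12345 config above, this would print ens12345 on a machine that has no such interface, and print nothing once the name is corrected.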

It would really help if the Nomad client logged a warning at startup when a host_network interface can’t be found, especially since a bunch of network output is already logged at that point anyway. Here are some of the start-up logs:

Sep 15 14:23:32 app3.my-org.org nomad[80209]:     2023-09-15T14:23:32.853Z [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=eth0
Sep 15 14:23:32 app3.my-org.org nomad[80209]:     2023-09-15T14:23:32.853Z [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=lo
Sep 15 14:23:32 app3.my-org.org nomad[80209]:     2023-09-15T14:23:32.856Z [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=eth0
Sep 15 14:23:32 app3.my-org.org nomad[80209]:     2023-09-15T14:23:32.861Z [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=enp7s0
Sep 15 14:23:32 app3.my-org.org nomad[80209]:     2023-09-15T14:23:32.864Z [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=docker0

Seeing the enp7s0 interface name in these logs is what led me to the fix in the end.
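To get that list of names without reading through the logs by eye, something like the following could work. It is a sketch that leans on the device=<name> format shown in the log lines above; the function name fingerprinted_devices is made up, and the systemd unit name in the usage line is an assumption about how Nomad is run.

```shell
# Hypothetical helper: print the unique device names that appear as
# device=<name> in the log lines read from stdin.
fingerprinted_devices() {
  sed -n 's/.*device=\([^[:space:]]*\).*/\1/p' | sort -u
}

# Usage on a live host (assumes Nomad runs as a systemd unit called "nomad"):
# journalctl -u nomad --no-pager | fingerprinted_devices
```

Fed the five log lines above, this prints docker0, enp7s0, eth0 and lo, one per line, which can then be compared against the interface named in the host_network block.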

I hope this helps someone else in future (or even future me, next time I’m troubleshooting network issues…)
