"CLI Quick Start" guide fails on Hetzner Cloud VPS when binding to 0.0.0.0

I'm using a VPS from hetzner.cloud (the same happens on a Hetzner bare-metal server) and run into network issues when following the Quick Start guide.

The VPS runs Ubuntu 20.04.

Installation via APT:

curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
sudo apt-get update && sudo apt-get install nomad
root@nomad:~# nomad version
Nomad v1.0.4 (9294f35f9aa8dbb4acb6e85fa88e3e2534a3e41a)

Binding to 0.0.0.0 fails:

root@nomad:~# sudo nomad agent -dev -bind 0.0.0.0 -log-level INFO
==> No configuration files loaded
==> Starting Nomad agent...
==> Error starting agent: server config setup failed: Failed to resolve Serf advertise address ":4648": lookup <nil>: no such host
    2021-04-10T13:26:21.565+0200 [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
    2021-04-10T13:26:21.565+0200 [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
    2021-04-10T13:26:21.565+0200 [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
    2021-04-10T13:26:21.565+0200 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
    2021-04-10T13:26:21.565+0200 [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
    2021-04-10T13:26:21.565+0200 [INFO]  agent: detected plugin: name=nvidia-gpu type=device plugin_version=0.1.0
root@nomad:~#

Instead, I have to bind to the public IP (found through ifconfig), like this:

root@nomad:~# sudo nomad agent -dev -bind <ip from ifconfig> -log-level INFO
==> No configuration files loaded
==> Starting Nomad agent...
==> Nomad agent configuration:

       Advertise Addrs: HTTP: <ip from ifconfig>:4646; RPC: <ip from ifconfig>:4647; Serf: <ip from ifconfig>:4648
            Bind Addrs: HTTP: <ip from ifconfig>:4646; RPC: <ip from ifconfig>:4647; Serf: <ip from ifconfig>:4648
                Client: true
             Log Level: INFO
                Region: global (DC: dc1)
                Server: true
               Version: 1.0.4

==> Nomad agent started! Log data will stream in below:

    2021-04-10T13:27:44.287+0200 [INFO]  agent: detected plugin: name=nvidia-gpu type=device plugin_version=0.1.0
    2021-04-10T13:27:44.287+0200 [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
    2021-04-10T13:27:44.287+0200 [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
    2021-04-10T13:27:44.287+0200 [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
    2021-04-10T13:27:44.287+0200 [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
    2021-04-10T13:27:44.287+0200 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
    2021-04-10T13:27:44.290+0200 [INFO]  nomad.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:<ip from ifconfig>:4647 Address:<ip from ifconfig>:4647}]"
    2021-04-10T13:27:44.291+0200 [WARN]  nomad: memberlist: Binding to public address without encryption!
    2021-04-10T13:27:44.291+0200 [INFO]  nomad: serf: EventMemberJoin: nomad.global <ip from ifconfig>
    2021-04-10T13:27:44.291+0200 [INFO]  nomad: starting scheduling worker(s): num_workers=1 schedulers=[service, batch, system, _core]
    2021-04-10T13:27:44.291+0200 [INFO]  client: using state directory: state_dir=/tmp/NomadClient543653941
    2021-04-10T13:27:44.291+0200 [INFO]  client: using alloc directory: alloc_dir=/tmp/NomadClient290117136
    2021-04-10T13:27:44.301+0200 [INFO]  nomad.raft: entering follower state: follower="Node at <ip from ifconfig>:4647 [Follower]" leader=
    2021-04-10T13:27:44.302+0200 [INFO]  client.fingerprint_mgr.cgroup: cgroups are available
    2021-04-10T13:27:44.302+0200 [INFO]  nomad: adding server: server="nomad.global (Addr: <ip from ifconfig>:4647) (DC: dc1)"
    2021-04-10T13:27:44.305+0200 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=lo
    2021-04-10T13:27:44.307+0200 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=lo
    2021-04-10T13:27:44.308+0200 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=eth0
    2021-04-10T13:27:44.321+0200 [INFO]  client.plugin: starting plugin manager: plugin-type=csi
    2021-04-10T13:27:44.321+0200 [INFO]  client.plugin: starting plugin manager: plugin-type=driver
    2021-04-10T13:27:44.321+0200 [INFO]  client.plugin: starting plugin manager: plugin-type=device
    2021-04-10T13:27:44.329+0200 [INFO]  client: started client: node_id=cf06c582-0114-4f19-3e62-d81cbfdb1c3a
    2021-04-10T13:27:45.357+0200 [WARN]  nomad.raft: heartbeat timeout reached, starting election: last-leader=
    2021-04-10T13:27:45.361+0200 [INFO]  nomad.raft: entering candidate state: node="Node at <ip from ifconfig>:4647 [Candidate]" term=2
    2021-04-10T13:27:45.362+0200 [INFO]  nomad.raft: election won: tally=1
    2021-04-10T13:27:45.362+0200 [INFO]  nomad.raft: entering leader state: leader="Node at <ip from ifconfig>:4647 [Leader]"
    2021-04-10T13:27:45.364+0200 [INFO]  nomad: cluster leadership acquired
    2021-04-10T13:27:45.367+0200 [INFO]  nomad.core: established cluster id: cluster_id=7e265453-b60d-2bca-378c-575b96aef570 create_time=1618054065367286004
    2021-04-10T13:27:45.454+0200 [INFO]  client: node registration complete
    2021-04-10T13:27:46.457+0200 [INFO]  client: node registration complete

This starts Nomad correctly, but regular CLI commands run afterwards fail, e.g.:

root@nomad:~# nomad server members
Error querying servers: Get "http://127.0.0.1:4646/v1/agent/members": dial tcp 127.0.0.1:4646: connect: connection refused

Is this expected?

Hi @tobiasmuehl :wave:

This is a bit unusual… Could you try specifying all addresses in a config file to see if that solves the issue? Also increase the log verbosity to see if there are any extra clues:

addresses {
  http = "0.0.0.0"
  rpc  = "0.0.0.0"
  serf = "0.0.0.0"
}
$ sudo nomad agent -config ./config.hcl -log-level TRACE

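For the second issue: the CLI talks to http://127.0.0.1:4646 by default, but the agent is now listening only on the public IP, so I think you will have to set the NOMAD_ADDR environment variable to point to that IP: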

$ export NOMAD_ADDR=http://<ip from ifconfig>:4646
$ nomad server members

Give it a try and let me know how it goes :slightly_smiling_face:.

I added the config file, but it doesn’t seem to help:

root@nomad:~# cat config.hcl
addresses {
  http = "0.0.0.0"
  rpc  = "0.0.0.0"
  serf = "0.0.0.0"
}
root@nomad:~# sudo nomad agent -config ./config.hcl -dev -log-level TRACE
==> Loaded configuration from config.hcl
==> Starting Nomad agent...
==> Error starting agent: server config setup failed: Failed to resolve Serf advertise address ":4648": lookup <nil>: no such host
    2021-04-15T10:57:03.690+0200 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=
    2021-04-15T10:57:03.690+0200 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=
    2021-04-15T10:57:03.690+0200 [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
    2021-04-15T10:57:03.690+0200 [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
    2021-04-15T10:57:03.690+0200 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
    2021-04-15T10:57:03.690+0200 [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
    2021-04-15T10:57:03.690+0200 [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
    2021-04-15T10:57:03.690+0200 [INFO]  agent: detected plugin: name=nvidia-gpu type=device plugin_version=0.1.0

Hi @tobiasmuehl, sorry for the delay. I was able to get a VM in Hetzner Cloud and investigate this further.

The problem seems to have two parts:

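  1. Hetzner Cloud doesn’t seem to provide a private IP by default.
  2. If no advertise address is provided, Nomad tries to default to the first private IP it finds. With no private IP available, the advertise address ends up empty, hence the error Failed to resolve Serf advertise address ":4648".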
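To see which case applies on a given machine, the sketch below lists the host's IPv4 addresses and flags those in the RFC 1918 private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). It is only an approximation: Nomad's own address selection may treat other ranges differently, the helper name is_private_ipv4 is hypothetical, and the script assumes the iproute2 ip command is installed (it is on stock Ubuntu 20.04). If no address is reported as private, there is nothing for Nomad to fall back to when bound to 0.0.0.0.

```shell
#!/bin/sh
# Sketch: report which IPv4 addresses on this host fall in the RFC 1918
# private ranges. Approximation only; Nomad's own selection logic may
# consider other ranges as well.

# is_private_ipv4 ADDR (hypothetical helper): exit status 0 if ADDR is in
# 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16, 1 otherwise.
is_private_ipv4() {
  case "$1" in
    10.*) return 0 ;;
    192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[0-1].*) return 0 ;;
    *) return 1 ;;
  esac
}

if command -v ip >/dev/null 2>&1; then
  # "ip -4 -o addr show" prints one line per address; field 4 is ADDR/PREFIX.
  for cidr in $(ip -4 -o addr show | awk '{print $4}'); do
    addr=${cidr%/*}   # strip the /PREFIX suffix
    if is_private_ipv4 "$addr"; then
      echo "private:     $addr"
    else
      echo "not private: $addr"
    fi
  done
else
  echo "ip command not found; cannot list addresses" >&2
fi
```

On a Hetzner Cloud VM without a private network, this would typically list only the loopback address and the public address, both as "not private".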

You can fix either of these to get this working:

  1. Provision a private network for your VM (this can be done in the Hetzner Cloud dashboard). Nomad will then pick the VM's IP on that network to advertise.

  2. Manually specify your VM's IP as the advertise address. My previous answer was wrong: what you need to set is the advertise block, not addresses, and the value for each entry should be the IP of the machine.

    So first grab the IP with ifconfig and then set your config.hcl to:

advertise {
  http = "<IP>"
  rpc  = "<IP>"
  serf = "<IP>"
}

Then you will be able to start the agent binding to 0.0.0.0:

$ nomad agent -dev -config ./config.hcl -bind 0.0.0.0
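The second option can be scripted. The sketch below writes the advertise block shown above to ./config.hcl, taking the IP as its first argument or, if none is given, using the source address of the default route as reported by ip -4 route get. The helper name write_advertise_config, the probe destination 1.1.1.1 and the route-based detection are assumptions for illustration; on a single-NIC Hetzner VM the route's source address should match what ifconfig shows, but check it before relying on it.

```shell
#!/bin/sh
# Sketch: generate a Nomad config.hcl whose advertise block uses a specific
# host IP, so the agent can bind to 0.0.0.0.

# write_advertise_config IP FILE (hypothetical helper): write the advertise
# block for IP to FILE, overwriting it.
write_advertise_config() {
  cat > "$2" <<EOF
advertise {
  http = "$1"
  rpc  = "$1"
  serf = "$1"
}
EOF
}

# Use the IP passed as $1; otherwise ask the kernel which source address it
# would use to reach 1.1.1.1. "ip route get" only does a routing lookup and
# sends no packets.
ip_addr=${1:-}
if [ -z "$ip_addr" ] && command -v ip >/dev/null 2>&1; then
  ip_addr=$(ip -4 route get 1.1.1.1 2>/dev/null \
    | awk '{for (i = 1; i < NF; i++) if ($i == "src") print $(i + 1)}')
fi

if [ -n "$ip_addr" ]; then
  write_advertise_config "$ip_addr" ./config.hcl
  echo "Wrote ./config.hcl advertising $ip_addr"
  echo "Start with: nomad agent -dev -config ./config.hcl -bind 0.0.0.0"
else
  echo "Could not determine an IP; pass it as the first argument" >&2
fi
```

Passing the IP explicitly (for example the one shown by ifconfig) avoids any guesswork about which interface the default route uses.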

I hope this helps, and sorry for the wrong information I sent before.
