"CLI Quick Start" guide fails on a Hetzner Cloud VPS when binding to 0.0.0.0

I'm using a VPS from hetzner.cloud (or a bare-metal server from Hetzner) and am running into network issues while following the Quick Start guide.

The VPS runs Ubuntu 20.04.

Installation via APT:

curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
sudo apt-get update && sudo apt-get install nomad
root@nomad:~# nomad version
Nomad v1.0.4 (9294f35f9aa8dbb4acb6e85fa88e3e2534a3e41a)

Binding to 0.0.0.0 fails:

root@nomad:~# sudo nomad agent -dev -bind 0.0.0.0 -log-level INFO
==> No configuration files loaded
==> Starting Nomad agent...
==> Error starting agent: server config setup failed: Failed to resolve Serf advertise address ":4648": lookup <nil>: no such host
    2021-04-10T13:26:21.565+0200 [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
    2021-04-10T13:26:21.565+0200 [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
    2021-04-10T13:26:21.565+0200 [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
    2021-04-10T13:26:21.565+0200 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
    2021-04-10T13:26:21.565+0200 [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
    2021-04-10T13:26:21.565+0200 [INFO]  agent: detected plugin: name=nvidia-gpu type=device plugin_version=0.1.0
root@nomad:~#

Instead, I have to bind to the public IP (found with ifconfig), like this:

root@nomad:~# sudo nomad agent -dev -bind <ip from ifconfig> -log-level INFO
==> No configuration files loaded
==> Starting Nomad agent...
==> Nomad agent configuration:

       Advertise Addrs: HTTP: <ip from ifconfig>:4646; RPC: <ip from ifconfig>:4647; Serf: <ip from ifconfig>:4648
            Bind Addrs: HTTP: <ip from ifconfig>:4646; RPC: <ip from ifconfig>:4647; Serf: <ip from ifconfig>:4648
                Client: true
             Log Level: INFO
                Region: global (DC: dc1)
                Server: true
               Version: 1.0.4

==> Nomad agent started! Log data will stream in below:

    2021-04-10T13:27:44.287+0200 [INFO]  agent: detected plugin: name=nvidia-gpu type=device plugin_version=0.1.0
    2021-04-10T13:27:44.287+0200 [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
    2021-04-10T13:27:44.287+0200 [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
    2021-04-10T13:27:44.287+0200 [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
    2021-04-10T13:27:44.287+0200 [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
    2021-04-10T13:27:44.287+0200 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
    2021-04-10T13:27:44.290+0200 [INFO]  nomad.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:<ip from ifconfig>:4647 Address:<ip from ifconfig>:4647}]"
    2021-04-10T13:27:44.291+0200 [WARN]  nomad: memberlist: Binding to public address without encryption!
    2021-04-10T13:27:44.291+0200 [INFO]  nomad: serf: EventMemberJoin: nomad.global <ip from ifconfig>
    2021-04-10T13:27:44.291+0200 [INFO]  nomad: starting scheduling worker(s): num_workers=1 schedulers=[service, batch, system, _core]
    2021-04-10T13:27:44.291+0200 [INFO]  client: using state directory: state_dir=/tmp/NomadClient543653941
    2021-04-10T13:27:44.291+0200 [INFO]  client: using alloc directory: alloc_dir=/tmp/NomadClient290117136
    2021-04-10T13:27:44.301+0200 [INFO]  nomad.raft: entering follower state: follower="Node at <ip from ifconfig>:4647 [Follower]" leader=
    2021-04-10T13:27:44.302+0200 [INFO]  client.fingerprint_mgr.cgroup: cgroups are available
    2021-04-10T13:27:44.302+0200 [INFO]  nomad: adding server: server="nomad.global (Addr: <ip from ifconfig>:4647) (DC: dc1)"
    2021-04-10T13:27:44.305+0200 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=lo
    2021-04-10T13:27:44.307+0200 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=lo
    2021-04-10T13:27:44.308+0200 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=eth0
    2021-04-10T13:27:44.321+0200 [INFO]  client.plugin: starting plugin manager: plugin-type=csi
    2021-04-10T13:27:44.321+0200 [INFO]  client.plugin: starting plugin manager: plugin-type=driver
    2021-04-10T13:27:44.321+0200 [INFO]  client.plugin: starting plugin manager: plugin-type=device
    2021-04-10T13:27:44.329+0200 [INFO]  client: started client: node_id=cf06c582-0114-4f19-3e62-d81cbfdb1c3a
    2021-04-10T13:27:45.357+0200 [WARN]  nomad.raft: heartbeat timeout reached, starting election: last-leader=
    2021-04-10T13:27:45.361+0200 [INFO]  nomad.raft: entering candidate state: node="Node at <ip from ifconfig>:4647 [Candidate]" term=2
    2021-04-10T13:27:45.362+0200 [INFO]  nomad.raft: election won: tally=1
    2021-04-10T13:27:45.362+0200 [INFO]  nomad.raft: entering leader state: leader="Node at <ip from ifconfig>:4647 [Leader]"
    2021-04-10T13:27:45.364+0200 [INFO]  nomad: cluster leadership acquired
    2021-04-10T13:27:45.367+0200 [INFO]  nomad.core: established cluster id: cluster_id=7e265453-b60d-2bca-378c-575b96aef570 create_time=1618054065367286004
    2021-04-10T13:27:45.454+0200 [INFO]  client: node registration complete
    2021-04-10T13:27:46.457+0200 [INFO]  client: node registration complete

This starts Nomad correctly, but subsequent CLI commands fail, e.g.:

root@nomad:~# nomad server members
Error querying servers: Get "http://127.0.0.1:4646/v1/agent/members": dial tcp 127.0.0.1:4646: connect: connection refused

Is this expected?

Hi @tobiasmuehl :wave:

This is a bit unusual… Could you try specifying all addresses in a config file to see if that solves the issue? Also, increase the log verbosity to see if there are any extra clues:

addresses {
  http = "0.0.0.0"
  rpc  = "0.0.0.0"
  serf = "0.0.0.0"
}
$ sudo nomad agent -config ./config.hcl -log-level TRACE

For the second issue, I think you will have to set the NOMAD_ADDR environment variable to point to the right IP:

$ export NOMAD_ADDR=http://<ip from ifconfig>:4646
$ nomad server members

Give it a try and let me know how it goes :slightly_smiling_face:.

I added the config file, but it doesn't seem to help:

root@nomad:~# cat config.hcl
addresses {
  http = "0.0.0.0"
  rpc  = "0.0.0.0"
  serf = "0.0.0.0"
}
root@nomad:~# sudo nomad agent -config ./config.hcl -dev -log-level TRACE
==> Loaded configuration from config.hcl
==> Starting Nomad agent...
==> Error starting agent: server config setup failed: Failed to resolve Serf advertise address ":4648": lookup <nil>: no such host
    2021-04-15T10:57:03.690+0200 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=
    2021-04-15T10:57:03.690+0200 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=
    2021-04-15T10:57:03.690+0200 [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
    2021-04-15T10:57:03.690+0200 [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
    2021-04-15T10:57:03.690+0200 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
    2021-04-15T10:57:03.690+0200 [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
    2021-04-15T10:57:03.690+0200 [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
    2021-04-15T10:57:03.690+0200 [INFO]  agent: detected plugin: name=nvidia-gpu type=device plugin_version=0.1.0

Hi @tobiasmuehl, sorry for the delay, but I was able to get a VM in Hetzner Cloud to investigate this further.

The problem seems to have two parts:

  1. Hetzner Cloud doesn't seem to provide a private IP by default.
  2. Nomad will try to default to the first private IP it finds if no advertise address is provided.

You can fix either of these to get this working:

  1. Make sure you provision a private network for your VM. This can be done in the Hetzner Cloud dashboard. Nomad will then pick the private IP for advertisement.

  2. Manually specify your VM's IP as the advertise address. My previous answer was wrong: what you need to set is the advertise block, not addresses, and the value for each entry should be the IP of the machine.

    So first grab the IP with ifconfig and then set your config.hcl to:

advertise {
  http = "<IP>"
  rpc  = "<IP>"
  serf = "<IP>"
}

Then you will be able to start the agent binding to 0.0.0.0:

$ nomad agent -dev -config ./config.hcl -bind 0.0.0.0
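The same fix can also be sketched as a single config file that avoids hard-coding the IP. This assumes the public interface is eth0 (as in the network fingerprint logs earlier in the thread) and that your Nomad version accepts go-sockaddr templates such as GetInterfaceIP in bind_addr and advertise; please check the agent configuration docs for your version before relying on it.

```hcl
# config.hcl: sketch for a Hetzner Cloud VM without a private network.
# Assumption: the public interface is eth0.

# Same effect as passing -bind 0.0.0.0 on the command line.
bind_addr = "0.0.0.0"

# Advertise the IP of eth0 instead of letting Nomad look for a
# private IP, which does not exist on this VM.
advertise {
  http = "{{ GetInterfaceIP \"eth0\" }}"
  rpc  = "{{ GetInterfaceIP \"eth0\" }}"
  serf = "{{ GetInterfaceIP \"eth0\" }}"
}
```

It could then be started with nomad agent -dev -config ./config.hcl. Because HTTP is bound to 0.0.0.0, the agent also listens on the loopback interface, so local commands like nomad server members should reach the default http://127.0.0.1:4646 without setting NOMAD_ADDR.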

I hope this helps, and sorry for the wrong information I sent before.


Excuse me, could you help me understand the difference between addresses and advertise?
I need to create multiple clusters in different regions and federate them. Traffic within a region should go over the intranet, and traffic between regions over the public network. How should I configure them? Thank you for your time!
(Please forgive my poor grammar :sweat_smile:)

Hi @SpringerX :wave:

Sure, these settings can be quite confusing, as the difference is a bit subtle.

Looking at the docs, we have these descriptions:

  • addresses (Addresses: see below) - Specifies the bind address for individual network services.

This means that the setting defines which address each of Nomad's network services (the HTTP API/UI and the internal RPC and Serf endpoints) binds to, in other words, which IP each one listens on.

By default, these values are set to 0.0.0.0, which means that Nomad accepts requests arriving on any network interface. If you were to change them to 127.0.0.1, for example, only requests coming from the host itself would be able to reach Nomad. Or, if you use a specific IP address of the machine, like 192.168.0.10, Nomad will only be reachable through that address.

You will want to set this to a value that other nodes in your cluster can reach, but you shouldn't make it public unless you have other security mechanisms, like a firewall or security groups, in place.
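As an illustration, here is a hypothetical addresses block that binds every service to a single private IP. 192.168.0.10 is just the example address from the paragraph above; substitute one that your other nodes can actually reach.

```hcl
# Hypothetical example: each service listens only on this private IP.
addresses {
  http = "192.168.0.10"
  rpc  = "192.168.0.10"
  serf = "192.168.0.10"
}
```

Note that with HTTP bound this way, nothing listens on 127.0.0.1:4646, so the local CLI needs NOMAD_ADDR=http://192.168.0.10:4646, which is the same situation as the second issue earlier in this thread.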

  • advertise (Advertise: see below) - Specifies the advertise address for individual network services. This can be used to advertise a different address to the peers of a server or a client node to support more complex network configurations such as NAT.

This is related to the previous setting, in the sense that it needs to be set to a value that is compatible with addresses. It defines how the local Nomad agent tells its peers to reach it.

For example, if you set addresses to 127.0.0.1 and advertise to 192.168.0.10, the setup is incompatible, because the agent is telling others:

You can reach me at 192.168.0.10

But in reality it's only listening on the localhost/loopback interface, so no one from the outside will actually be able to communicate with it. You need to make sure that the advertised address can actually be used from the outside.
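To make the incompatible case concrete, this is roughly how it would look in an agent config (192.168.0.10 is the example IP from above). The top-level bind_addr sets the default listen address for all three services.

```hcl
# Incompatible: every service listens on loopback only...
bind_addr = "127.0.0.1"

# ...but peers are told to connect to 192.168.0.10, where nothing is listening.
advertise {
  http = "192.168.0.10"
  rpc  = "192.168.0.10"
  serf = "192.168.0.10"
}
```

Changing bind_addr to 0.0.0.0 or to 192.168.0.10 would make the two settings consistent.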

As the docs mention, you often don't need to modify this value unless you have a more complex network topology where the external IP of a machine differs from its internal IP (as happens in a NAT deployment).
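A hypothetical NAT setup might look like the sketch below. 10.0.0.5 stands for the machine's internal IP and 203.0.113.10 (a documentation-range address) for the public IP the NAT forwards to it; both are placeholders. The agent listens on the internal IP but tells peers to use the public one, which only works if the NAT actually forwards the Nomad ports (4646-4648 by default; Serf uses both TCP and UDP) to the machine.

```hcl
# Listen on the internal interface only (placeholder IP).
bind_addr = "10.0.0.5"

# Tell peers to reach this agent through the NAT's public IP (placeholder).
advertise {
  http = "203.0.113.10"
  rpc  = "203.0.113.10"
  serf = "203.0.113.10"
}
```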

I hope this explains the difference better; if not, feel free to ask again.

As for your federation setup, it's hard to say what the values should be, as they depend a lot on the specifics of your network infrastructure.

You will probably want to treat each cluster in isolation, with all of its traffic going over the intranet, and configure your servers so that they can also communicate over the public interface. This way you will be able to federate them.

Server traffic that goes over the public network needs to be secured, so gossip encryption and TLS are a must.
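For the cross-region server traffic, a server config sketch that enables both gossip encryption and mutual TLS could look like the following. The region, datacenter, key and certificate paths are all placeholders: in the 1.0.x series the gossip key can be generated with nomad operator keygen, and the certificates would come from your own CA. Regions are then federated by joining a server to a server in another region over the Serf port, e.g. nomad server join <other-server-ip>:4648.

```hcl
# Hypothetical server config for one region; all values are placeholders.
region     = "eu"
datacenter = "eu-dc1"

server {
  enabled = true
  # Gossip (Serf) encryption key; federated servers share one gossip
  # pool, so every server in every region needs the same key.
  encrypt = "<base64 key from nomad operator keygen>"
}

# Mutual TLS for HTTP and RPC traffic.
tls {
  http = true
  rpc  = true

  ca_file   = "/etc/nomad.d/tls/nomad-ca.pem"
  cert_file = "/etc/nomad.d/tls/server.pem"
  key_file  = "/etc/nomad.d/tls/server-key.pem"

  verify_server_hostname = true
  verify_https_client    = true
}
```

With verify_server_hostname enabled, each server certificate has to be issued for the name server.<region>.nomad (server.eu.nomad in this sketch).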

(Your grammar is perfectly understandable :slightly_smiling_face:)
