Nomad bridge network not enabled?

I’m trying to set up Nomad on a server, but when I check ip a there is no nomad bridge interface, which I would have expected to exist by default. Is there some specific setting that needs to be configured for the bridge network to be set up when Nomad starts? I’m trying to get this working on Rocky Linux, and my Nomad version is 1.2.3.

Here’s my config:

data_dir = "/opt/nomad/data"
bind_addr = "0.0.0.0"
datacenter = "local"

acl {
  enabled = true
  token_ttl = "30s"
  policy_ttl = "60s"
}

server {
  # license_path is required as of Nomad v1.1.1+
  # license_path = "/etc/nomad.d/nomad.hcl"
  enabled = true
  bootstrap_expect = 1
}

client {
  enabled = true
  servers = ["127.0.0.1"]
}

plugin "docker" {
  config {
    allow_caps = ["audit_write", "chown", "dac_override", "fowner", "fsetid", "kill", "mknod",
 "net_bind_service", "setfcap", "setgid", "setpcap", "setuid", "sys_chroot", "sys_admin", "sys_rawio"]

    gc {
      image = true
      image_delay = "3m"
      container   = true

      dangling_containers {
        enabled = true
        dry_run = false
        period         = "5m"
        creation_grace = "5m"
      }
    }

    volumes {
      enabled = true
      #selinuxlabel = "z"
    }
  }
}

consul {
  address = "127.0.0.1:8500"
  server_service_name = "nomad-server"
  client_service_name = "nomad-client"
  auto_advertise      = true
  server_auto_join    = true
  client_auto_join    = true
  token   = "redacted"
}


Hi @tommy.

The Nomad client does not configure the bridge interface until the first allocation that uses network.mode = "bridge" is placed on it. This is due to the way the client’s network subsystem calls currently work.

Running the following example job on a fresh client will cause the bridge interface to be configured.

job "example" {
  datacenters = ["local"]

  group "cache" {
    network {
      mode = "bridge"
      port "db" {
        to = 6379
      }
    }

    task "redis" {
      driver = "docker"

      config {
        image = "redis:3.2"

        ports = ["db"]
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}
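
Assuming the job specification is saved as example.nomad (the filename is just a placeholder), registering it is then simply:

$ nomad job run example.nomad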

You can then see the interface:

$ ip addr show nomad
26: nomad: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether c6:53:48:3a:e2:48 brd ff:ff:ff:ff:ff:ff
    inet 172.26.64.1/20 brd 172.26.79.255 scope global nomad
       valid_lft forever preferred_lft forever
    inet6 fe80::c453:48ff:fe3a:e248/64 scope link
       valid_lft forever preferred_lft forever

Thanks,
jrasell and the Nomad team

Thanks for the response @jrasell, and yes, I found that the bridge interface appears once a task is started using the bridge mode.

However, I now have a new issue related to this: from inside the bridge network, my containers are unable to reach the internet.

Here are the results when I run these commands in a container on the bridge network:

/ # dig +short google.com
142.250.179.174
/ # curl google.com
curl: (7) Failed to connect to google.com port 80 after 1048 ms: Host is unreachable

As you can see, the DNS lookup works fine, but curling the address does not.
Both commands work fine on the host machine itself, so the problem seems to exist only inside the bridge network.

Hi @tommy.

I ran a test locally and was unable to reproduce this using a very minimal setup.

$ nomad alloc exec 0937745f curl google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>

Are the host machines or surrounding infrastructure running any firewalls that may be interfering with the traffic? Could you provide the job specification, or a minimal version of it, so I can test more accurately?
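
As a generic starting point (these are standard checks, not specific to your setup), you could verify on the client host that forwarding and NAT are in place for the bridge subnet, and whether any firewall is active:

$ sysctl net.ipv4.ip_forward            # must be 1 for traffic to be routed off the bridge
$ sudo iptables -t nat -S POSTROUTING   # expect a MASQUERADE rule covering 172.26.64.0/20
$ sudo iptables -S FORWARD              # look for REJECT/DROP rules that precede the CNI rules
$ systemctl is-active firewalld         # confirm whether firewalld is running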

Thanks,
jrasell and the Nomad team

@jrasell I’m running Nomad on Rocky Linux, where I have currently turned off firewalld, so there shouldn’t be any interference from that. I also have a Unifi Dream Machine Pro that manages my network, but again, there aren’t any settings there that would seem to affect this. I used to run the same containers that I’m now migrating to Nomad with docker-compose on the same host, and they never had connection issues like this.

Here is an example job to bring up a container that I just exec into:

job "network_multitool" {
  datacenters = ["local"]
  type        = "batch"

  group "network_multitool" {
    network {
      mode = "bridge"
    }

    task "network_multitool" {
      driver = "docker"

      config {
        image      = "praqma/network-multitool:alpine-extra"
        entrypoint = ["sleep", "infinity"]
      }
    }
  }
}
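
For completeness, this is how I run it and get a shell inside (assuming the job file is saved as network_multitool.nomad; the alloc ID is a placeholder):

$ nomad job run network_multitool.nomad
$ nomad alloc exec -task network_multitool <alloc-id> /bin/bash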

Here’s my nomad config for reference:

data_dir = "/opt/nomad/data"
bind_addr = "0.0.0.0"
datacenter = "local"

acl {
  enabled = true
  token_ttl = "30s"
  policy_ttl = "60s"
}

server {
  # license_path is required as of Nomad v1.1.1+
  # license_path = "/etc/nomad.d/nomad.hcl"
  enabled = true
  bootstrap_expect = 1
}

client {
  enabled = true
  servers = ["127.0.0.1"]
}

plugin "docker" {
  config {
    allow_caps = ["audit_write", "chown", "dac_override", "fowner", "fsetid", "kill", "mknod",
 "net_bind_service", "setfcap", "setgid", "setpcap", "setuid", "sys_chroot", "sys_admin", "sys_rawio"]

    gc {
      image = true
      image_delay = "3m"
      container   = true

      dangling_containers {
        enabled = true
        dry_run = false
        period         = "5m"
        creation_grace = "5m"
      }
    }

    volumes {
      enabled = true
      #selinuxlabel = "z"
    }
  }
}

consul {
  address = "127.0.0.1:8500"
  server_service_name = "nomad-server"
  client_service_name = "nomad-client"
  auto_advertise      = true
  server_auto_join    = true
  client_auto_join    = true
  token   = "redacted"
}

On the host, the nomad interface looks like this:

55: nomad: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 72:29:24:47:18:61 brd ff:ff:ff:ff:ff:ff
    inet 172.26.64.1/20 brd 172.26.79.255 scope global nomad
       valid_lft forever preferred_lft forever
    inet6 fe80::b436:43ff:fe36:4534/64 scope link
       valid_lft forever preferred_lft forever

In the container, ip a lists this:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0@if4828: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 8a:0e:c0:dd:33:ec brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.26.64.11/20 brd 172.26.79.255 scope global eth0
       valid_lft forever preferred_lft forever

Not sure if this is any indication of the cause, but in the network_multitool container, ping and traceroute report that the operation is not permitted, even though the user is root.

bash-5.1# ping google.com
ping: socket: Operation not permitted
bash-5.1# traceroute google.com
traceroute: socket(AF_INET,3,1): Operation not permitted
bash-5.1# whoami
root
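
As an aside, this part is probably unrelated to the connectivity problem: ping and traceroute need raw sockets, which require the net_raw capability, and net_raw is missing from the allow_caps list in my config above (curl only uses plain TCP, so it doesn’t need it). A sketch of the change, assuming that is the cause of the “Operation not permitted” errors:

plugin "docker" {
  config {
    # net_raw appended so containers can open raw sockets (ping, traceroute)
    allow_caps = ["audit_write", "chown", "dac_override", "fowner", "fsetid", "kill", "mknod",
                  "net_bind_service", "net_raw", "setfcap", "setgid", "setpcap", "setuid",
                  "sys_chroot", "sys_admin", "sys_rawio"]
  }
}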

Also, here is my docker daemon.json:

{
  "dns": ["172.26.64.1"],
  "dns-search": ["consul"],
  "insecure-registries": ["http://192.168.1.100:8082"],
  "registry-mirrors": ["http://192.168.1.100:8082"],
  "bridge": "none"
}

I was hit by the exact same issue: from inside the container it’s not possible to reach some hosts:

/usr/src/app $ curl -v google.com
*   Trying 142.251.12.138:80...
*   Trying 2404:6800:4003:c04::71:80...
* Immediate connect fail for 2404:6800:4003:c04::71: Address not available
*   Trying 2404:6800:4003:c04::65:80...
* Immediate connect fail for 2404:6800:4003:c04::65: Address not available
*   Trying 2404:6800:4003:c04::8a:80...
* Immediate connect fail for 2404:6800:4003:c04::8a: Address not available
*   Trying 2404:6800:4003:c04::64:80...
* Immediate connect fail for 2404:6800:4003:c04::64: Address not available
* connect to 142.251.12.138 port 80 failed: Host is unreachable
*   Trying 142.251.12.101:80...
* connect to 142.251.12.101 port 80 failed: Host is unreachable
*   Trying 142.251.12.100:80...
* connect to 142.251.12.100 port 80 failed: Host is unreachable
*   Trying 142.251.12.102:80...
* connect to 142.251.12.102 port 80 failed: Host is unreachable
*   Trying 142.251.12.139:80...
* connect to 142.251.12.139 port 80 failed: Host is unreachable
*   Trying 142.251.12.113:80...
* connect to 142.251.12.113 port 80 failed: Host is unreachable
* Failed to connect to google.com port 80 after 6139 ms: Host is unreachable
* Closing connection 0
curl: (7) Failed to connect to google.com port 80 after 6139 ms: Host is unreachable

I’m on an AWS machine running CentOS 8 Stream with firewalld enabled, also using the Docker driver and a bridged network.
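
One thing that might be worth trying here (an assumption on my part, not a confirmed fix) is telling firewalld to trust the nomad bridge interface, so that traffic forwarded from it is allowed through:

$ sudo firewall-cmd --permanent --zone=trusted --add-interface=nomad
$ sudo firewall-cmd --reload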

Edit: This is now being addressed in hashicorp/nomad issue #12199 on GitHub: “No outbound network connectivity when group network is bridge”.