Networking assistance

Hello:

I am attempting to replicate my standalone Podman environment, where I use macvlan networks to put a handful of application containers directly on the physical network. The alloc/container deploys, but I cannot reach its management page.
I am unsure whether I am heading in the right direction and would appreciate some help.

Environment:
Nomad: v1.8.1
Podman: v5.1.1
Fedora 40

The client nodes have sub-interfaces for tagging VLAN traffic:

$ ip -br addr | grep vlan
vlan.108@enp1s0  UP
vlan.100@enp1s0  UP             192.168.100.13/25

CNI plugins were installed into /opt/cni/bin.
I have defined a CNI configuration file for VLAN 100 as:

$ cat /opt/cni/config/vlan.100.conflist
{
  "cniVersion": "1.0.0",
  "name": "vlan.100",
  "plugins": [
    {
      "type": "macvlan",
      "master": "vlan.100",
      "ipam": {
        "type": "static",
        "addresses": [
          {
            "address": "192.168.100.15/25",
            "gateway": "192.168.100.1"
          }
        ],
        "dns": {
          "nameservers": ["192.168.108.11", "192.168.108.10"],
          "domain": "example.com",
          "search": ["example.com"]
        }
      }
    },
    {
      "type": "tuning",
      "mac": "7a:44:45:00:00:00"
    }
  ]
}

This is the job I’m currently testing with:

job "netbootxyz" {
  datacenters = ["lab"]
  type        = "service"

  group "netbootxyz" {
    network {
      mode = "cni/vlan.100"
      port "ui" {
        static = 3000
      }
      port "tftp" {
        static = 69
      }
    }

    volume "truenas-nfs" {
      type            = "csi"
      source          = "truenas-nfs"
      read_only       = false
      attachment_mode = "file-system"
      access_mode     = "multi-node-multi-writer"
    }

    task "netbootxyz" {
      driver = "podman"

      config {
        image      = "netbootxyz/netbootxyz:0.7.1-nbxyz3"
        ports      = ["ui", "tftp"]
        privileged = true
      }

      volume_mount {
        volume      = "truenas-nfs"
        destination = "/assets"
        read_only   = false
      }

      volume_mount {
        volume      = "truenas-nfs"
        destination = "/config"
        read_only   = false
      }

      resources {
        cpu        = 1000
        memory     = 1024
      }
    }
  }
}

Unfortunately, the allocation is reported as bound to the node’s IP address:

$ nomad alloc status 963e5b24
ID                  = 963e5b24-74f3-91e6-4794-9b4c0ccdcb86
Eval ID             = 1be7019f
Name                = netbootxyz.netbootxyz[0]
Node ID             = b2f02eaa
Node Name           = prod-core-services04
Job ID              = netbootxyz
Job Version         = 1
Client Status       = running
Client Description  = Tasks are running
Desired Status      = run
Desired Description = <none>
Created             = 13m23s ago
Modified            = 13m2s ago
Deployment ID       = 1b2f4cc7
Deployment Health   = healthy

Allocation Addresses (mode = "cni/vlan.100"):
Label  Dynamic  Address
*ui    yes      192.168.100.13:3000
*tftp  yes      192.168.100.13:69

Task "netbootxyz" is "running"
Task Resources:
CPU         Memory          Disk     Addresses
0/1000 MHz  35 MiB/1.0 GiB  300 MiB

CSI Volumes:
ID           Read Only
truenas-nfs  false
truenas-nfs  false

Task Events:
Started At     = 2024-07-04T02:25:23Z
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type        Description
2024-07-03T22:25:23-04:00  Started     Task started by client
2024-07-03T22:25:23-04:00  Task Setup  Building Task Directory
2024-07-03T22:25:13-04:00  Received    Task received by client

The container does get the IP and MAC addresses:

$ sudo podman container inspect c813719a2115 | jq '.[].NetworkSettings'
{
  "EndpointID": "",
  "Gateway": "<nil>",
  "IPAddress": "192.168.100.15",
  "IPPrefixLen": 25,
  "IPv6Gateway": "",
  "GlobalIPv6Address": "",
  "GlobalIPv6PrefixLen": 0,
  "MacAddress": "7a:44:45:00:00:00",
  "Bridge": "",
  "SandboxID": "",
  "HairpinMode": false,
  "LinkLocalIPv6Address": "",
  "LinkLocalIPv6PrefixLen": 0,
  "Ports": {},
  "SandboxKey": ""
}

I expected the alloc/container to use the CNI-specific IP/MAC address so that it lives on the physical network.
Where am I going wrong with the “Nomad” way? If there is a better way to wire this up, I am open to suggestions.

Thank you!

I believe I have figured out my issue, which was twofold.

  1. My CNI definition was incomplete.
  2. I am assuming that Nomad will not report the allocation's CNI-assigned IP address. This might be by “design”, if I have interpreted this correctly.

For the CNI definition, here is what I’ve settled on:

{
  "cniVersion": "1.0.0",
  "name": "netbootxyz",
  "plugins": [
    {
      "type": "macvlan",
      "master": "vlan.100",
      "mode": "bridge",
      "ipam": {
        "type": "static",
        "addresses": [
          {
            "address": "192.168.100.15/25",
            "gateway": "192.168.100.1"
          }
        ],
        "routes": [
          { "dst": "0.0.0.0/0" }
        ],
        "dns": {
          "nameservers": ["192.168.108.11", "192.168.108.10"],
          "domain": "example.com",
          "search": ["example.com"]
        }
      }
    },
    {
      "type": "tuning",
      "mac": "7a:44:45:00:00:00"
    }
  ]
}
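
For anyone comparing the two definitions, here is a minimal sketch (Python, standard library only) that flags the two things the revised definition adds: an explicit `mode` and a default route. It also checks that the gateway sits inside the address's subnet. The helper name `check_macvlan` is hypothetical, the dictionaries are trimmed copies of the plugin entries from this post, and this is not a full CNI validation.

```python
import ipaddress

# Trimmed copies of the two macvlan plugin entries from this post.
ORIGINAL = {
    "type": "macvlan",
    "master": "vlan.100",
    "ipam": {
        "type": "static",
        "addresses": [
            {"address": "192.168.100.15/25", "gateway": "192.168.100.1"}
        ],
    },
}

REVISED = {
    **ORIGINAL,
    "mode": "bridge",
    "ipam": {**ORIGINAL["ipam"], "routes": [{"dst": "0.0.0.0/0"}]},
}


def check_macvlan(plugin):
    """Hypothetical helper: return a list of findings for one macvlan plugin entry."""
    findings = []
    ipam = plugin.get("ipam", {})

    # The revised definition sets the macvlan mode explicitly.
    if "mode" not in plugin:
        findings.append("mode not set explicitly")

    # The gateway should be reachable on the address's own subnet.
    for entry in ipam.get("addresses", []):
        network = ipaddress.ip_interface(entry["address"]).network
        gateway = entry.get("gateway")
        if gateway and ipaddress.ip_address(gateway) not in network:
            findings.append(f"gateway {gateway} is outside {network}")

    # The revised definition declares a default route under ipam.routes.
    if not any(r.get("dst") == "0.0.0.0/0" for r in ipam.get("routes", [])):
        findings.append("no default route (0.0.0.0/0) declared")

    return findings


for name, plugin in (("original", ORIGINAL), ("revised", REVISED)):
    print(name, check_macvlan(plugin))
```

Which of the two additions actually fixed reachability is not established here, so the sketch simply flags both.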

The job:

job "netbootxyz" {
  datacenters = ["lab"]
  type        = "service"

  group "netbootxyz" {
    network {
      mode = "cni/netbootxyz"
      port "ui" {
        static = 3000
      }
      port "tftp" {
        static = 69
      }
    }

    volume "truenas-nfs" {
      type            = "csi"
      source          = "truenas-nfs"
      read_only       = false
      attachment_mode = "file-system"
      access_mode     = "multi-node-multi-writer"
    }

    task "netbootxyz" {
      driver = "podman"

      config {
        image      = "netbootxyz/netbootxyz:0.7.1-nbxyz3"
        ports      = ["ui", "tftp"]
        privileged = true
      }

      volume_mount {
        volume      = "truenas-nfs"
        destination = "/assets"
        read_only   = false
      }

      volume_mount {
        volume      = "truenas-nfs"
        destination = "/config"
        read_only   = false
      }

      resources {
        cpu        = 1000
        memory     = 1024
      }
    }
  }
}

From here, the alloc/container participates in the physical network as intended. However, Nomad (both the UI and the CLI) still reports that the alloc/container is bound to the node’s IP:

$ nomad alloc status -json 30c76495 | jq '.Resources.Networks'
[
  {
    "CIDR": "",
    "DNS": null,
    "Device": "",
    "DynamicPorts": null,
    "Hostname": "",
    "IP": "192.168.100.13",
    "MBits": 0,
    "Mode": "cni/netbootxyz",
    "ReservedPorts": [
      {
        "HostNetwork": "default",
        "Label": "ui",
        "To": 0,
        "Value": 3000
      },
      {
        "HostNetwork": "default",
        "Label": "tftp",
        "To": 0,
        "Value": 69
      }
    ]
  }
]
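
To make the mismatch explicit, here is a minimal sketch (Python, standard library only) that pairs each port's Nomad-reported address with the address from the CNI definition. The sample data is trimmed from the output above, `compare_addresses` is a hypothetical helper, and the "expected" column assumes the service listens on the same port number at the macvlan address.

```python
import ipaddress

# Trimmed from the `nomad alloc status -json` output above.
NOMAD_NETWORKS = [
    {
        "IP": "192.168.100.13",
        "Mode": "cni/netbootxyz",
        "ReservedPorts": [
            {"Label": "ui", "Value": 3000},
            {"Label": "tftp", "Value": 69},
        ],
    }
]

# Static address from the CNI definition.
CNI_ADDRESS = "192.168.100.15/25"


def compare_addresses(networks, cni_address):
    """Hypothetical helper: pair each port's Nomad-reported address with the CNI one."""
    cni_ip = ipaddress.ip_interface(cni_address).ip
    rows = []
    for net in networks:
        reported_ip = ipaddress.ip_address(net["IP"])
        # ReservedPorts may be null in the JSON, hence the `or []`.
        for port in net.get("ReservedPorts") or []:
            rows.append(
                {
                    "label": port["Label"],
                    "reported": f"{reported_ip}:{port['Value']}",
                    # Assumption: same port number on the macvlan address.
                    "expected": f"{cni_ip}:{port['Value']}",
                    "match": reported_ip == cni_ip,
                }
            )
    return rows


for row in compare_addresses(NOMAD_NETWORKS, CNI_ADDRESS):
    print(row)
```

With the data above, every row has `match` set to `False`: Nomad reports 192.168.100.13 (the node) while the CNI definition assigns 192.168.100.15.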

Since I am very much a Nomad newb, please do correct me if my interpretation is wrong.

Thanks