Hello all:
First-time poster, and I’m evaluating Nomad (v1.8.1).
I am using 5 VMware Fusion (v13.5.2) VMs on macOS Sonoma to mimic a small cluster of three servers and two clients.
For the container engine, I am using Podman (v5.1.1) on the client nodes, and all nodes are running Ubuntu 24.04.
The servers and clients are all up and in alive/ready status:
$ nomad server members
Name Address Port Status Leader Raft Version Build Datacenter Region
nomad-srv1.global 192.168.20.43 4648 alive false 3 1.8.1 lab global
nomad-srv2.global 192.168.20.59 4648 alive false 3 1.8.1 lab global
nomad-srv3.global 192.168.20.32 4648 alive true 3 1.8.1 lab global
$
$ nomad node status
ID Node Pool DC Name Class Drain Eligibility Status
bb0a9563 default lab nomad-client2 <none> false eligible ready
35712c79 default lab nomad-client1 <none> false eligible ready
The Podman driver integration looks OK:
$ nomad node status -verbose 35712c79 | grep -i "podman"
podman true true ready 2024-06-29T17:07:15Z
driver.podman = 1
driver.podman.cgroupVersion = v2
driver.podman.rootless = false
driver.podman.version = 5.1.1
While following this HashiCorp Podman guide, I attempted to run a job, but it hangs and complains about a lack of resources:
$ nomad job run --verbose nginx.nomad
==> 2024-06-29T17:27:29Z: Monitoring evaluation "712a7306-cbba-d799-718d-1c5e128c967a"
2024-06-29T17:27:29Z: Evaluation triggered by job "nginx-podman-job"
2024-06-29T17:27:30Z: Evaluation within deployment: "3189840b-ec6e-efde-b9c4-60a6b7c9fe6e"
2024-06-29T17:27:30Z: Evaluation status changed: "pending" -> "complete"
==> 2024-06-29T17:27:30Z: Evaluation "712a7306-cbba-d799-718d-1c5e128c967a" finished with status "complete" but failed to place all allocations:
2024-06-29T17:27:30Z: Task Group "nginx-group" (failed to place 1 allocation):
* Resources exhausted on 2 nodes
* Dimension "cpu" exhausted on 2 nodes
2024-06-29T17:27:30Z: Evaluation "a0d60e5c-d964-5fb1-f4e6-eba9b8a93ccd" waiting for additional capacity to place remainder
==> 2024-06-29T17:27:30Z: Monitoring deployment "3189840b-ec6e-efde-b9c4-60a6b7c9fe6e"
⠧ Deployment "3189840b-ec6e-efde-b9c4-60a6b7c9fe6e" in progress...
2024-06-29T17:42:55Z
ID = 3189840b-ec6e-efde-b9c4-60a6b7c9fe6e
Job ID = nginx-podman-job
Job Version = 0
Status = running
Description = Deployment is running
Deployed
Task Group Desired Placed Healthy Unhealthy Progress Deadline
nginx-group 1 0 0 0 N/A
Allocations
No allocations placed^C
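My next step will probably be to inspect the evaluation itself for more detail, something along the lines of:
$ nomad eval status 712a7306-cbba-d799-718d-1c5e128c967a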
I validated that CPU and memory usage are minimal:
$ nomad node status -self
ID = bb0a9563-c3c8-9557-0931-09578192389d
Name = nomad-client2
Node Pool = default
Class = <none>
DC = lab
Drain = false
Eligibility = eligible
Status = ready
CSI Controllers = <none>
CSI Drivers = <none>
Uptime = 42m21s
Host Volumes = <none>
Host Networks = <none>
CSI Volumes = <none>
Driver Status = exec,podman
Node Events
Time Subsystem Message
2024-06-29T17:05:36Z Cluster Node reregistered by heartbeat
2024-06-29T17:05:03Z Cluster Node heartbeat missed
2024-06-29T15:14:35Z Cluster Node registered
Allocated Resources
CPU Memory Disk
0/0 MHz 0 B/7.7 GiB 0 B/5.3 GiB
Allocation Resource Utilization
CPU Memory
0/0 MHz 0 B/7.7 GiB
Host Resource Utilization
CPU Memory Disk
0/0 MHz 216 MiB/7.7 GiB (/dev/mapper/ubuntu--vg-ubuntu--lv)
Allocations
No allocations placed
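One thing that does stand out to me in the output above is the CPU column reading 0/0 MHz, i.e. the node appears to be advertising 0 MHz of total CPU. To see what the client actually fingerprinted, I was going to dump the CPU attributes, something like:
$ nomad node status -verbose bb0a9563 | grep -i "cpu"
(I would expect attributes such as cpu.numcores and cpu.totalcompute to show up there.)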
Here is the job I attempted to deploy:
job "nginx-podman-job" {
datacenters = ["lab"]
type = "service"
group "nginx-group" {
count = 1
task "nginx-task" {
driver = "podman"
config {
image = "docker.io/library/nginx:latest"
}
resources {
cpu = 500
memory = 256
}
}
}
}
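If it helps, I can also share the output of a dry-run plan, which should show the same placement reasoning without actually submitting the job:
$ nomad job plan nginx.nomad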
The client VMs are configured with 4 CPU cores and 8 GB of memory each, so I am not sure what the issue could be.
The -verbose flag on nomad job run doesn't point me to the cause either.
Here is what the configuration of a client node looks like:
$ cat /etc/nomad.d/nomad.hcl
datacenter = "lab"
data_dir = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
plugin "nomad-driver-podman" {
config {
socket_path = "unix:///run/podman/podman.sock"
# Customize other Podman driver plugin options here if needed
}
}
$ cat /etc/nomad.d/client.hcl
client {
  enabled = true
  servers = ["192.168.20.43", "192.168.20.59", "192.168.20.32"]
}
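I also wondered whether I need to pin the CPU capacity manually in the client stanza with cpu_total_compute, but I haven't tried it yet because I'd rather understand why the fingerprinted value looks the way it does. The value below is just a placeholder based on 4 cores, not something I've measured:
client {
  enabled = true
  servers = ["192.168.20.43", "192.168.20.59", "192.168.20.32"]

  # Placeholder override, assuming roughly 2000 MHz per core x 4 cores; not verified
  cpu_total_compute = 8000
}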
Can you provide any feedback on anything I may have overlooked?
Thank you