Hello, I am trying to run a GPU job on my Nomad client. I have one host with the NVIDIA driver and the NVIDIA Container Toolkit installed, running as a Nomad client. This is the configuration of my Nomad job:
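For context, the client should also have the NVIDIA device plugin enabled. My understanding is that the relevant part of the client configuration looks roughly like the sketch below (I am assuming the external nomad-device-nvidia plugin installed into the client's plugin_dir here; older Nomad versions bundled it as "nvidia-gpu", so the block name may differ in my actual setup):

    # client.hcl (sketch, not my exact file)
    plugin_dir = "/opt/nomad/plugins"   # directory containing the nomad-device-nvidia binary

    plugin "nomad-device-nvidia" {
      config {
        enabled = true
      }
    }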
job "gpu-test" {
  type        = "batch"
  datacenters = ${datacenters}
  region      = "${nomad_region}"
  namespace   = "system"

  group "smi" {
    task "smi" {
      driver = "docker"

      config {
        image   = "nvidia/cuda:11.0-base"
        command = "nvidia-smi"
      }

      resources {
        cpu    = 500
        memory = 256

        device "nvidia/gpu" {
          count = 1

          constraint {
            attribute = "${var.device_vendor}"
            value     = "nvidia"
          }

          constraint {
            attribute = "${var.device_type}"
            value     = "gpu"
          }
        }
      }
    }
  }
}
When deployed, the job fails with the error:
Placement Failures
  smi
  1 unplaced
  - Constraint "missing devices": filtered 9 nodes
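If it helps, I assume the fingerprinted GPU should also be visible on the node itself with something like:

    nomad node status -verbose <node-id>

showing a nvidia/gpu entry under the device attributes, but I am not sure that is the right place to check.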
Could you please let me know what could be going wrong?