Hello everyone,
I’m facing an issue with Nomad CPU detection and cpuset generation when Hyper-Threading is disabled on the host.
It seems Nomad incorrectly detects the number of available CPU cores, and as a result, all allocations under nomad.slice receive an incomplete cpuset.cpus mask.
Environment:
- Nomad version: 1.10.0
- Linux: Ubuntu 22.04
- Instance: 64 vCPUs
- Hyper-Threading: disabled (nosmt=force)
- Cgroup v2
What happens
With HT disabled, only the even-numbered logical CPUs are online:
0,2,4,6,…,62 (total 32 real CPUs)
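To confirm the online set independently of Nomad, the kernel's own list can be read from /sys/devices/system/cpu/online. Below is a minimal Python sketch of a parser for that list format (the same format cpuset.cpus uses). The function names parse_cpu_list and read_online_cpus are made up for illustration and are not part of Nomad; the parser only handles the plain "a,b" and "a-b" forms that the kernel prints.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0,2,4" or "0-3,8" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means no CPUs are listed
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_online_cpus(path="/sys/devices/system/cpu/online"):
    """Return the set of online CPU ids as reported by the kernel."""
    return parse_cpu_list(Path(path).read_text())
```

On the host described here, len(read_online_cpus()) would be expected to return 32.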
However, Nomad still detects 32 cores using SMBIOS/DMI:
dmidecode:
Core Count: 32
Thread Count: 2 (incorrect when SMT is disabled)
Nomad then creates the following cpusets:
/sys/fs/cgroup/nomad.slice/cpuset.cpus → 0,2,4,…,30
/sys/fs/cgroup/nomad.slice/share.slice/cpuset.cpus → 0,2,4,…,30
That means every allocation started through Nomad is restricted to 16 CPUs (the even-numbered CPUs 0–30) instead of all 32 online CPUs (the even-numbered CPUs 0–62).
Docker tasks launched by Nomad inherit this and show:
taskset -pc → 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
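The same check can be made from inside a task without taskset. This is a rough Python sketch that compares the process affinity with the number of CPUs the OS reports; affinity_report and current_report are hypothetical helper names. It assumes Linux (os.sched_getaffinity is not available elsewhere) and that os.cpu_count() reflects the online CPU count on the host in question.

```python
import os


def affinity_report(allowed, online_count):
    """Summarize how many of the online CPUs a process may run on."""
    allowed = sorted(allowed)
    return {
        "allowed": allowed,
        "allowed_count": len(allowed),
        "online_count": online_count,
        # True when the affinity mask covers fewer CPUs than are online.
        "restricted": len(allowed) < online_count,
    }


def current_report():
    """Build the report for the calling process (Linux only)."""
    # os.cpu_count() can return None, so fall back to 0 in that case.
    return affinity_report(os.sched_getaffinity(0), os.cpu_count() or 0)
```

Running print(current_report()) inside an affected Docker task should show whether the mask is narrower than the host's online CPU count.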
What I tried
I set:
client {
  cpu_disable_dmidecode = true
}
But after restart, Nomad still reports:
cpu.numcores = 32
cpu.totalcompute = 52800
And cpuset masks remain truncated.
Expected behavior
When Hyper-Threading is disabled, Nomad should detect all online CPUs rather than relying on SMBIOS data, which can be inaccurate in virtualized/cloud environments.
Expected cpuset.cpus:
0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62
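For reference, the expected mask can be generated from a set of CPU ids rather than typed by hand. A small Python sketch follows; format_cpu_list is an illustrative name, and it collapses consecutive ids into ranges the way the kernel prints CPU lists.

```python
def format_cpu_list(cpus):
    """Render CPU ids as a kernel-style list, collapsing consecutive runs."""
    ids = sorted(set(cpus))
    parts = []
    i = 0
    while i < len(ids):
        j = i
        # Extend j while the next id continues the current run.
        while j + 1 < len(ids) and ids[j + 1] == ids[j] + 1:
            j += 1
        parts.append(str(ids[i]) if i == j else f"{ids[i]}-{ids[j]}")
        i = j + 1
    return ",".join(parts)
```

For example, format_cpu_list(range(0, 64, 2)) yields the comma-separated list of even CPUs from 0 to 62 shown above, since no two ids are consecutive.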
Workaround
I created a systemd override to rewrite the cpuset for nomad.slice, but this does not fix all sub-slices (e.g., tasks under share.slice still receive truncated masks before the override is applied).
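To see which sub-slices still carry a truncated mask after the override, the cgroup tree can be walked read-only. This is a diagnostic sketch in Python, not a fix. collect_cpusets is a hypothetical helper, and the root path in the usage note is taken from the cpuset paths listed earlier in this post.

```python
import os


def collect_cpusets(root):
    """Map each cgroup directory under root to its cpuset.cpus contents."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if "cpuset.cpus" in filenames:
            with open(os.path.join(dirpath, "cpuset.cpus")) as fh:
                # Keys are paths relative to root; root itself is ".".
                found[os.path.relpath(dirpath, root)] = fh.read().strip()
    return found
```

Calling collect_cpusets("/sys/fs/cgroup/nomad.slice") and printing the sorted items lists every cgroup next to its mask, which makes it easy to spot the ones that were not rewritten.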
The question
Is there a correct way to:
- Make Nomad detect CPU count using /sys/devices/system/cpu/online instead of DMI/SMBIOS?
- Or override the cpuset mask globally for all Nomad allocations?
- Or disable Nomad CPU pinning entirely so Docker tasks use the full CPU set unless explicitly pinned in the job spec?
Any guidance or suggestions would be greatly appreciated. This is currently preventing us from deploying workloads that rely on explicit CPU affinity.
Thanks!