Hello Nomad users,
I have the following job definition:
job "native_benchmarks" {
  datacenters = ["dc1"]
  priority = 100
  type = "batch"
  constraint {
    attribute = "${attr.unique.hostname}"
    value = "myhost.company.com"
  }
  group "benchmarks" {
    task "multi_coremark" {
      driver = "exec"
      config {
        command = "/opt/coremark/multi_coremark.sh"
        no_cgroups = false
      }
      logs {
        max_files = 1
        max_file_size = 10
      }
      resources {
        memory = 2000
      }
    }
    task "npb" {
      driver = "exec"
      config {
        command = "/opt/NPB3.0/NPB3.0-JAV/all_tests.sh"
        no_cgroups = false
      }
      logs {
        max_files = 1
        max_file_size = 10
      }
      resources {
        memory = 3000
      }
    }
    task "ramsmp" {
      driver = "exec"
      config {
        command = "/opt/ramspeed/ramsmp_batch.sh"
        no_cgroups = false
      }
      logs {
        max_files = 1
        max_file_size = 10
      }
      resources {
        memory = 2000
      }
    }
  }
}
The planning and running phases work, but eventually the job fails:
[user@master nomad]$ nomad status native_benchmarks
ID            = native_benchmarks
Name          = native_benchmarks
Submit Date   = 2020-10-23T15:03:54-04:00
Type          = service
Priority      = 100
Datacenters   = dc1
Namespace     = default
Status        = dead
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
benchmarks  0       0         0        3       1         0

Latest Deployment
ID          = c23def65
Status      = failed
Description = Failed due to progress deadline

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
benchmarks  1        4       0        4          2020-10-23T15:13:54-04:00

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created     Modified
8f83ae7e  3ffa908d  benchmarks  0        run      failed    26m4s ago   23m19s ago
5e82f114  3ffa908d  benchmarks  0        stop     failed    29m44s ago  26m4s ago
e3cf560b  3ffa908d  benchmarks  0        stop     complete  30m45s ago  26m29s ago
d716cec3  3ffa908d  benchmarks  0        stop     failed    34m44s ago  30m45s ago
I tried to get logs from each of the 3 tasks defined in the group, but none are available (I assume this is because none of the tasks managed to run).
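For reference, this is roughly how per-allocation detail and task logs are usually requested. It is only a sketch assuming the standard Nomad CLI; the allocation ID and task name are copied from the output and job file above, and the `command -v` guard is just there so the snippet does nothing on a machine without the `nomad` binary.

```shell
# Allocation ID of the most recent failed allocation (from the status output above).
ALLOC_ID="8f83ae7e"
# One of the three tasks in the "benchmarks" group.
TASK="multi_coremark"

if command -v nomad >/dev/null 2>&1; then
  # Show task states and recent events for the allocation.
  nomad alloc status "$ALLOC_ID"
  # Fetch stdout, then stderr, of a single task in that allocation.
  nomad alloc logs "$ALLOC_ID" "$TASK"
  nomad alloc logs -stderr "$ALLOC_ID" "$TASK"
fi
```

The same two `nomad alloc logs` calls can be repeated with `TASK` set to `npb` or `ramsmp`.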
The nodes are there and look healthy (I have 2 nodes, besides the controller):
[user@master nomad]$ nomad node status
ID        DC   Name                 Class   Drain  Eligibility  Status
3ffa908d  dc1  myhost.company.com   <none>  false  eligible     ready
98a815f7  dc1  myhost2.company.com  <none>  false  eligible     ready
I can run each task's command as root on myhost, exactly as specified in the job definition (the node selection part works in Nomad too).
Any help is appreciated, I’m pretty new to Nomad.