I’ve seen this posted as a question a few times, but never with an answer on how to fix it.
I have a Java jar that works fine locally, and I can see it starting to log when I look at the task’s logs. But there appears to be some infrastructure issue within Nomad: Nomad kills the job shortly after it starts, with this error:
[ERROR] client.driver_mgr.java: error receiving stream from Stats executor RPC, closing stream: alloc_id=cd20ee32-8eb0-ec07-187a-e1e0d633a75f driver=java task_name=my-task error="rpc error: code = Unavailable desc = transport is closing"
Job spec:
job "my-job" {
  datacenters = ["prod"]
  priority = 1
  type = "batch"
  group "my-group" {
    network {
      mode = "host"
      dns {
        servers = ["DNS address"]
      }
    }
    task "my-task" {
      driver = "java"
      config {
        jar_path = "local/my.jar"
        jvm_options = ["-Xmx2048m", "-Xms256m"]
        args = [
          "some",
          "relevant",
          "arguments",
        ]
      }
      artifact {
        source = "http://host/path/to/my.jar"
        options {
          checksum = "md5:<hash>"
        }
      }
    }
  }
}
Client spec:
client {
  enabled = true
  chroot_env {
    "/bin" = "/bin"
    "/etc" = "/etc"
    "/lib" = "/lib"
    "/lib64" = "/lib64"
    "/opt" = "/opt"
    "/run/resolveconfg" = "/run/resolvconf"
    "/sbin" = "/sbin"
    "/usr" = "/usr"
  }
}
nomad node status -verbose <node ID>
Output (this check is from a different thread on this topic for Java):
driver.exec = 1
driver.java = 1
driver.java.runtime = Java(TM) SE Runtime Environment (build 13.0.1+9)
driver.java.version = 13.0.1
driver.java.vm = Java HotSpot(TM) 64-Bit Server VM (build 13.0.1+9, mixed mode, sharing)
The system logs don’t seem to report anything of interest; I just get this kind of error in a loop:
2021-10-09T03:34:04.139Z [INFO] client.alloc_runner.task_runner: restarting task: alloc_id=cd20ee32-8eb0-ec07-187a-e1e0d633a75f task=my-task reason="Restart within policy" delay=17.617384352s
2021-10-09T03:34:21.761Z [INFO] client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=cd20ee32-8eb0-ec07-187a-e1e0d633a75f task=my-task path=/data/nomad/data/alloc/cd20ee32-8eb0-ec07-187a-e1e0d633a75f/alloc/logs/.my-task.stdout.fifo @module=logmon timestamp=2021-10-09T03:34:21.760Z
2021-10-09T03:34:21.762Z [INFO] client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=cd20ee32-8eb0-ec07-187a-e1e0d633a75f task=my-task @module=logmon path=/data/nomad/data/alloc/cd20ee32-8eb0-ec07-187a-e1e0d633a75f/alloc/logs/.my-task.stderr.fifo timestamp=2021-10-09T03:34:21.760Z
2021-10-09T03:34:21.762Z [INFO] client.driver_mgr.java: starting java task: driver=java driver_cfg="{Class: ClassPath: JarPath:local/my.jar JvmOpts:[-Xmx2048m -Xms256m] Args:[some, relevant, arguments] ModePID: ModeIPC: CapAdd:[] CapDrop:[]}" args=[-Xmx2048m, -Xms256m, -jar, local/my.jar, some, relevant, arguments]
2021-10-09T03:34:23.148Z [ERROR] client.driver_mgr.java: error receiving stream from Stats executor RPC, closing stream: alloc_id=cd20ee32-8eb0-ec07-187a-e1e0d633a75f driver=java task_name=my-task error="rpc error: code = Unavailable desc = transport is closing"
2021-10-09T03:34:23.151Z [INFO] client.alloc_runner.task_runner: not restarting task: alloc_id=cd20ee32-8eb0-ec07-187a-e1e0d633a75f task=my-task reason="Exceeded allowed attempts 3 in interval 24h0m0s and mode is "fail""
Is there some configuration step I’m missing to get this working?