Driver=java task_name=task-name error="rpc error: code = Unavailable desc = transport is closing"

I’ve seen this posted as a question a few times, but never with an answer explaining how to fix it.

Basically, I have a Java JAR that works fine locally, and I can see it start logging if I look at the task’s logs. However, there appears to be some infrastructure issue within Nomad: Nomad kills the job shortly after it starts, with this error:

[ERROR] client.driver_mgr.java: error receiving stream from Stats executor RPC, closing stream: alloc_id=cd20ee32-8eb0-ec07-187a-e1e0d633a75f driver=java task_name=my-task error="rpc error: code = Unavailable desc = transport is closing"

Job spec:

job "my-job" {
  datacenters = ["prod"]

  priority = 1

  type = "batch"

  group "my-group" {
    network {
      mode = "host"
      dns {
        servers = ["DNS address"]
      }
    }

    task "my-task" {
      driver = "java"

      config {
        jar_path    = "local/my.jar"
        jvm_options = ["-Xmx2048m", "-Xms256m"]

        args = [
          "some",
          "relevant",
          "arguments",
        ]
      }

      artifact {
        source = "http://host/path/to/my.jar"

        options {
          checksum = "md5:<hash>"
        }
      }
    }
  }
}
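One thing that may be worth checking (an editorial suggestion, not something confirmed in this thread): the job spec has no resources block, so the task gets Nomad's default allocation of 300 MB of memory, while the JVM is started with -Xms256m and -Xmx2048m. If the JVM's footprint grows past the task's memory limit, the task can be killed shortly after it starts. A sketch of the task with an explicit resources block follows; the cpu and memory numbers are hypothetical and should be sized for the real workload.

```hcl
task "my-task" {
  driver = "java"

  # ... config and artifact blocks as in the job spec above ...

  # Hypothetical values. memory is in MB and is set above -Xmx2048m to leave
  # room for non-heap JVM memory (metaspace, thread stacks, etc.).
  resources {
    cpu    = 500
    memory = 2560
  }
}
```

Without this block, the -Xmx setting only caps the Java heap; it does not raise the limit Nomad enforces on the task as a whole.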

Client spec:

client {
  enabled = true
  chroot_env {
    "/bin"              = "/bin"
    "/etc"              = "/etc"
    "/lib"              = "/lib"
    "/lib64"            = "/lib64"
    "/opt"              = "/opt"
    "/run/resolveconfg" = "/run/resolvconf"
    "/sbin"             = "/sbin"
    "/usr"              = "/usr"
  }
}
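Separately, the chroot_env map above uses /run/resolveconfg as the host path, which looks like a typo for /run/resolvconf (the destination path on the same line). Keys are paths on the host and values are paths inside the task's chroot, so a misspelled key means that directory is probably not available to the task. If the host's /etc/resolv.conf is a symlink into /run/resolvconf, that could break DNS lookups from inside the JVM; whether it matters for this job is an assumption. A sketch with only that key corrected:

```hcl
client {
  enabled = true
  chroot_env {
    "/bin"            = "/bin"
    "/etc"            = "/etc"
    "/lib"            = "/lib"
    "/lib64"          = "/lib64"
    "/opt"            = "/opt"
    # Host path (key) now spelled the same as the chroot path (value).
    "/run/resolvconf" = "/run/resolvconf"
    "/sbin"           = "/sbin"
    "/usr"            = "/usr"
  }
}
```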

nomad node status -verbose <node ID> output (from a different thread on this topic for Java):

driver.exec               = 1
driver.java               = 1
driver.java.runtime       = Java(TM) SE Runtime Environment (build 13.0.1+9)
driver.java.version       = 13.0.1
driver.java.vm            = Java HotSpot(TM) 64-Bit Server VM (build 13.0.1+9, mixed mode, sharing)

The system logs don’t seem to report anything of interest; I just get this sequence of log lines in a loop:

2021-10-09T03:34:04.139Z [INFO]  client.alloc_runner.task_runner: restarting task: alloc_id=cd20ee32-8eb0-ec07-187a-e1e0d633a75f task=my-task reason="Restart within policy" delay=17.617384352s
2021-10-09T03:34:21.761Z [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=cd20ee32-8eb0-ec07-187a-e1e0d633a75f task=my-task path=/data/nomad/data/alloc/cd20ee32-8eb0-ec07-187a-e1e0d633a75f/alloc/logs/.my-task.stdout.fifo @module=logmon timestamp=2021-10-09T03:34:21.760Z
2021-10-09T03:34:21.762Z [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=cd20ee32-8eb0-ec07-187a-e1e0d633a75f task=my-task @module=logmon path=/data/nomad/data/alloc/cd20ee32-8eb0-ec07-187a-e1e0d633a75f/alloc/logs/.my-task.stderr.fifo timestamp=2021-10-09T03:34:21.760Z
2021-10-09T03:34:21.762Z [INFO]  client.driver_mgr.java: starting java task: driver=java driver_cfg="{Class: ClassPath: JarPath:local/my.jar JvmOpts:[-Xmx2048m -Xms256m] Args:[some, relevant, arguments] ModePID: ModeIPC: CapAdd:[] CapDrop:[]}" args=[-Xmx2048m, -Xms256m, -jar, local/my.jar, some, relevant, arguments]
2021-10-09T03:34:23.148Z [ERROR] client.driver_mgr.java: error receiving stream from Stats executor RPC, closing stream: alloc_id=cd20ee32-8eb0-ec07-187a-e1e0d633a75f driver=java task_name=my-task error="rpc error: code = Unavailable desc = transport is closing"
2021-10-09T03:34:23.151Z [INFO]  client.alloc_runner.task_runner: not restarting task: alloc_id=cd20ee32-8eb0-ec07-187a-e1e0d633a75f task=my-task reason="Exceeded allowed attempts 3 in interval 24h0m0s and mode is "fail""
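The last log line shows the restart policy giving up after 3 attempts within 24h in "fail" mode, which matches Nomad's defaults for batch jobs. While debugging, it can help to set the policy explicitly so the task fails on its first exit instead of looping, which leaves a single failed allocation whose stderr log can be inspected. A sketch, assuming the group from the job spec above:

```hcl
group "my-group" {
  # Debugging only: do not restart the task in place after it exits.
  # Batch jobs otherwise default to attempts = 3 within interval = "24h".
  restart {
    attempts = 0
    mode     = "fail"
  }

  # ... network and task blocks as in the job spec above ...
}
```

Note that the separate reschedule policy may still place a new allocation after the failure; this block only controls in-place restarts.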

Is there some configuration step I’m missing to get this working?

Hi @KSRandom ,

Thanks for using Nomad.

It seems this exact issue came up in a thread before, and there’s some logging advice in that thread. Can you try that out and see if you get any more information? I ran into this last week and am looking into it, but any extra info would be really helpful.
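For reference, the agent's verbosity is controlled by the top-level log_level option in the agent configuration. Raising it on the client is one way to capture more detail around the failure; this is an assumption about the kind of logging advice meant, and the earlier thread may suggest something different.

```hcl
# Top-level agent option, alongside (not inside) the client block.
# Restarting the Nomad agent picks up the change.
log_level = "DEBUG"

client {
  enabled = true
}
```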

Thanks!

Derek and the Nomad Team