Nomad jobs failing

Most of the allocations in my cluster started restarting constantly with this error after upgrading to 0.9.3:

2019-06-28T15:16:12.913+0200 [ERROR] client.driver_mgr.java: error receiving stream from Stats executor RPC, closing stream: alloc_id=006cee61-9371-35f8-ddeb-386898e186b0 driver=java task_name=service error="rpc error: code = Unavailable desc = transport is closing"

They were running fine until 0.9.1. Any ideas?

Was there any feedback about this issue? I’m facing the same problem but for a raw_exec job.

I am getting the same issue when running a Spark job on Nomad. The Spark driver kills the executors because it does not receive an RPC. If anyone comes up with a solution, please post. Thanks

For the original issue, you’ll probably want to open an issue (if it’s still a problem for you), and provide some more context around the job and logs.
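
As a rough sketch of gathering that context, the snippet below pulls the key=value fields out of a client log line like the one quoted above, so the affected allocation ID, driver, and task name can be listed in a report. The helper name parse_log_fields and the regular expression are assumptions made for illustration; they are not part of Nomad.

```python
import re

# Matches key=value pairs. The value is either a double-quoted string
# or a run of non-whitespace characters. (Illustrative pattern only.)
FIELD_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_log_fields(line):
    """Return the key=value fields of one log line as a dict (hypothetical helper)."""
    fields = {}
    for key, value in FIELD_RE.findall(line):
        # Strip the surrounding quotes from quoted values.
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


# The log line from the original post.
line = (
    '2019-06-28T15:16:12.913+0200 [ERROR] client.driver_mgr.java: '
    'error receiving stream from Stats executor RPC, closing stream: '
    'alloc_id=006cee61-9371-35f8-ddeb-386898e186b0 driver=java '
    'task_name=service '
    'error="rpc error: code = Unavailable desc = transport is closing"'
)

fields = parse_log_fields(line)
print(fields["alloc_id"], fields["driver"], fields["task_name"])
```

Running this over the client log and collecting the distinct alloc_id values gives a short list of allocations to describe in the issue, alongside the job file and the Nomad version.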

Some follow-up conversation is happening here: https://github.com/hashicorp/nomad/issues/7061

Has this issue been resolved?
I am facing the same thing with the raw_exec driver on Nomad v0.12.4.

Hi @yash.demba

Yes. We believe this issue has been resolved by this PR. If you upgrade to the latest official release, or build from source, you should no longer see it. If you do, please let us know.

Thanks,

Derek and the Nomad Team