GRPC is failing: Unavailable desc = transport is closing

lssilva · September 14, 2020, 9:11am

Hi,
When I try to start a simple task, I get the following errors:

    2020-09-14T11:03:08.107+0200 [ERROR] client.driver_mgr.exec: error receiving stream from Stats executor RPC, closing stream: alloc_id=05c4dbad-2a40-861e-f118-31d080dae9cf driver=exec task_name=execTest error="rpc error: code = Unavailable desc = transport is closing"
    2020-09-14T11:03:08.107+0200 [ERROR] client.alloc_runner.task_runner.task_hook.stats_hook: failed to start stats collection for task: alloc_id=05c4dbad-2a40-861e-f118-31d080dae9cf task=execTest error="rpc error: code = Canceled desc = grpc: the client connection is closing"

and in my system log:

[2020-09-14T11:06:41.850Z] [ warning] [guestinfo] Failed to get disk info.
[2020-09-14T11:07:11.850Z] [ warning] [guestinfo] GetDiskInfo: ERROR: could not get space info for partition /tmp/NomadClient307942475/88add4d4-72ba-1877-f4fa-f507939e8adf/oldJava/alloc (deleted): Unable to statfs() the mount point

I am running Nomad v0.12.4 using the command:

nomad agent -log-level=TRACE -dev

The job is:

job "java" {
  datacenters = ["dc1"]

  type = "service"

  group "java" {
    count = 1

    task "java" {
      driver = "java"

      config {
        jar_path = "/path/sample.jar"
      }
    }
  }
}

My node status output is:

ID              = 2a790dd4-b282-5887-ec75-07b8d17acb72
Name            = XXXXX
Class           = <none>
DC              = dc1
Drain           = false
Eligibility     = eligible
Status          = ready
CSI Controllers = <none>
CSI Drivers     = <none>
Uptime          = 540h2m18s

Drivers
Driver    Detected  Healthy  Message                             Time
docker    false     false    Failed to connect to docker daemon  2020-09-14T11:03:47+02:00
exec      true      true     Healthy                             2020-09-14T11:03:47+02:00
java      true      true     Healthy                             2020-09-14T11:03:47+02:00
qemu      false     false    <none>                              2020-09-14T11:03:47+02:00
raw_exec  true      true     Healthy                             2020-09-14T11:03:47+02:00

Node Events
Time                       Subsystem  Message          Details
2020-09-14T11:03:47+02:00  Cluster    Node registered  <none>

Allocated Resources
CPU          Memory      Disk
0/11984 MHz  0 B/23 GiB  0 B/15 GiB

Allocation Resource Utilization
CPU          Memory
0/11984 MHz  0 B/23 GiB

Host Resource Utilization
CPU            Memory         Disk
118/11984 MHz  19 GiB/23 GiB  (/dev/mapper/rootvg-root)

Allocations
No allocations placed

Attributes
cpu.arch                  = amd64
cpu.frequency             = 2996
cpu.modelname             = Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00GHz
cpu.numcores              = 4
cpu.totalcompute          = 11984
driver.exec               = 1
driver.java               = 1
driver.java.runtime       = OpenJDK Runtime Environment 18.9 (build 11.0.8+10-LTS)
driver.java.version       = 11.0.8
driver.java.vm            = OpenJDK 64-Bit Server VM 18.9 (build 11.0.8+10-LTS, mixed mode, sharing)
driver.raw_exec           = 1
kernel.name               = linux
kernel.version            = 3.10.0-1127.18.2.el7.x86_64
memory.totalbytes         = 25092214784
nomad.advertise.address   = 127.0.0.1:4646
nomad.revision            = 8efaee4ba5e9727ab323aaba2ac91c2d7b572d84
nomad.version             = 0.12.4
os.name                   = redhat
os.signals                = SIGBUS,SIGILL,SIGPIPE,SIGTRAP,SIGFPE,SIGWINCH,SIGSTOP,SIGTERM,SIGABRT,SIGHUP,SIGIOT,SIGTSTP,SIGXFSZ,SIGQUIT,SIGSEGV,SIGSYS,SIGTTOU,SIGURG,SIGTTIN,SIGALRM,SIGCHLD,SIGINT,SIGKILL,SIGXCPU,SIGCONT,SIGIO,SIGPROF,SIGUSR1,SIGUSR2
os.version                = 7.8
unique.cgroup.mountpoint  = /sys/fs/cgroup/systemd
unique.hostname           = XXXXXXX
unique.network.ip-address = 127.0.0.1
unique.storage.bytesfree  = 15891853312
unique.storage.bytestotal = 25231179776
unique.storage.volume     = /dev/mapper/rootvg-root

Meta
connect.gateway_image = envoyproxy/envoy:v1.11.2@sha256:a7769160c9c1a55bb8d07a3b71ce5d64f72b1f665f10d81aa1581bc3cf850d09
connect.log_level     = info
connect.sidecar_image = envoyproxy/envoy:v1.11.2@sha256:a7769160c9c1a55bb8d07a3b71ce5d64f72b1f665f10d81aa1581bc3cf850d09

I have been debugging for a while but cannot find the issue. Which port does gRPC use by default? Is there a simple way to test that gRPC is working?
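The two client log lines above use Nomad's `key=value` log format, with double-quoted values for fields that contain spaces. When comparing several failed runs, it can help to pull out those fields and the gRPC status code they report (here `Unavailable` on the stats stream and `Canceled` in the stats hook). The helper below is a hypothetical sketch, not a Nomad tool; the names `parse_fields` and `grpc_code` are made up for illustration, and the regular expression assumes every value is either an unquoted token or a double-quoted string, as in the lines shown here.

```python
import re

# Matches key=value pairs; a value is either a double-quoted string
# (allowing backslash escapes) or a run of non-whitespace characters.
FIELD_RE = re.compile(r'(\w[\w.]*)=("(?:[^"\\]|\\.)*"|\S+)')
# Extracts the status code name from an "rpc error: code = X desc = ..." message.
CODE_RE = re.compile(r"code = (\w+)")


def parse_fields(line: str) -> dict:
    """Return the key=value fields of one log line, with surrounding quotes stripped."""
    fields = {}
    for key, value in FIELD_RE.findall(line):
        if value.startswith('"') and value.endswith('"') and len(value) >= 2:
            value = value[1:-1]
        fields[key] = value
    return fields


def grpc_code(line: str):
    """Return the gRPC status code named in the line's error field, or None."""
    match = CODE_RE.search(parse_fields(line).get("error", ""))
    return match.group(1) if match else None


if __name__ == "__main__":
    line = ('[ERROR] client.driver_mgr.exec: error receiving stream from Stats executor RPC, '
            'closing stream: alloc_id=05c4dbad-2a40-861e-f118-31d080dae9cf driver=exec '
            'task_name=execTest error="rpc error: code = Unavailable desc = transport is closing"')
    print(parse_fields(line)["task_name"], grpc_code(line))  # prints: execTest Unavailable
```

Words in the free-text message (such as `closing stream:`) contain no `=` directly after a key, so only the structured fields are captured.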

jrasell · September 16, 2020, 9:07am

Hi @lssilva, and thanks for the detail in this. I believe the gRPC message is the result of a problem rather than the cause. The system logs you included suggest that Nomad encounters an error during task startup, specifically when starting statistics collection.

To help identify the problem, could you try running Nomad via sudo? Could you also provide the log lines surrounding the ones you have already supplied, along with details about the host operating system and environment?

Thanks,
jrasell and the Nomad team.
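The second system-log warning in the original post says that `statfs()` could not be run on a Nomad allocation directory whose path is marked `(deleted)`. The sketch below illustrates that failure mode, assuming a Unix host and Python's `os.statvfs` (which wraps the closely related `statvfs(3)` call rather than `statfs(2)` itself). It queries free space on a live directory, removes the directory, and queries again. The temporary directory is a hypothetical stand-in for an allocation directory; the pattern is consistent with, though not proof of, the warning referring to a directory that had already been cleaned up.

```python
import os
import tempfile


def free_bytes(path: str) -> int:
    """Return bytes available to unprivileged users on the filesystem holding `path`."""
    st = os.statvfs(path)  # raises OSError (e.g. FileNotFoundError) if path is gone
    return st.f_bavail * st.f_frsize


def check(path: str) -> str:
    """Report free space for `path`, or the error the statvfs call raised."""
    try:
        return f"{path}: {free_bytes(path)} bytes free"
    except OSError as exc:
        return f"{path}: cannot stat filesystem ({exc.strerror})"


if __name__ == "__main__":
    # Hypothetical stand-in for an allocation directory.
    alloc_dir = tempfile.mkdtemp(prefix="alloc-")
    print(check(alloc_dir))
    os.rmdir(alloc_dir)  # simulate the directory being cleaned up
    print(check(alloc_dir))
```

The first call reports a byte count; the second reports an error because the path no longer exists.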
