How does CSI plugin health get determined

Hi, I’m trying out CSI with aws-ebs-csi-driver in my Nomad cluster, but my controller and node plugins stay in an “unhealthy” state after their corresponding tasks are launched, and I’m puzzled as to why.

I followed the Stateful Workloads with Container Storage Interface tutorial on HashiCorp Learn and submitted the following job to my Nomad cluster:

job "plugin-aws-ebs-controller" {
  datacenters = ["dc1"]

  group "controller" {
    task "plugin" {
      driver = "docker"

      config {
        image = "amazon/aws-ebs-csi-driver:v1.4.0"

        args = [
          "controller",
          "--endpoint=unix://csi/csi.sock",
          "--logtostderr",
          "--v=5",
        ]
      }

      csi_plugin {
        id        = "aws-ebs0"
        type      = "controller"
        mount_dir = "/csi"
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}
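
For reference, I submitted it with the CLI and then checked the plugin’s status (the filename is just whatever I saved the job as):

# Submit the job, then inspect the plugin's expected/healthy counts
nomad job run plugin-aws-ebs-controller.nomad
nomad plugin status aws-ebs0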

The task was launched successfully, but the plugin doesn’t seem to register: it shows up as “Unhealthy” under /ui/csi/plugins. When I check the plugin via the API at /v1/plugin/csi/aws-ebs0, I get this response:

{
  "Allocations": [],
  "ControllerRequired": false,
  "Controllers": {},
  "ControllersExpected": 1,
  "ControllersHealthy": 0,
  "CreateIndex": 944,
  "ID": "aws-ebs0",
  "ModifyIndex": 951,
  "Nodes": {},
  "NodesExpected": 0,
  "NodesHealthy": 0,
  "Provider": "",
  "Version": ""
}
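
For the record, this is how I queried it (assuming NOMAD_ADDR points at one of my servers):

# Pretty-print the plugin's status from the HTTP API
curl -s "$NOMAD_ADDR/v1/plugin/csi/aws-ebs0" | python -m json.tool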

Could someone point me to how I could debug this? I’ve checked the Docker container running the CSI controller; it seems to be running fine and is listening on the socket unix://csi/csi.sock, but I don’t think it’s receiving any RPC requests from the Nomad cluster.
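
This is roughly how I checked the container; the filter assumes the exact image tag from the job above:

# Find the plugin container and tail its logs; at --v=5 the driver is
# verbose, so an absence of gRPC activity suggests no RPCs are arriving
docker ps --filter "ancestor=amazon/aws-ebs-csi-driver:v1.4.0"
docker logs --tail 50 <container-id>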

One of the questions I have in mind is: does the socket need to be mounted from within the container back onto the host, where the Nomad agent is running? I had assumed the csi_plugin => mount_dir option would do this automagically, but I could be wrong.
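
In case it helps, this is how I looked for the socket on the host side (the data dir path is a guess; substitute your client’s data_dir):

# Nomad bind-mounts mount_dir from its data dir, so the socket should
# also show up on the host somewhere under client/csi
sudo find /opt/nomad/data/client/csi -name 'csi.sock' 2>/dev/null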


Hi,
I had the exact same issue, and after some research and debugging it turned out to be a permissions problem with the “csi.sock” file on the host side; see Nomad is unable to create CSI plugin due to being unable to probe the CSI driver · Issue #7931 · hashicorp/nomad · GitHub. Manually running “chmod og+w path-to-csi.sock” is not a very nice solution, but it worked for me (it all depends on which user is actually running the nomad daemon).
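
Concretely, something like this (the path under the Nomad data dir is an example; yours will have a different layout and alloc ID):

# Check who can write to the socket, then open it up for the agent
ls -l /opt/nomad/data/client/csi/plugins/<alloc-id>/csi.sock
sudo chmod og+w /opt/nomad/data/client/csi/plugins/<alloc-id>/csi.sock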