Nomad + CSI plugin (csi-driver-iscsi): volume validation fails with Unimplemented

Nomad version

Nomad version: v1.9.5

Operating system and environment details

Operating system: Ubuntu 24.04
Environment: Nomad + CSI Plugin (csi-driver-iscsi)

Issue

While setting up the csi-driver-iscsi plugin to manage iSCSI volumes, I hit an error during volume registration. When Nomad attempts to validate the volume, the controller logs show "rpc error: code = Unimplemented desc =".

Reproduction steps

  1. Configure the csi-driver-iscsi plugin following the Kubernetes CSI examples.
  2. Deploy the Nomad jobs for the controller and node plugins; both start without errors.
  3. Attempt to register a volume using nomad volume register ./iscsi-lun4.
  4. Observe the error in the controller logs.

Expected Result

The volume is registered successfully, and the CSI plugin validates the requested volume capabilities (access mode and attachment mode) as expected.

Actual Result

The volume registration fails with the following error:

Error registering volume: Unexpected response code: 500 (controller validate volume: CSI.ControllerValidateVolume: rpc error: code = Unimplemented desc =)

Controller logs:

GRPC request: {"parameters":{"discoveryCHAPAuth":"false","iqn":"iqn.2003-01.com.redhat.iscsi-gw:cluster","iscsiInterface":"default","lun":"4","portals":"","sessionCHAPAuth":"false","targetPortal":"10.130.128.2:3260"},"volume_capabilities":[{"AccessType":{"Block":{}},"access_mode":{"mode":1}}],"volume_id":"iscsi-lun4"} GRPC error: rpc error: code = Unimplemented desc =
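To check that Nomad's request actually matches the volume file, the logged request body can be decoded. The sketch below is a hypothetical helper (the type validateRequest and the function describe are illustrative, not part of Nomad or the CSI Go bindings); it only assumes the JSON shape visible in the log line and the numeric values of the CSI AccessMode enum.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// validateRequest mirrors only the fields visible in the logged request.
// It is a hypothetical helper type, not the CSI Go binding.
type validateRequest struct {
	Parameters   map[string]string `json:"parameters"`
	Capabilities []struct {
		AccessType map[string]json.RawMessage `json:"AccessType"` // one key: "Block" or "Mount"
		AccessMode struct {
			Mode int `json:"mode"`
		} `json:"access_mode"`
	} `json:"volume_capabilities"`
	VolumeID string `json:"volume_id"`
}

// csiAccessModes names the values of the CSI VolumeCapability.AccessMode.Mode enum.
var csiAccessModes = map[int]string{
	0: "UNKNOWN",
	1: "SINGLE_NODE_WRITER",
	2: "SINGLE_NODE_READER_ONLY",
	3: "MULTI_NODE_READER_ONLY",
	4: "MULTI_NODE_SINGLE_WRITER",
	5: "MULTI_NODE_MULTI_WRITER",
}

// describe decodes a logged request and returns the volume ID plus one
// "<access type> <access mode>" string per requested capability.
func describe(raw string) (string, []string, error) {
	var req validateRequest
	if err := json.Unmarshal([]byte(raw), &req); err != nil {
		return "", nil, err
	}
	var caps []string
	for _, c := range req.Capabilities {
		for kind := range c.AccessType {
			caps = append(caps, kind+" "+csiAccessModes[c.AccessMode.Mode])
		}
	}
	return req.VolumeID, caps, nil
}

func main() {
	// Request body from the controller log, with straight quotes restored.
	raw := `{"parameters":{"discoveryCHAPAuth":"false","iqn":"iqn.2003-01.com.redhat.iscsi-gw:cluster","iscsiInterface":"default","lun":"4","portals":"","sessionCHAPAuth":"false","targetPortal":"10.130.128.2:3260"},"volume_capabilities":[{"AccessType":{"Block":{}},"access_mode":{"mode":1}}],"volume_id":"iscsi-lun4"}`
	id, caps, err := describe(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(id, caps) // iscsi-lun4 [Block SINGLE_NODE_WRITER]
}
```

The decoded capability (Block, SINGLE_NODE_WRITER) corresponds to attachment_mode = "block-device" and access_mode = "single-node-writer" in the volume file, which suggests the request itself is well-formed and the failure is on the plugin side.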

Controller job:

job "plugin-iscsi-controller" {
  datacenters = ["dc1"]
  type = "system"

  group "controller" {

    task "plugin" {
      driver = "docker"
      config {
        image = "gcr.io/k8s-staging-sig-storage/iscsiplugin:canary"
        network_mode = "host"
        args = [
          "--endpoint=unix:///csi/csi.sock",
          "--logtostderr",
          "--nodeid=${NOMAD_ALLOC_INDEX}",
          "-v=7",
        ]
        privileged = true
      }

      env {
        CSI_ENDPOINT = "unix:///csi/csi.sock"
        NODE_ID      = "${NOMAD_ALLOC_INDEX}" # Node identifier
      }

      csi_plugin {
        id        = "iscsi"
        type      = "controller"
        mount_dir = "/csi"
      }

      resources {
        cpu    = 250
        memory = 300
      }
    }
  }
}

Node job:

job "plugin-iscsi-node" {
  datacenters = ["dc1"]
  type = "system"

  group "node" {

    volume "lib-modules" {
      type      = "host"
      read_only = true
      source    = "lib-modules"
    }

    volume "sysfs" {
      type      = "host"
      read_only = true
      source    = "sysfs"
    }

    volume "dev" {
      type      = "host"
      read_only = false
      source    = "dev"
    }

    task "iscsi-plugin" {
      driver = "docker"

      config {
        image = "gcr.io/k8s-staging-sig-storage/iscsiplugin:canary"
        args = [
          # Pass --nodeid once; a repeated flag overrides the earlier value.
          "--nodeid=${node.unique.name}",
          # Same absolute socket path as the controller job (three slashes).
          "--endpoint=unix:///csi/csi.sock",
          "--logtostderr",
          "-v=7",
        ]
        privileged = true
      }

      env {
        NODE_ID              = "${node.unique.name}"
        CSI_ENDPOINT         = "unix:///csi/csi.sock"
        iSCSI_INITIATOR_NAME = "iqn.2001-07.com.ceph-uuid1:node1"
      }

      csi_plugin {
        id        = "iscsi"
        type      = "node"
        mount_dir = "/csi"
      }

      volume_mount {
        volume      = "lib-modules"
        destination = "/lib/modules"
        read_only   = true
      }

      volume_mount {
        volume      = "sysfs"
        destination = "/sys"
        read_only   = true
      }

      resources {
        cpu    = 250
        memory = 300
      }

    }
  }
}
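Both plugin jobs should point at the same socket inside mount_dir = "/csi", and with a unix:// endpoint the number of slashes matters. The sketch below assumes csi-driver-iscsi parses --endpoint the way the csi-common helpers in several kubernetes-csi drivers do (split once on "://"); parseEndpoint is an illustrative reimplementation, not the driver's code.

```go
package main

import (
	"fmt"
	"strings"
)

// parseEndpoint splits a CSI endpoint into scheme and address by splitting
// once on "://". Assumption: the driver uses the same approach.
func parseEndpoint(ep string) (scheme, addr string, err error) {
	lower := strings.ToLower(ep)
	if strings.HasPrefix(lower, "unix://") || strings.HasPrefix(lower, "tcp://") {
		parts := strings.SplitN(ep, "://", 2)
		if parts[1] != "" {
			return parts[0], parts[1], nil
		}
	}
	return "", "", fmt.Errorf("invalid endpoint: %q", ep)
}

func main() {
	for _, ep := range []string{"unix:///csi/csi.sock", "unix://csi/csi.sock"} {
		_, addr, err := parseEndpoint(ep)
		if err != nil {
			panic(err)
		}
		// Three slashes yield an absolute socket path; two yield a relative
		// path that resolves against the container's working directory.
		fmt.Printf("%-22s -> %s\n", ep, addr)
	}
}
```

Under this parsing, the two-slash form only reaches /csi/csi.sock if the container's working directory happens to be "/", so the three-slash form is the safer choice.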

Volume file:

type      = "csi"
id        = "iscsi-lun4"
name      = "iscsi-lun4"
plugin_id = "iscsi"

context = {
  targetPortal      = "10.130.128.2:3260"
  portals           = "[]"
  iqn               = "iqn.2003-01.com.redhat.iscsi-gw:cluster"
  lun               = "4"
  iscsiInterface    = "default"
  discoveryCHAPAuth = "false"
  sessionCHAPAuth   = "false"
}

capability {
  access_mode     = "single-node-writer"
  attachment_mode = "block-device"
}

Nomad systemd unit:

[Unit]
Description=nomad agent
Requires=network-online.target
After=network-online.target

[Service]
User=root
Group=root
Restart=on-failure
ExecStart=/usr/bin/nomad agent -dev -config=/etc/nomad.d/nomad.hcl
ExecReload=/bin/kill -HUP 
KillSignal=SIGINT

[Install]
WantedBy=multi-user.target

Nomad config:

# Copyright (c) HashiCorp, Inc.
# SPDX-License-Identifier: BUSL-1.1

# Full configuration options can be found at https://developer.hashicorp.com/nomad/docs/configuration

data_dir  = "/opt/nomad/data"
bind_addr = "0.0.0.0"

server {
  # license_path is required for Nomad Enterprise as of Nomad v1.1.1+
  #license_path = "/etc/nomad.d/license.hclic"
  enabled          = true
  bootstrap_expect = 1
}

client {
  enabled = true
  servers = ["127.0.0.1"]

  host_volume "plugin-dir" {
    read_only = false
    path      = "/opt/csi/bin"
  }

  host_volume "lib-modules" {
    read_only = true
    path      = "/lib/modules"
  }

  host_volume "sysfs" {
    read_only = true
    path      = "/sys"
  }

  host_volume "dev" {
    read_only = false
    path      = "/dev"
  }
}

plugin "docker" {
  config {
    allow_privileged = true
  }
}

acl {
  enabled = true
}

Additional Observations
Further investigation suggests that the CSI plugin does not implement the RPC behind Nomad's ControllerValidateVolume call (ValidateVolumeCapabilities in the CSI specification). Findings:

Nomad requests volume validation during volume registration:

In the csi-driver-iscsi plugin, the method that handles this validation (ValidateVolumeCapabilities) appears to always return an error:

In Kubernetes, the equivalent method is implemented differently and processes these requests successfully:

This suggests that volume validation (ValidateVolumeCapabilities) is either missing or incomplete in the csi-driver-iscsi image used with Nomad.
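For comparison, the CSI specification states that a Controller Plugin must implement ValidateVolumeCapabilities, which a CO uses to check a pre-provisioned volume such as this statically defined LUN. The sketch below shows roughly what a minimal handler could do. It is a hedged, dependency-free illustration: the types and the function name are hypothetical stand-ins, whereas a real driver would use github.com/container-storage-interface/spec/lib/go/csi and return gRPC status errors.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the CSI types. A real driver would use the CSI
// Go bindings and return gRPC status errors (e.g. InvalidArgument, NotFound).
type AccessMode int

const (
	SingleNodeWriter     AccessMode = 1 // CSI SINGLE_NODE_WRITER
	SingleNodeReaderOnly AccessMode = 2 // CSI SINGLE_NODE_READER_ONLY
)

type Capability struct {
	Block bool // true for a raw block device, false for a mounted filesystem
	Mode  AccessMode
}

var errInvalidArgument = errors.New("invalid argument: volume ID and capabilities are required")

// validateVolumeCapabilities returns the capabilities it confirms. Per the CSI
// spec, an unsupported capability is reported by leaving "confirmed" empty and
// setting a message, not by failing the RPC. Both access types are accepted.
func validateVolumeCapabilities(volumeID string, caps []Capability) ([]Capability, string, error) {
	if volumeID == "" || len(caps) == 0 {
		return nil, "", errInvalidArgument
	}
	for _, c := range caps {
		// Assumption for this sketch: a LUN is attached to one node at a
		// time, so only single-node access modes are confirmed.
		if c.Mode != SingleNodeWriter && c.Mode != SingleNodeReaderOnly {
			return nil, fmt.Sprintf("access mode %d is not supported", c.Mode), nil
		}
	}
	return caps, "", nil
}

func main() {
	// The capability in the failing request: block device, single-node-writer.
	confirmed, msg, err := validateVolumeCapabilities("iscsi-lun4",
		[]Capability{{Block: true, Mode: SingleNodeWriter}})
	fmt.Printf("confirmed=%d message=%q err=%v\n", len(confirmed), msg, err)
}
```

A handler along these lines would let Nomad's registration proceed for the block-device, single-node-writer capability, instead of receiving codes.Unimplemented.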

Connecting the disk directly from the host with iscsiadm works correctly:

root@node-1:~# iscsiadm -m discovery -p 10.130.128.2 -t st
10.130.128.2:3260,1 iqn.2003-01.com.redhat.iscsi-gw:cluster
10.130.132.2:3260,2 iqn.2003-01.com.redhat.iscsi-gw:cluster
root@node-1:~# iscsiadm -m node --login
Logging in to [iface: default, target: iqn.2003-01.com.redhat.iscsi-gw:cluster, portal: 10.130.128.2,3260]
Logging in to [iface: default, target: iqn.2003-01.com.redhat.iscsi-gw:cluster, portal: 10.65.1.26,3260]
root@node-1:~# iscsiadm -m session -P 3
---
    Host Number: 7  State: running
    scsi7 Channel 00 Id 0 Lun: 4
      Attached scsi disk sdf    State: running
    scsi7 Channel 00 Id 0 Lun: 5
      Attached scsi disk sdd    State: running
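The sendtargets discovery output above carries the same values the volume file uses for targetPortal and iqn. A small parser, sketched here as a hypothetical helper (parseDiscovery is not part of iscsiadm or the driver), shows how each "<ip>:<port>,<tpgt> <IQN>" line maps onto those keys.

```go
package main

import (
	"fmt"
	"strings"
)

// Target is one record of sendtargets discovery output, which has the form
// "<ip>:<port>,<tpgt> <target IQN>".
type Target struct {
	Portal string // candidate value for the volume context key targetPortal
	TPGT   string // target portal group tag
	IQN    string // candidate value for the volume context key iqn
}

// parseDiscovery parses the output of `iscsiadm -m discovery -t st`.
func parseDiscovery(out string) ([]Target, error) {
	var targets []Target
	for _, line := range strings.Split(out, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue // skip blank lines
		}
		if len(fields) != 2 {
			return nil, fmt.Errorf("unexpected line: %q", line)
		}
		portal, tpgt, ok := strings.Cut(fields[0], ",")
		if !ok {
			return nil, fmt.Errorf("missing portal group tag: %q", line)
		}
		targets = append(targets, Target{Portal: portal, TPGT: tpgt, IQN: fields[1]})
	}
	return targets, nil
}

func main() {
	out := `10.130.128.2:3260,1 iqn.2003-01.com.redhat.iscsi-gw:cluster
10.130.132.2:3260,2 iqn.2003-01.com.redhat.iscsi-gw:cluster`
	targets, err := parseDiscovery(out)
	if err != nil {
		panic(err)
	}
	for _, t := range targets {
		fmt.Printf("targetPortal=%s iqn=%s (tpgt %s)\n", t.Portal, t.IQN, t.TPGT)
	}
}
```

The first record (10.130.128.2:3260 with iqn.2003-01.com.redhat.iscsi-gw:cluster) matches the targetPortal and iqn in the volume file, so the target information itself appears consistent with what the host can reach.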