CSI not mounting DigitalOcean volumes

I have created multiple volumes at DigitalOcean (via Terraform), and they show up there as unattached. In my Nomad cluster, I have the digitalocean/do-csi-plugin running. When I deploy a job that requests a CSI volume, the deployment responds with this message:

Constraint missing CSI Volume sample-volume filtered 4 nodes

The job specification has:

    ...
    volume "cloud-volume" {
      type            = "csi"
      source          = "sample-volume"
      access_mode     = "single-node-writer"
      attachment_mode = "file-system"
    }

    task "app" {
      driver = "exec"

      volume_mount {
        volume      = "cloud-volume"
        destination = "${NOMAD_TASK_DIR}/state"
      }
    ...

The DigitalOcean CSI plugin is running as two jobs: one for the controller and one for the nodes. The output from those two jobs looks like this:

Controller
-----------
time="2023-04-06T16:38:36Z" level=info msg="removing socket" host_id=349266694 region=nyc3 socket=/csi/csi.sock version=v4.5.1
time="2023-04-06T16:38:36Z" level=info msg="starting server" grpc_addr=/csi/csi.sock host_id=349266694 http_addr= region=nyc3 version=v4.5.1
time="2023-04-06T16:38:36Z" level=info msg="probe called" host_id=349266694 method=probe region=nyc3 version=v4.5.1
time="2023-04-06T16:38:36Z" level=info msg="get plugin info called" host_id=349266694 method=get_plugin_info region=nyc3 
  response="name:\"dobs.csi.digitalocean.com\" vendor_version:\"v4.5.1\" " version=v4.5.1
time="2023-04-06T16:38:36Z" level=info msg="probe called" host_id=349266694 method=probe region=nyc3 version=v4.5.1
time="2023-04-06T16:38:36Z" level=info msg="get plugin capabitilies called" host_id=349266694 method=get_plugin_capabilities region=nyc3 
  response="capabilities:<service:<type:CONTROLLER_SERVICE > > 
            capabilities:<service:<type:VOLUME_ACCESSIBILITY_CONSTRAINTS > > 
            capabilities:<volume_expansion:<type:ONLINE > > " 
  version=v4.5.1
time="2023-04-06T16:38:36Z" level=info msg="probe called" host_id=349266694 method=probe region=nyc3 version=v4.5.1
time="2023-04-06T16:38:36Z" level=info msg="controller get capabilities called" host_id=349266694 method=controller_get_capabilities region=nyc3 
  response="capabilities:<rpc:<type:CREATE_DELETE_VOLUME > > 
            capabilities:<rpc:<type:PUBLISH_UNPUBLISH_VOLUME > > 
            capabilities:<rpc:<type:LIST_VOLUMES > > 
            capabilities:<rpc:<type:CREATE_DELETE_SNAPSHOT > > 
            capabilities:<rpc:<type:LIST_SNAPSHOTS > > 
            capabilities:<rpc:<type:EXPAND_VOLUME > > 
            capabilities:<rpc:<type:LIST_VOLUMES_PUBLISHED_NODES > > " 
  version=v4.5.1
  ...

Node
-----
time="2023-04-06T16:40:33Z" level=info msg="removing socket" host_id=349266693 region=nyc3 socket=/csi/csi.sock version=v4.5.1
time="2023-04-06T16:40:33Z" level=info msg="starting server" grpc_addr=/csi/csi.sock host_id=349266693 http_addr= region=nyc3 version=v4.5.1
time="2023-04-06T16:40:33Z" level=info msg="probe called" host_id=349266693 method=probe region=nyc3 version=v4.5.1
time="2023-04-06T16:40:33Z" level=info msg="get plugin info called" host_id=349266693 method=get_plugin_info region=nyc3 
  response="name:\"dobs.csi.digitalocean.com\" vendor_version:\"v4.5.1\" " version=v4.5.1
time="2023-04-06T16:40:33Z" level=info msg="probe called" host_id=349266693 method=probe region=nyc3 version=v4.5.1
time="2023-04-06T16:40:33Z" level=info msg="get plugin capabitilies called" host_id=349266693 method=get_plugin_capabilities region=nyc3 
  response="capabilities:<service:<type:CONTROLLER_SERVICE > > 
            capabilities:<service:<type:VOLUME_ACCESSIBILITY_CONSTRAINTS > > 
            capabilities:<volume_expansion:<type:ONLINE > > " 
  version=v4.5.1
time="2023-04-06T16:40:33Z" level=info msg="node get info called" host_id=349266693 method=node_get_info region=nyc3 version=v4.5.1
time="2023-04-06T16:40:33Z" level=info msg="probe called" host_id=349266693 method=probe region=nyc3 version=v4.5.1
time="2023-04-06T16:40:33Z" level=info msg="node get capabilities called" host_id=349266693 method=node_get_capabilities 
  node_capabilities="[rpc:<type:STAGE_UNSTAGE_VOLUME >  
                      rpc:<type:EXPAND_VOLUME >  
                      rpc:<type:GET_VOLUME_STATS > ]" 
  region=nyc3 
  version=v4.5.1
  ...

I see no errors, warnings, or any other information in the logs about the plugin trying to find or attach volumes for the job. If I at least had an error, I would have something to work from.

From the command line I check the status:

$ nomad plugin status cloud-provider
ID                   = cloud-provider
Provider             = dobs.csi.digitalocean.com
Version              = v4.5.1
Controllers Healthy  = 1
Controllers Expected = 1
Nodes Healthy        = 4
Nodes Expected       = 4

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created   Modified
0eddf2bd  8d9d9f32  primary     0        run      running  2m2s ago  2m1s ago
f1173ddb  8c5733e8  primary     0        run      running  2m2s ago  2m2s ago
5a7d760d  665a1555  primary     0        run      running  2m2s ago  2m2s ago
a24c0c4f  59c3dcb3  primary     0        run      running  2m2s ago  2m2s ago
ae4bc347  59c3dcb3  primary     0        run      running  4m ago    3m49s ago

---

$ nomad volume status
Container Storage Interface
No CSI volumes

What should be my next step in discovering why the volumes are not being found and used?

Nomad 1.5.0 and 1.5.2 (across multiple nodes)

The plugin, controllers, nodes all seem to be working fine. My confusion has led me to examine the constraints that are defined in this job.

If I run the job with the following constraints, and without a volume block, the job deploys successfully.

    constraint { distinct_hosts = true }
    constraint {
      attribute    = "${meta.tags}"
      set_contains = "cloud-volume"
    }
    constraint {
      attribute = "${node.class}"
      value     = "general"
    }

As soon as I add the volume block:

    volume "cloud-volume" {
      type            = "csi"
      source          = "volume-test"
      access_mode     = "single-node-writer"
      attachment_mode = "file-system"
    }

The deployment fails to place any allocation:

Task Group "default" (failed to place 1 allocation):
  * Class "reserved": 2 nodes excluded by filter
  * Class "general": 6 nodes excluded by filter
  * Constraint "${meta.tags} set_contains cloud-volume": 4 nodes excluded by filter
  * Constraint "missing CSI Volume volume-test": 4 nodes excluded by filter
Evaluation "b5a342a7" waiting for additional capacity to place remainder

Things I have tried thus far:

  • upgraded the digitalocean/do-csi-plugin to the latest (4.5.1)
  • upgraded the Nomad nodes to the latest (1.5.3)
  • tried the do-csi-plugin in monolith mode as well as node/controller mode
  • used different names/ids for the volumes that are created at DigitalOcean
  • used different values for the volume.source to make sure I am matching the volume names/ids
  • watched log files on Nomad controllers and workers for any indications of failure
  • watched log files on do-csi-plugin jobs for any indications of failure
  • watched the Nomad “Monitor” logs at the “TRACE” level
  • played with constraint/affinity blocks
  • compared Nomad clusters on DigitalOcean and other providers where the CSI plugins are working
  • read through issues at DigitalOcean
  • read through issues at Nomad
  • read through the driver code
  • read and re-read through all the files in the Nomad CSI DigitalOcean demo

I have been unable to determine where the problem is, and I do not even know where to look next to debug it. I cannot tell if the problem is at DigitalOcean, with Nomad, or with myself (am I holding it wrong?). I do not get any errors other than the failure to allocate.

What can I try next?

I am not a Nomad user, but it seems there is no overlap between the nodes that meet the job constraints and the nodes that have the CSI driver installed.

Are you able to create a job with an attached volume and no constraints? If not, it may be a CSI configuration issue.

It would also be helpful if you shared more details about your nodes' attributes, tags, and constraints (and why you need them).
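
A minimal sketch of the no-constraints test suggested above, assuming the volume, task, and mount names from the job in the question. The job name, the datacenter, and the sleep command are placeholders that do not come from the original post:

    job "csi-volume-test" {            # hypothetical job name
      datacenters = ["dc1"]            # assumption: replace with your datacenter

      group "default" {
        # No constraint blocks here, so only the CSI volume can filter nodes.
        volume "cloud-volume" {
          type            = "csi"
          source          = "sample-volume"
          access_mode     = "single-node-writer"
          attachment_mode = "file-system"
        }

        task "app" {
          driver = "exec"

          config {
            command = "/bin/sleep"     # placeholder workload
            args    = ["3600"]
          }

          volume_mount {
            volume      = "cloud-volume"
            destination = "${NOMAD_TASK_DIR}/state"
          }
        }
      }
    }

If this job also reports "missing CSI Volume", the job constraints are probably not the cause, and the volume itself is likely unknown to Nomad.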

I am also interested in this. Did you make any progress? If not, I will try to help this weekend.

OK. My boss found the solution. Apparently I had done everything correctly, but there was one missing link: the “nomad volume register” command. Creating the volumes at DigitalOcean is not sufficient; once they are created, Nomad needs to be told about each of them explicitly. I had incorrectly assumed that Nomad was watching the DigitalOcean account and would pull in volumes automagically. Once the volumes were registered, everything else fell into place.
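
For anyone landing here with the same symptom, a registration spec might look roughly like the sketch below. It assumes the plugin ID "cloud-provider" shown in the plugin status output and the "sample-volume" source from the job. The file name is hypothetical, and the external_id value is a placeholder for the volume ID that DigitalOcean assigns (for example, the id attribute of the Terraform volume resource), not a real value:

    # volume.hcl (hypothetical file name)
    type        = "csi"
    id          = "sample-volume"             # must match "source" in the job's volume block
    name        = "sample-volume"
    plugin_id   = "cloud-provider"            # ID reported by "nomad plugin status"
    external_id = "<digitalocean-volume-id>"  # placeholder, fill in the real volume ID

    capability {
      access_mode     = "single-node-writer"
      attachment_mode = "file-system"
    }

Registering it with “nomad volume register volume.hcl” should make the volume show up in “nomad volume status”, after which the scheduler has something to match the job's volume block against.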