How to remove stale volume plugins?

A long time ago, I wanted to install a CSI plugin in our Nomad cluster. This did not go well, and the solution was abandoned in favor of Docker volume mounts. However, stale data has remained in Nomad, surviving multiple Docker version upgrades for a year now.

In the output below you can also see my error of trying to use ${..} shell expansion in the plugin ID, with bad results: the unexpanded literal strings ended up stored as the IDs.
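
A minimal shell sketch of the kind of quoting mistake I mean (hypothetical reconstruction, the variable name is just for illustration):

$ server_name=st-zfs03-h2
$ echo '${server_name}'    # single quotes: not expanded, the literal string leaks through
${server_name}
$ echo "${server_name}"    # double quotes: what I actually wanted
st-zfs03-h2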

$ nomad operator api /v1/plugins?type=csi | jq
[
  {
    "ID": "${env[\"server_name\"]}",
    "Provider": "st-zfs03-h2",
    "ControllerRequired": true,
    "ControllersHealthy": 0,
    "ControllersExpected": 1,
    "NodesHealthy": 0,
    "NodesExpected": 0,
    "CreateIndex": 291850,
    "ModifyIndex": 292917
  },
  {
    "ID": "${server_name}",
    "Provider": "st-zfs03-h1-b",
    "ControllerRequired": true,
    "ControllersHealthy": 0,
    "ControllersExpected": 1,
    "NodesHealthy": 0,
    "NodesExpected": 0,
    "CreateIndex": 291829,
    "ModifyIndex": 292917
  },
  {
    "ID": "org.democratic-csi.nfs",
    "Provider": "org.democratic-csi.nfs",
    "ControllerRequired": true,
    "ControllersHealthy": 0,
    "ControllersExpected": 1,
    "NodesHealthy": 0,
    "NodesExpected": 0,
    "CreateIndex": 264484,
    "ModifyIndex": 292917
  }
]

There are no volumes in nomad volume status, because I have deregistered them all. None of these plugins are in use, and democratic-csi has not been running for a long time.

How do I remove these plugins completely?
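
For reference, the direct route I would expect is the CSI plugin delete endpoint, assuming DELETE /v1/plugin/csi/:plugin_id and the curl-style -X flag of nomad operator api; presumably it fails with the same "plugin in use" error as the GC attempts in the logs below:

$ nomad operator api -X DELETE /v1/plugin/csi/org.democratic-csi.nfs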

I also see the following messages in the Nomad server logs, though I do not know whether they are related:

Mar 05 10:39:31 nomad[1966370]:     2024-03-05T10:39:31.382-0500 [ERROR] nomad.csi_plugin: csi raft apply failed: error="plugin in use" method=delete
Mar 05 10:39:31 nomad[1966370]:     2024-03-05T10:39:31.405-0500 [ERROR] nomad.csi_plugin: csi raft apply failed: error="plugin in use" method=delete
Mar 05 10:39:31 nomad[1966370]:     2024-03-05T10:39:31.433-0500 [ERROR] nomad.csi_plugin: csi raft apply failed: error="plugin in use" method=delete
Mar 05 10:39:53 nomad[1966370]:     2024-03-05T10:39:53.893-0500 [ERROR] nomad.csi_plugin: csi raft apply failed: error="plugin in use" method=delete
Mar 05 10:39:53 nomad[1966370]:     2024-03-05T10:39:53.929-0500 [ERROR] nomad.csi_plugin: csi raft apply failed: error="plugin in use" method=delete
Mar 05 10:39:53 nomad[1966370]:     2024-03-05T10:39:53.960-0500 [ERROR] nomad.csi_plugin: csi raft apply failed: error="plugin in use" method=delete
Mar 05 10:40:04 nomad[1966370]:     2024-03-05T10:40:04.309-0500 [ERROR] nomad.csi_plugin: csi raft apply failed: error="plugin in use" method=delete
Mar 05 10:40:04 nomad[1966370]:     2024-03-05T10:40:04.332-0500 [ERROR] nomad.csi_plugin: csi raft apply failed: error="plugin in use" method=delete
Mar 05 10:40:04 nomad[1966370]:     2024-03-05T10:40:04.356-0500 [ERROR] nomad.csi_plugin: csi raft apply failed: error="plugin in use" method=delete

Thanks.

By executing nomad system gc I can trigger these messages in the Nomad server logs:

Mar 05 10:54:31 nomad[1966370]:     2024-03-05T10:54:31.390-0500 [ERROR] nomad.csi_plugin: csi raft apply failed: error="plugin in use" method=delete
Mar 05 10:54:31 nomad[1966370]:     2024-03-05T10:54:31.458-0500 [ERROR] nomad.csi_plugin: csi raft apply failed: error="plugin in use" method=delete
Mar 05 10:54:31 nomad[1966370]:     2024-03-05T10:54:31.512-0500 [ERROR] nomad.csi_plugin: csi raft apply failed: error="plugin in use" method=delete

So it is related, and most probably these are the method=delete calls trying to deregister the plugins. There are no volumes in Nomad, i.e. nomad volume status outputs No CSI volumes. What is using the plugins, then? How can I check?
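
The best checks I could think of are reading the plugin back and looking for allocations, and checking whether the old jobs still exist in a dead state (the job names are from my old democratic-csi deployment):

$ nomad plugin status org.democratic-csi.nfs
$ nomad job status -namespace=services democratic-csi-nfs-controller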

I copied the server's raft state to a temporary directory for safety and executed the raft state command. This results in an error about a volume that I used to have and no longer do:

$ nomad operator raft state ./raft > raft_state.json
2024-03-05T11:01:53.287-0500 [ERROR] fsm: CSIVolumeDeregister failed: error="volume not found: jenkinsdist2"

That is the volume I removed with nomad volume deregister -namespace=services jenkinsdist2.

The raft state does contain the CSI plugins; they look like this:

$ jq -c '.CSIPlugins[]' raft_state.json
{"ID":"${env[\"server_name\"]}","Provider":"st-zfs03-h2","Version":"1.7.6","ControllerRequired":true,"Controllers":{},"Nodes":{},"Alloc
ations":null,"ControllerJobs":{"services":{"democratic-csi-nfs-controller":{"Namespace":"services","ID":"democratic-csi-nfs-controller"
,"Expected":1}}},"NodeJobs":{},"ControllersHealthy":0,"ControllersExpected":1,"NodesHealthy":0,"NodesExpected":0,"CreateIndex":291850,"
ModifyIndex":292917}
{"ID":"${server_name}","Provider":"st-zfs03-h1-b","Version":"1.7.6","ControllerRequired":true,"Controllers":{},"Nodes":{},"Allocations"
:null,"ControllerJobs":{"services":{"democratic-csi-nfs-controller":{"Namespace":"services","ID":"democratic-csi-nfs-controller","Expec
ted":1}}},"NodeJobs":{},"ControllersHealthy":0,"ControllersExpected":1,"NodesHealthy":0,"NodesExpected":0,"CreateIndex":291829,"ModifyI
ndex":292917}
{"ID":"org.democratic-csi.nfs","Provider":"org.democratic-csi.nfs","Version":"1.7.6","ControllerRequired":true,"Controllers":{},"Nodes"
:{},"Allocations":null,"ControllerJobs":{"services":{"democratic-csi-nfs-controller":{"Namespace":"services","ID":"democratic-csi-nfs-c
ontroller","Expected":1}}},"NodeJobs":{"services":{"democratic-csi-nfs-node":{"Namespace":"services","ID":"democratic-csi-nfs-node","Ex
pected":0}}},"ControllersHealthy":0,"ControllersExpected":1,"NodesHealthy":0,"NodesExpected":0,"CreateIndex":264484,"ModifyIndex":29291
7}
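
Notably, each plugin still carries ControllerJobs entries (and the last one also NodeJobs) pointing at the old democratic-csi jobs; my suspicion is that these are the "plugin in use" references. A quick filter to surface just those fields:

$ jq -c '.CSIPlugins[] | {ID, ControllerJobs, NodeJobs}' raft_state.json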

and no volumes:

$ jq '.CSIVolumes' raft_state.json
[]

Hope this gives more insight.
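
Based on those lingering job references, my next guess (untested) is to purge the stopped democratic-csi jobs so the references go away, then run the garbage collector again:

$ nomad job stop -purge -namespace=services democratic-csi-nfs-controller
$ nomad job stop -purge -namespace=services democratic-csi-nfs-node
$ nomad system gc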