Trying to mount a CephFS volume in our Nomad cluster

I’m having a problem with the ceph-csi-plugin job. Its allocations are running, but mounting the volume fails:

labi@inspiron:~$ nomad status ceph-csi-plugin
ID            = ceph-csi-plugin
Name          = ceph-csi-plugin
Submit Date   = 2021-02-10T23:54:49+01:00
Type          = system
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
nodes       0       0         4        4       16        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
7f67a7d7  7cfca224  nodes       9        run      running  2h25m ago  11m2s ago
7fcf3e00  4aad1344  nodes       9        run      running  2h25m ago  11m2s ago
8e93d98a  5d1817df  nodes       9        run      running  2h25m ago  11m2s ago
a72ce57f  3508cce0  nodes       9        run      running  2h25m ago  11m2s ago

Then I run

nomad alloc logs -stderr 7f67a7d7 ceph-node

And this is the result

I0210 21:53:56.930562       1 cephcmds.go:53] ID: 461 Req-ID: c91a5d7e-6b1f-11eb-a64f-4201c0a8010b an error (exit status 1) and stdError (modprobe: FATAL: Module ceph not found in directory /lib/modules/3.10.0-1160.15.2.el7.x86_64
) occurred while running modprobe args: [ceph]
E0210 21:53:56.930590       1 nodeserver.go:175] ID: 461 Req-ID: c91a5d7e-6b1f-11eb-a64f-4201c0a8010b failed to mount volume c91a5d7e-6b1f-11eb-a64f-4201c0a8010b: an error (exit status 1) and stdError (modprobe: FATAL: Module ceph not found in directory /lib/modules/3.10.0-1160.15.2.el7.x86_64

Maybe a naive question, but are the CephFS packages installed cleanly (successfully) on the node?

Not sure what ‘cleanly’ means, but the Ceph gui tells me that the file system is healthy. What can I specifically check on my end to answer you more precisely?

[root@hashistack-client-0 ~]# yum list installed|grep ceph
ceph-common.x86_64                   2:15.2.8-0.el7                 @Ceph       
libcephfs2.x86_64                    2:15.2.8-0.el7                 @Ceph       
python3-ceph-argparse.x86_64         2:15.2.8-0.el7                 @Ceph       
python3-ceph-common.x86_64           2:15.2.8-0.el7                 @Ceph       
python3-cephfs.x86_64                2:15.2.8-0.el7                 @Ceph   

@shantanugadgil, This is what we have installed on the node. Is this what you were asking about?

I was just trying to think of what could be possibly wrong, I am no Ceph expert! :innocent:

Any clue from lsmod | grep ceph?

From the Nomad client node:

[root@hashistack-client-0 wpfs]# lsmod |grep ceph
ceph                  363016  1 
libceph               306750  1 ceph
dns_resolver           13140  1 libceph
libcrc32c              12644  4 xfs,libceph,nf_nat,nf_conntrack

But from inside the ceph-csi plugin container:

[root@a9f81dbe0549 /]# lsmod |grep ceph
ceph                  363016  1
libceph               306750  1 ceph
dns_resolver           13140  1 libceph
libcrc32c              12644  4 xfs,libceph,nf_nat,nf_conntrack
[root@a9f81dbe0549 /]# modprobe cephfs
modprobe: FATAL: Module cephfs not found in directory /lib/modules/3.10.0-1160.15.2.el7.x86_64
[root@a9f81dbe0549 /]# ls -al /lib/modules
total 0
drwxr-xr-x. 2 root root   6 May 11  2019 .
dr-xr-xr-x. 1 root root 188 Nov 19 13:06 ..
[root@a9f81dbe0549 /]# 
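
A note on reading the transcript above: lsmod reports what is loaded in the kernel, and the container shares the host kernel, so it lists ceph in both places. modprobe, on the other hand, looks for module files under /lib/modules/<kernel release>, and that directory is empty inside the container. Here is a minimal sketch of a check for that condition. The function name check_modules_dir is made up for illustration and is not part of ceph-csi or Nomad.

```shell
# Hypothetical helper (not from ceph-csi or Nomad): report whether a kernel
# module tree exists and is non-empty under a given root.
# Usage: check_modules_dir [modules_root] [kernel_release]
check_modules_dir() {
  root="${1:-/lib/modules}"        # default: the path modprobe searches
  release="${2:-$(uname -r)}"      # default: the running kernel release
  dir="$root/$release"
  # The directory must exist and contain at least one entry.
  if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "ok: $dir is populated"
    return 0
  fi
  echo "missing: $dir is absent or empty"
  return 1
}
```

Running check_modules_dir with no arguments on the host and again inside the plugin container should make the difference between the two visible.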

Does anyone have, at least, some good documentation for how to use the plugin?

Another thought: is there a way to mount the host's /lib/modules directory into the Docker container at /lib/modules?

Something like: /lib/modules:/lib/modules.

I haven’t played with CSI (with Nomad) so the above is also “just a thought” :slight_smile: :slight_smile:

You are right. I got CephFS working with Nomad, and you do need to bind-mount /lib/modules into the plugin container:

        volumes = [
          "./local/config.json:/etc/ceph-csi-config/config.json",
          "/lib/modules:/lib/modules"
        ]
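
Once the bind mount is in place, one way to confirm from inside the plugin container that the module file is actually visible is to search the module tree for it. This is only a sketch: find_module_file is a hypothetical helper, and it assumes module files are named <module>.ko, optionally with a compression suffix such as .xz.

```shell
# Hypothetical helper: print the path of a module's .ko file (plain or
# compressed) under a modules tree, or fail if none is found.
# Usage: find_module_file <module> [modules_root] [kernel_release]
find_module_file() {
  module="$1"
  root="${2:-/lib/modules}"
  release="${3:-$(uname -r)}"
  dir="$root/$release"
  if [ ! -d "$dir" ]; then
    echo "no such directory: $dir" >&2
    return 1
  fi
  # -name matches the whole file name, so "ceph" does not match libceph.ko.
  match="$(find "$dir" -type f -name "$module.ko*" 2>/dev/null | head -n 1 || true)"
  if [ -n "$match" ]; then
    echo "$match"
    return 0
  fi
  echo "no $module.ko* under $dir" >&2
  return 1
}
```

Note that the module to look for is ceph, which is the name shown by lsmod and the name the plugin passes to modprobe in the log above, not cephfs. So the call would be find_module_file ceph.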

@kriestof, could you please share the format of your ceph-volume.hcl?

Here is mine:

id = "ceph-mysql"
name = "ceph-mysql"
type = "csi"
plugin_id = "ceph-csi"

capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}

secrets {
  adminID  = "admin"
  adminKey = "AQAK3hxmTbeyAxAA+R77RggMDQ9eUko0I3xYXg=="
}

parameters {
  clusterID = "ce6c04f0-fafd-11ee-965f-0dcedbf52b34"
  fsName = "conductor"
  imageFeatures = "layering"
}

The problem is that running nomad volume create ceph-volume.hcl hangs for a long time and then fails with "context deadline exceeded".

I have bind-mounted /lib/modules, and lsmod | grep ceph shows the ceph module inside the ceph-csi container.

Could you please help?
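
One thing that may be worth ruling out, offered as a guess rather than a diagnosis: nomad volume create is handled by the plugin's controller, and the job status at the top of this thread only shows a nodes task group. The sketch below reads the output of nomad plugin status on stdin and succeeds only if at least one controller is reported healthy. It assumes the output contains a line of the form "Controllers Healthy = N"; please check that against your Nomad version. controllers_healthy is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the plugin status text on stdin
# reports at least one healthy controller.
# Assumes a line like "Controllers Healthy  = 1" (verify on your version).
controllers_healthy() {
  # Take the value after "=" on the matching line and strip whitespace.
  count="$(awk -F'=' '/^Controllers Healthy/ { gsub(/[[:space:]]/, "", $2); print $2 }')"
  [ -n "$count" ] && [ "$count" -gt 0 ] 2>/dev/null
}
```

With the plugin_id from the volume file above, usage would look like: nomad plugin status ceph-csi | controllers_healthy || echo "no healthy controller plugin". If that reports no healthy controller, the controller side of the plugin is the first place to look.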