Trying to mount a CephFS volume into our Nomad cluster

I’m having a problem with the ceph-csi-plugin job:

labi@inspiron:~$ nomad status ceph-csi-plugin
ID            = ceph-csi-plugin
Name          = ceph-csi-plugin
Submit Date   = 2021-02-10T23:54:49+01:00
Type          = system
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
nodes       0       0         4        4       16        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
7f67a7d7  7cfca224  nodes       9        run      running  2h25m ago  11m2s ago
7fcf3e00  4aad1344  nodes       9        run      running  2h25m ago  11m2s ago
8e93d98a  5d1817df  nodes       9        run      running  2h25m ago  11m2s ago
a72ce57f  3508cce0  nodes       9        run      running  2h25m ago  11m2s ago

Then I run:

nomad alloc logs -stderr 7f67a7d7 ceph-node

And this is the result:

I0210 21:53:56.930562       1 cephcmds.go:53] ID: 461 Req-ID: c91a5d7e-6b1f-11eb-a64f-4201c0a8010b an error (exit status 1) and stdError (modprobe: FATAL: Module ceph not found in directory /lib/modules/3.10.0-1160.15.2.el7.x86_64
) occurred while running modprobe args: [ceph]
E0210 21:53:56.930590       1 nodeserver.go:175] ID: 461 Req-ID: c91a5d7e-6b1f-11eb-a64f-4201c0a8010b failed to mount volume c91a5d7e-6b1f-11eb-a64f-4201c0a8010b: an error (exit status 1) and stdError (modprobe: FATAL: Module ceph not found in directory /lib/modules/3.10.0-1160.15.2.el7.x86_64
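
For context: modprobe looks for modules under /lib/modules/$(uname -r), so the FATAL line means there is no ceph module under that path in the environment where the command ran. A quick way to check this by hand (plain shell, nothing plugin-specific):

# modprobe resolves modules under /lib/modules/$(uname -r); if that
# directory is empty, loading fails even if the module exists elsewhere.
ls /lib/modules/$(uname -r)/kernel/fs/ceph/   # should list ceph.ko(.xz)
modprobe --dry-run -v ceph                    # shows what would be loaded, without loading it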

Maybe a naive question, but are the CephFS packages installed cleanly (i.e. successfully) on the node?


Not sure what ‘cleanly’ means, but the Ceph GUI tells me that the file system is healthy. What can I specifically check on my end to answer you more precisely?

[root@hashistack-client-0 ~]# yum list installed|grep ceph
ceph-common.x86_64                   2:15.2.8-0.el7                 @Ceph       
libcephfs2.x86_64                    2:15.2.8-0.el7                 @Ceph       
python3-ceph-argparse.x86_64         2:15.2.8-0.el7                 @Ceph       
python3-ceph-common.x86_64           2:15.2.8-0.el7                 @Ceph       
python3-cephfs.x86_64                2:15.2.8-0.el7                 @Ceph   

@shantanugadgil, this is what we have installed on the node. Is this what you were asking about?

I was just trying to think of what could possibly be wrong; I am no Ceph expert! :innocent:

Any clue from lsmod | grep ceph?

From the Nomad client node:

[root@hashistack-client-0 wpfs]# lsmod |grep ceph
ceph                  363016  1 
libceph               306750  1 ceph
dns_resolver           13140  1 libceph
libcrc32c              12644  4 xfs,libceph,nf_nat,nf_conntrack

but from the ceph-csi plugin container:

[root@a9f81dbe0549 /]# lsmod |grep ceph
ceph                  363016  1
libceph               306750  1 ceph
dns_resolver           13140  1 libceph
libcrc32c              12644  4 xfs,libceph,nf_nat,nf_conntrack
[root@a9f81dbe0549 /]# modprobe cephfs
modprobe: FATAL: Module cephfs not found in directory /lib/modules/3.10.0-1160.15.2.el7.x86_64
[root@a9f81dbe0549 /]# ls -al /lib/modules
total 0
drwxr-xr-x. 2 root root   6 May 11  2019 .
dr-xr-xr-x. 1 root root 188 Nov 19 13:06 ..
[root@a9f81dbe0549 /]# 

Does anyone at least have some good documentation on how to use the plugin?

Another thought: is there a way to mount the /lib/modules directory from the host OS into the Docker container at the same location, /lib/modules?

Something like: /lib/modules:/lib/modules.

I haven’t played with CSI (with Nomad), so the above is also “just a thought” :slight_smile: :slight_smile:

You are right. I got CephFS working with Nomad, and you do need to bind-mount /lib/modules:

        volumes = [
          "./local/config.json:/etc/ceph-csi-config/config.json",
          "/lib/modules:/lib/modules"
        ]
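
For context, that volumes list sits inside the Docker task config of the plugin job. A minimal sketch of where it fits (the image tag, args, and plugin ID here are placeholders, adapt them to your cluster):

job "ceph-csi-plugin" {
  datacenters = ["dc1"]
  type        = "system"

  group "nodes" {
    task "ceph-node" {
      driver = "docker"

      config {
        image      = "quay.io/cephcsi/cephcsi:v3.3.1"  # placeholder tag
        privileged = true  # the node plugin needs this to perform mounts

        args = [
          "--type=cephfs",
          "--drivername=cephfs.csi.ceph.com",
          "--nodeid=${node.unique.name}",
          "--endpoint=unix:///csi/csi.sock"
        ]

        volumes = [
          "./local/config.json:/etc/ceph-csi-config/config.json",
          "/lib/modules:/lib/modules"  # the bind mount that fixes the modprobe error
        ]
      }

      csi_plugin {
        id        = "ceph-csi"
        type      = "node"
        mount_dir = "/csi"
      }
    }
  }
}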

@kriestof, could you please share the format of ceph-volume.hcl?

Here is mine:

id = "ceph-mysql"
name = "ceph-mysql"
type = "csi"
plugin_id = "ceph-csi"

capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}

secrets {
  adminID  = "admin"
  adminKey = "AQAK3hxmTbeyAxAA+R77RggMDQ9eUko0I3xYXg=="
}

parameters {
  clusterID = "ce6c04f0-fafd-11ee-965f-0dcedbf52b34"
  fsName = "conductor"
  imageFeatures = "layering"
}

The problem is that running the nomad volume create ceph-volume.hcl command hangs for a long time and then fails with context deadline exceeded.

I have mounted /lib/modules, and I can see the ceph modules with lsmod | grep ceph inside the ceph-csi container.

Could you please help?
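
A first thing to check: context deadline exceeded on nomad volume create usually means the controller RPC timed out, which in practice often points at the controller plugin being unhealthy or unable to reach the Ceph monitors (e.g. wrong monitor addresses in config.json, or a bad adminKey). A quick look with the standard Nomad CLI (the alloc ID below is a placeholder):

nomad plugin status ceph-csi                     # Controllers Healthy / Nodes Healthy should be non-zero
nomad alloc logs -stderr <controller-alloc-id>   # look for connection/auth errors against the Ceph cluster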