DigitalOcean CSI issues

I have set up a Nomad cluster on DigitalOcean using do-hashicorp-cluster, which is quite lovely and very well done.

I wanted to use DO Volumes for stateful data and found this example.

I’m having an issue with it, however. When I bring up the example redis job, I get this error in the allocation:

2021-08-23T13:00:27+01:00 Setup Failure failed to setup alloc: pre-run hook "csi_hook" failed: claim volumes: rpc error: cannot change attachment mode of claimed volume

Additionally, I’m somewhat confused as to why Terraform is being used to create the volume instead of letting the CSI driver do it. Is this something specific to DO?

Thanks!

Hi @bhechinger,

Thanks for using Nomad! I’m sorry to hear you are having issues with the csi_plugin feature.

I’m currently working on re-creating your issue and will let you know what I find.

Cheers,

@DerekStrickland and the Nomad Team

That’s amazing, thank you so much!

Hi @bhechinger,

Thanks for reporting this issue. There was some outdated Terraform configuration in that demo folder. You can either check out this pending PR and grab the updates from the branch, or wait for it to get merged into main.

Thanks again for using Nomad!

@DerekStrickland and the Nomad team

I probably won’t get to messing with this again until Monday. I’ll report back here though once I’ve had a chance to try it again.

Thanks again for all your help!

Sounds great. Have a wonderful weekend!

Failed as follows:

Recent Events:
Time                       Type           Description
2021-09-01T12:24:19+01:00  Setup Failure  failed to setup alloc: pre-run hook "csi_hook" failed: claim volumes: rpc error: controller publish: attach volume: controller attach volume: CSI.ControllerAttachVolume: rpc error: code = Unknown desc = POST https://api.digitalocean.com/v2/volumes/1ed876e7-0b17-11ec-992a-0a58ac144393/actions: 404 (request "5fcadc85-3cf8-4f05-86dd-d534036040aa") invalid volume id: allocation not found
2021-09-01T12:24:17+01:00  Received       Task received by client

Are you running the demo inside the do-hashicorp-cluster container? That tripped me up. I had to clone nomad there, set the do_token variable, run the ssh tunnels script that do-hashicorp-cluster provided, and run terraform apply from inside the container.

I’m not. But if it only relies on the tunnels, I should be covered: I copied that script out of the repo and run my own copy, so I get the tunnels without the container running.

Just tried this from inside the container, in case there was something else going on that I was not aware of. It made zero difference; I get the exact same error.

Just curious if you made any progress? I haven’t forgotten about you. I just can’t seem to reproduce the error.

None whatsoever. :frowning:

Let me tear it all down and stand it back up documenting exactly what steps I’ve taken so we can see if maybe I’m missing something?

Just sat down and tried this again and this time it worked. :man_shrugging:

So it… worked? Maybe?

In the job spec it has this:

      volume_mount {
        volume      = "test"
        destination = "/test"
      }

Should it have mounted that into the container? I don’t see it there, however:

root@79416ca8426d:/data# mount | grep temp
root@79416ca8426d:/data# df -h
Filesystem      Size  Used Avail Use% Mounted on
overlay          25G  3.3G   21G  14% /
tmpfs            64M     0   64M   0% /dev
tmpfs           489M     0  489M   0% /sys/fs/cgroup
shm              64M     0   64M   0% /dev/shm
/dev/vda1        25G  3.3G   21G  14% /data
tmpfs           1.0M     0  1.0M   0% /secrets
udev            472M     0  472M   0% /test
tmpfs           489M     0  489M   0% /proc/acpi
tmpfs           489M     0  489M   0% /proc/scsi
tmpfs           489M     0  489M   0% /sys/firmware
root@79416ca8426d:/data# ls /test
/test
root@79416ca8426d:/data# ls -l /test
brw-rw---- 1 root disk 8, 0 Mar  1 11:57 /test
root@79416ca8426d:/data#

That doesn’t seem right?
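
For what it’s worth, the leading `b` in `brw-rw----` means `/test` is a block device node rather than a mounted directory. That is what I would expect to see if the volume were claimed with attachment mode `block-device` instead of `file-system`, though that is an assumption on my part, since the volume registration isn’t shown in this thread. A minimal sketch for checking, from inside the container, how the path is being presented (the `classify_path` helper is hypothetical, not part of Nomad or the demo):

```shell
# classify_path: hypothetical helper, not part of Nomad.
# Prints how a path is presented inside the container.
classify_path() {
  if [ -b "$1" ]; then
    # Raw block device node (what `ls -l` shows with a leading "b")
    echo "block-device"
  elif [ -d "$1" ]; then
    # A directory, which is what a mounted filesystem would look like
    echo "directory"
  else
    echo "other"
  fi
}

# /test is the volume_mount destination from the job spec above
classify_path /test
```

If this prints `block-device`, the container is getting the raw device and nothing has put a filesystem mount at that path, which would match the `ls -l` output above.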