Dear Nomad team,
First of all, thank you for all your hard work; Nomad is an amazing product.
We are running Nomad v1.0.3, Consul v1.9.3, and Docker CE 20.10.1 on Ubuntu 18.04.
We are using CSI volumes (via hetznercloud/csi-driver, the Container Storage Interface driver for Hetzner Cloud Volumes, deployed as a monolith plugin with single-writer access, running as a system job) and they work great.
I am trying to understand how to manage updates of the jobs that use these volumes, but I am clearly missing something, because I cannot make it work properly.
Since these volumes only support a single writer, I set max_parallel to 0 in the update (and migrate) stanzas, so I expect the old allocation to be killed and the new one to be spawned right after.
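For reference, this is roughly what the relevant stanzas look like (a simplified sketch, not our exact job spec; the job name is a placeholder):

```hcl
job "example" {
  update {
    # intended to force the old allocation to stop
    # before the replacement is placed
    max_parallel = 0
  }

  migrate {
    max_parallel = 0
  }
}
```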
Nomad handles the lifecycle as expected, but then I run into the following two situations:
- the first one, which resolves by itself:
The job fails once or twice (with the error below), but then it starts and works fine
failed to setup alloc: pre-run hook "csi_hook" failed: claim volumes: rpc error: controller publish: attach volume: controller attach volume: rpc error: code = Unavailable desc = failed to publish volume: server is locked
- the second one, which requires manual intervention:
The job keeps restarting, because the volume is mounted read-only inside the container
Terminated Exit Code: 2, Exit Message: "Docker container exited with non-zero exit code: 2" or equivalent
To fix this I need to stop the job, wait a bit, and then run the job again.
I cannot provide any useful logs from the plugin, since nothing significant is printed (it is running with log level debug).
Is this something that you can help me understand?