How to manually stop the docker container

x602 · May 7, 2022, 2:52am

kernel: 3.10.0
nomad: 1.2.6

I stopped and purged the job, but the container is still running. It has been running overnight and has not shut down.

$ nomad stop -purge -namespace ic-es node-2
==> 2022-05-07T10:38:57+08:00: Monitoring evaluation "f122a359"
    2022-05-07T10:38:57+08:00: Evaluation triggered by job "node-2"
==> 2022-05-07T10:38:58+08:00: Monitoring evaluation "f122a359"
    2022-05-07T10:38:58+08:00: Evaluation status changed: "pending" -> "complete"
==> 2022-05-07T10:38:58+08:00: Evaluation "f122a359" finished with status "complete"

I then tried to remove the container directly with Docker, but that also failed:

$ docker update --restart=no 8af99ea02f55
8af99ea02f55

$ docker rm -f 8af99ea02f55
Error response from daemon: Could not kill running container 8af99ea02f557baf1e0f5df3746cffea366ad4f24cf8a079fe1c40d65bb131b9, cannot remove - tried to kill container, but did not receive an exit event
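Docker's error says it sent the kill signal but never saw the process exit. One way to look into this, sketched below under the assumption that the docker CLI and procps `ps` are available, is to look up the container's host PID and check its process state. A process in uninterruptible sleep (state D), for example one blocked on I/O to a hung device, usually cannot be killed until the kernel call it is stuck in returns; whether that is what is happening with the csi-cephrbd-node container here is not confirmed. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: find out why Docker cannot kill a container.
# Assumes the docker CLI and procps `ps`; function names are hypothetical.

# Describe a ps STAT code using its first letter (e.g. "D" from "Ds").
explain_state() {
  case "${1:0:1}" in
    D) echo "uninterruptible sleep (blocked in a kernel call; SIGKILL usually has no effect until it returns)" ;;
    Z) echo "zombie (already exited, waiting to be reaped by its parent)" ;;
    R) echo "running" ;;
    S) echo "interruptible sleep" ;;
    T) echo "stopped" ;;
    *) echo "other ($1)" ;;
  esac
}

# Show Docker's view of the container and the kernel's view of its main process.
inspect_stuck_container() {
  local id=$1 pid stat
  docker inspect --format 'status={{.State.Status}} pid={{.State.Pid}}' "$id" || return 1
  # .State.Pid is the host PID of the container's init process (0 when not running).
  pid=$(docker inspect --format '{{.State.Pid}}' "$id")
  if [ "${pid:-0}" -gt 0 ]; then
    # wchan shows the kernel function the process is sleeping in, if any.
    ps -o pid,stat,wchan:32,cmd -p "$pid"
    stat=$(ps -o stat= -p "$pid" | tr -d ' ')
    echo "state: $(explain_state "$stat")"
  fi
}
```

For example, `inspect_stuck_container 8af99ea02f55`. If the reported state is D, the wchan column may hint at which kernel call the process is blocked in.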

I suspect one of two causes:

The csi-cephrbd-node job was restarted partway through.
The kernel version is too old.

Is there a way to stop the container? Thank you.

x602 · May 7, 2022, 2:58am

The only workaround I have found so far is the following, but I would like to solve this without restarting Docker:

$ rm -rf /var/lib/docker/containers/8af99ea02f557baf1e0f5df3746cffea366ad4f24cf8a079fe1c40d65bb131b9/
$ systemctl restart docker
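The workaround above can be wrapped in a small guard so that a mistyped or empty ID can never turn the `rm -rf` into a deletion of the whole containers directory. This is a sketch, not an endorsed procedure: it assumes Docker's default data root /var/lib/docker (overridable through a hypothetical DOCKER_ROOT variable), that it runs as root, and it adds a hypothetical DRY_RUN=1 mode that only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the manual workaround as a guarded function (hypothetical names).
# Assumes Docker's default data root; set DRY_RUN=1 to print commands instead of running them.

DOCKER_ROOT=${DOCKER_ROOT:-/var/lib/docker}

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

remove_stuck_container() {
  local id=$1
  # Require the full 64-character ID so a typo or empty argument can never
  # expand to "rm -rf $DOCKER_ROOT/containers/".
  if ! [[ $id =~ ^[0-9a-f]{64}$ ]]; then
    echo "expected a full 64-character container ID, got: '$id'" >&2
    return 1
  fi
  run rm -rf "$DOCKER_ROOT/containers/$id"
  run systemctl restart docker
}
```

The short ID shown by `docker ps` (8af99ea02f55) is rejected on purpose; the full ID appears in the Docker error message above.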

x602 · May 12, 2022, 3:26am

I have updated the kernel to the version below, but the problem persists:

5.4.179-1.el7.elrepo.x86_64

DerekStrickland · May 12, 2022, 10:17am

Hi @x602. Thanks for using Nomad!

I’m sorry you are running into this issue. I have a few basic troubleshooting questions, if that’s OK.

If you run nomad status <job-name>, do you still see the job as running? Probably not, but it’s worth asking as a starting point.
It looks like you are using Ceph with this container (csi-cephrbd-node). Is that correct? Have you looked at this doc to make sure you have completed all the necessary steps? For instance, do you have your controller job deployed and is it healthy?
Can you post your job files?
Are you running this job as a system job or a service job?
Do you have any server or client logs available, and if so, do you see any messages that seem relevant? You might grep for volume_manager as a start.
If you run nomad status <job-name> before purging, note the alloc ID, and then purge the job, do you still see a directory with that ID as its name in the alloc dir?
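The last check can be scripted. The sketch below assumes you pass the client’s data_dir (from the Nomad client configuration) and the alloc IDs you noted with nomad status <job-name> before purging; leftover_allocs is a hypothetical helper that prints each ID whose directory still exists under <data_dir>/allocs.

```shell
#!/usr/bin/env bash
# Sketch: report which alloc directories survived a purge.
# leftover_allocs is a hypothetical helper; data_dir must be passed explicitly.

leftover_allocs() {
  local data_dir=$1 id
  shift
  for id in "$@"; do
    # Print the alloc ID if its directory is still present.
    if [ -d "$data_dir/allocs/$id" ]; then
      echo "$id"
    fi
  done
}
```

Any ID it prints is a candidate for the manual cleanup. For the log question, if Nomad runs as a systemd unit named nomad (as the systemctl commands in this thread assume), `journalctl -u nomad | grep volume_manager` is one way to search the client log.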

If you do still see a directory for the container in the alloc dir, it seems there was a problem cleaning up the job. It might be a bug in Nomad, or it might be a misconfiguration. When developing, if I end up with an allocation that wasn’t cleaned up correctly, I run the following script to remove the orphaned alloc directory. I realize it’s a temporary fix rather than a solution, and it may help you avoid restarting Docker, but that’s a hope more than a guarantee.

systemctl stop nomad

grep <data_dir>/allocs/<alloc-id> /proc/mounts | cut -f2 -d" " | sort -r | sudo xargs umount -n

rm -rf <data_dir>/allocs/<alloc-id>

systemctl start nomad
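The same steps can be written as a function. This sketch assumes bash, root privileges (in place of the sudo in the pipeline above), and that data_dir and the alloc ID are passed explicitly. Unlike grep, which matches the pattern anywhere on the line and would also match a longer alloc ID that merely starts with the given text, the awk filter selects only mount points (field 2) equal to or below the alloc directory; `LC_ALL=C sort -r` then places nested mounts before their parents. The function names, MOUNTS_FILE, and DRY_RUN are hypothetical additions so the logic can be tried without touching a real system.

```shell
#!/usr/bin/env bash
# Sketch of the orphaned-alloc cleanup (hypothetical names; run as root).
# MOUNTS_FILE defaults to /proc/mounts; DRY_RUN=1 prints commands instead of running them.

MOUNTS_FILE=${MOUNTS_FILE:-/proc/mounts}

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Print mount points at or below the alloc dir, deepest first.
alloc_mounts() {
  local alloc_dir=$1
  awk -v dir="$alloc_dir" '$2 == dir || index($2, dir "/") == 1 { print $2 }' "$MOUNTS_FILE" |
    LC_ALL=C sort -r
}

cleanup_alloc() {
  local data_dir=$1 alloc_id=$2 mnt
  local alloc_dir="$data_dir/allocs/$alloc_id"
  # Refuse empty arguments so the rm below can never target "$data_dir/allocs/".
  if [ -z "$data_dir" ] || [ -z "$alloc_id" ]; then
    echo "usage: cleanup_alloc <data_dir> <alloc-id>" >&2
    return 1
  fi
  run systemctl stop nomad
  while read -r mnt; do
    run umount -n "$mnt"
  done < <(alloc_mounts "$alloc_dir")
  run rm -rf "$alloc_dir"
  run systemctl start nomad
}
```

Running it first with DRY_RUN=1 shows exactly which mounts and directory would be affected before anything is changed.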
