Long start times of allocations

Hi,
I’m trying to understand why allocations stay in the Client Status = pending state for 1-2 minutes. The allocations are for dispatched jobs and use the docker driver.
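For context, this is roughly the shape of the job (a minimal sketch; job/task names, image, and resource values are placeholders, not the real spec):

```hcl
# Hypothetical dispatch job sketch; names and image are placeholders.
job "pm-job" {
  datacenters = ["dc1"]
  type        = "batch"

  # Makes the job dispatchable via `nomad job dispatch`.
  parameterized {
    meta_required = ["input_id"]
  }

  group "main" {
    task "main" {
      driver = "docker"

      config {
        image = "example/worker:latest"
      }

      resources {
        cpu    = 500   # MHz share
        memory = 256
      }
    }
  }
}
```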

This is sample client TRACE log:

2022-07-18T06:34:47.232Z [TRACE] client.cpuset.v2: add allocation: name=pm-job-1/dispatch-1658126074-a8a797ce.main[0] id=8aa0ddd3-114e-6556-4725-0c428da894ea

2022-07-18T06:36:52.203Z [DEBUG] client.alloc_runner.task_runner: lifecycle start condition has been met, proceeding: alloc_id=8aa0ddd3-114e-6556-4725-0c428da894ea task=main

So more than 2 minutes elapsed between "add allocation" and "lifecycle start condition has been met". How can I check what Nomad was waiting for during that period?

The client is oversubscribed on CPU and runs about 250 allocs, but my understanding is that once an allocation is created its resources are already assigned, and there are no further checks of actual CPU usage, etc.?

Disk I/O is not exhausted, Docker containers start quickly when launched directly with the docker command (Nomad logs also indicate that once the start condition has been met, the container is created quickly), and the nomad process uses about 60% of a vCPU.
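To rule out the Docker daemon itself, I compared against launching a container by hand, something along the lines of (image is just an example):

```sh
# Containers started by hand come up quickly, so dockerd itself is not the bottleneck
time docker run --rm alpine:latest true
```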

My guess is this was caused by overloading the cgroups manager on the kernel side. I captured a nomad client profile and almost all the time was spent writing to cgroups files during alloc cleanups. It also looks like the throughput of cgroups operations the Linux kernel can sustain is generally rather low (from a few to tens of operations per second).
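For anyone wanting to reproduce the measurement: I grabbed the profile from the client's agent pprof endpoint (this requires enable_debug = true, or a suitable ACL token if ACLs are enabled; the address is the default local one):

```sh
# 30-second CPU profile of the local Nomad client agent
curl -s -o nomad-client.prof \
  "http://127.0.0.1:4646/v1/agent/pprof/profile?seconds=30"

# Top functions by CPU time; cgroup file writes dominated in my case
go tool pprof -top nomad-client.prof
```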

My fix was reducing the CPU overallocation (to lower the number of running allocs) and adding cgroup.memory=nokmem,nosocket to the kernel parameters to speed up cgroups operations. So far, after about 24h, it looks to be working well: start times are 1-2 seconds.
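In case it's useful to others, this is how the kernel parameter was applied on an Ubuntu/GRUB setup (stock paths assumed):

```sh
# 1. Append the options in /etc/default/grub, e.g.:
#      GRUB_CMDLINE_LINUX_DEFAULT="... cgroup.memory=nokmem,nosocket"
# 2. Regenerate the GRUB config and reboot:
sudo update-grub
sudo reboot

# 3. Verify after reboot that the options are active:
cat /proc/cmdline
```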

Container creation was still getting slower after 24h+ of uptime, and after about 48h dockerd hung and was unable to launch any new container. This is the GH issue: Dockerd slows down and finally hangs · Issue #43870 · moby/moby · GitHub. What worked for me was downgrading from Ubuntu 22.04 + kernel 5.15-aws + cgroups v2 to Ubuntu 20.04 + kernel 5.4-aws + cgroups v1.
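For completeness: you can check which cgroup hierarchy a host is on, and on systemd distros cgroups v1 can in principle be forced with a boot parameter instead of a full OS downgrade. I went with the downgrade, so treat the boot parameter as an untested alternative:

```sh
# "cgroup2fs" means cgroups v2 (unified hierarchy); "tmpfs" means cgroups v1
stat -fc %T /sys/fs/cgroup

# Untested alternative to downgrading: boot a systemd distro with cgroups v1
# by adding this to the kernel command line:
#   systemd.unified_cgroup_hierarchy=0
```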

I also noticed that the nomad process uses far less CPU on this downgraded setup; previously it hovered around 60-120% of a vCPU and now it's 15-40%.


Wow interesting, thanks for investigating and sticking with this, @aartur.

Are any of your tasks making use of resources.cores? On the Nomad side, the way the cpuset subsystem is managed changed significantly to support cgroups v2, but I'm unsure whether there would be a noticeable impact without setting that option.
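For reference (an illustrative snippet, not the original job), the option in question looks like this; cores reserves whole CPUs managed through the cpuset controller, whereas cpu is an MHz share:

```hcl
task "main" {
  driver = "docker"

  config {
    image = "example/worker:latest"   # placeholder image
  }

  resources {
    # cpu  = 500   # MHz share
    cores = 1      # reserves a dedicated core, managed via cpusets
  }
}
```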