I’ve been thinking a lot about this one as it’s something we run into quite a bit. Today, prestart (and poststop) tasks require dedicated resources, as one might expect. However, these resources stay allocated even after the task has run. For sidecars this makes sense, but for one-shot tasks it creates a scenario where node resources are allocated but will never be used unless the group restarts.
When there are a lot of groups/jobs, each with their own prestart task, this can add up substantially. I was thinking that unless it’s a sidecar task, the scheduler should release these resources (for prestart at least) back to the pool for allocation. If the container needs a restart and there aren’t enough resources, then it’s fine (imo) that it moves nodes.
I’d be curious what the general opinion is around this.
Note: we use HARD CPU limits to prevent containers from spiking and starving host-level processes, so a soft limit isn’t really an option for us. We need to protect the host CPU.
Let’s say I have a container that, on startup, may be configured to do some temporary but intensive work. Typically the container is allocated 128 CPU, but in this case we need a temporary burst of 512 CPU. We also have to use hard CPU limits to prevent host VM abuse.
So now we have a few choices (yes, DAS would help with some of these, but let’s say that’s not an option for now):
Over-provision the container to always be 512 CPU. Not terrible, until we consider that we may have 50 containers, all with the same over-provisioning. That gets expensive quickly.
OR
Use a prestart task with the 512 allocation, and then run the regular container at the 128 allocation.
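As an illustration, that second option might look something like this (task names, images, and exact values here are hypothetical; `cpu_hard_limit` is the Docker driver option we use to enforce hard CPU caps):

```hcl
group "app" {
  task "warmup" {
    driver = "docker"

    lifecycle {
      hook    = "prestart"
      sidecar = false # one-shot: runs to completion before "main" starts
    }

    config {
      image          = "example/warmup:latest" # hypothetical image
      cpu_hard_limit = true                    # hard-cap CPU to protect the host
    }

    resources {
      cpu = 512 # temporary burst for the intensive startup work
    }
  }

  task "main" {
    driver = "docker"

    config {
      image          = "example/app:latest" # hypothetical image
      cpu_hard_limit = true
    }

    resources {
      cpu = 128 # steady-state allocation
    }
  }
}
```

The problem described above is that the 512 from `warmup` stays reserved for the life of the alloc even after the prestart task has exited.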
Hope this sheds some light/context for discussion
Ian
Is this the right forum for this kind of question, or is it better to open a GitHub issue with some sort of “design/discussion” flag?
This is a great place for them. And if a discussion generates a bug report or feature request we can always open a GitHub issue for it. I think you’ll find the Nomad engineering team isn’t quite as aggressive at answering questions here as we are on GitHub issues, just because we want to leave room for folks from the community to participate here. We also have a question label in GitHub, so whichever works best for you.
On to the issue at hand…
The scheduler should be “doing the right thing” inasmuch as it should be allocating the minimum amount of resources required for the entire allocation, taking into account what tasks are running concurrently due to lifecycle. So for an example using RAM resources:
| Prestart Task | Main Task | Allocated |
|---|---|---|
| 100MB (sidecar) | 200MB | 300MB |
| 100MB (no sidecar) | 200MB | 200MB |
| 200MB (sidecar) | 100MB | 300MB |
| 200MB (no sidecar) | 100MB | 200MB |
It looks like the last line in that table is the unfortunate case you’re running into?
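To make the rule explicit, here’s a toy model of that sizing logic (this is just a sketch of the behavior in the table, not Nomad’s actual code):

```python
def allocated_mb(prestart_mb: int, sidecar: bool, main_mb: int) -> int:
    """Toy model of how an allocation is sized for a prestart task
    plus a main task (illustrative only, not Nomad's actual code)."""
    if sidecar:
        # A sidecar prestart keeps running alongside the main task,
        # so both sets of resources are needed at once.
        return prestart_mb + main_mb
    # A one-shot prestart finishes before the main task starts, so the
    # allocation is sized to the peak phase -- but that peak stays
    # reserved for the life of the alloc, which is the issue here.
    return max(prestart_mb, main_mb)

# Reproduces the table above:
assert allocated_mb(100, True, 200) == 300
assert allocated_mb(100, False, 200) == 200
assert allocated_mb(200, True, 100) == 300
assert allocated_mb(200, False, 100) == 200
```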
I suspect that when we were designing this, there was an assumption that in the common case the main task would require more resources. And it seems that we’re accounting for the entire allocation restarting, even though the only way that typically happens is if a user runs `nomad alloc restart` – the `restart` block of the jobspec controls the restart of tasks, not the whole alloc.
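For reference, that task-level restart behavior is what the `restart` block configures (values here are illustrative, not defaults):

```hcl
restart {
  attempts = 2     # restarts the failed task in place...
  interval = "30m"
  delay    = "15s"
  mode     = "fail"
}
```

None of this re-runs the whole allocation, which is why the “entire alloc restarts” case the sizing accounts for is comparatively rare.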
I pulled up the docs for `lifecycle`, the Learn Guide for Task Dependencies, and also the docs for `resources`, and I see we’re definitely missing a description of the intention around prestart resources and the behavior of prestart tasks when the main task restarts. So I’ll open an issue for that documentation item for sure.
In this case it’s minor that 64 is allocated and never freed up, but the scenario we’re exploring involves prestart tasks that need more resources than the main tasks, so the allocated-but-unused resources matter much more.
EDIT: Posted more details on the GitHub issue (with screenshots), so I’ll move over to that forum for now instead of double-spamming you.