Infinite memory growth on Nomad Client

stswidwinski · October 26, 2022, 7:16pm

Hey!

I was quite confused to discover that it is reasonably easy to compel the Nomad Agent to allocate an unbounded amount of memory (and consequently to do the same to the Nomad Server). Here is the scenario I am running, with links to the code backing my claims and some measurements.

Assume a simple job with one task group and one task. Assume that this job is updated every minute and otherwise does the equivalent of sleep infinity. The job update takes an idempotency token to ensure that we’re not creating new instances.

As a result of such updates we start to accumulate allocs, one per update. Within a day we will accumulate some 1440 of them. This bloats the memory of both the server and the client. My preliminary measurement shows roughly 1GB of RSS, as reported by nomad.client.nomad.runtime.alloc_bytes, for around 2k allocations.
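
To make the growth rate concrete, here is a rough back-of-the-envelope sketch. It assumes memory scales linearly with the number of retained allocs, which is my assumption and not something I have verified; the constants are the approximate figures quoted above, and the function names are made up for illustration.

```go
package main

import "fmt"

// Approximate figures from the measurement above. Linear scaling with the
// alloc count is an assumption, not a verified property of Nomad.
const (
	updatesPerDay  = 24 * 60 // one update (and one new alloc) per minute
	measuredBytes  = 1 << 30 // roughly 1 GB observed
	measuredAllocs = 2000    // for roughly 2k allocations
)

// bytesPerAlloc is the average memory attributed to one retained alloc.
func bytesPerAlloc() float64 {
	return float64(measuredBytes) / float64(measuredAllocs)
}

// projectedBytes estimates the memory held after the given number of days
// if no alloc is ever released.
func projectedBytes(days int) float64 {
	return float64(days*updatesPerDay) * bytesPerAlloc()
}

func main() {
	fmt.Printf("allocs per day: %d\n", updatesPerDay)
	fmt.Printf("approx. MiB per alloc: %.2f\n", bytesPerAlloc()/(1<<20))
	for _, d := range []int{1, 7, 30} {
		fmt.Printf("day %2d: ~%.1f GiB\n", d, projectedBytes(d)/(1<<30))
	}
}
```

Under these assumptions a single job updated once a minute would retain on the order of several GiB within a week.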

Now, one might think that these allocs are cleaned up due to the Nomad Client settings:

However, this knob only controls the garbage collection of on-disk artefacts associated with the allocations and the metadata. The GC is here:

github.com

hashicorp/nomad/blob/2b054e38e91af964d1235faa98c286ca3f527e56/client/gc.go#L233


      
            if gcAlloc == nil {
                return
            }

            go a.destroyAllocRunner(gcAlloc.allocID, gcAlloc.allocRunner, "forced full node collection")
        }
    }

    // MakeRoomFor garbage collects enough number of allocations in the terminal
    // state to make room for new allocations
    func (a *AllocGarbageCollector) MakeRoomFor(allocations []*structs.Allocation) error {
        if len(allocations) == 0 {
            // Nothing to make room for!
            return nil
        }

        // GC allocs until below the max limit + the new allocations
        max := a.config.MaxAllocs - len(allocations)
        for a.allocCounter.NumAllocs() > max {
            select {
            case <-a.shutdownCh:

github.com

hashicorp/nomad/blob/v1.3.1/client/gc.go#L144


      
          
          
            liveAllocs := a.allocCounter.NumAllocs()

            switch {
            case diskStats.UsedPercent > a.config.DiskUsageThreshold:
                reason = fmt.Sprintf("disk usage of %.0f is over gc threshold of %.0f",
                    diskStats.UsedPercent, a.config.DiskUsageThreshold)
            case diskStats.InodesUsedPercent > a.config.InodeUsageThreshold:
                reason = fmt.Sprintf("inode usage of %.0f is over gc threshold of %.0f",
                    diskStats.InodesUsedPercent, a.config.InodeUsageThreshold)
            case liveAllocs > a.config.MaxAllocs:
                // if we're unable to gc, don't WARN until at least 2x over limit
                if liveAllocs < (a.config.MaxAllocs * 2) {
                    logf = a.logger.Info
                }
                reason = fmt.Sprintf("number of allocations (%d) is over the limit (%d)", liveAllocs, a.config.MaxAllocs)
            }

            if reason == "" {
                // No reason to gc, exit
                break

And the metadata cleanup is here:

github.com

hashicorp/nomad/blob/v1.3.1/client/client.go#L2311


      
          
          
        // Diff the existing and updated allocations
        diff := diffAllocs(existing, update)
        c.logger.Debug("allocation updates", "added", len(diff.added), "removed", len(diff.removed),
            "updated", len(diff.updated), "ignored", len(diff.ignore))

        errs := 0

        // Remove the old allocations
        for _, remove := range diff.removed {
            c.removeAlloc(remove)
        }

        // Update the existing allocations
        for _, update := range diff.updated {
            c.updateAlloc(update)
        }

        // Make room for new allocations before running
        if err := c.garbageCollector.MakeRoomFor(diff.added); err != nil {
            c.logger.Error("error making room for new allocations", "error", err)

removeAlloc does not have other call sites and may only be triggered by one of three events:

The scheduler tells the agent that the allocs no longer exist (because the job was GCed)
Someone manually purges the allocs via nomad system gc
The Nomad Agent is lost and the new Agent does not receive the history
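
To illustrate why a terminal alloc never lands in the removed set while the server still reports it, here is a simplified sketch of the diff step. It is my own approximation and not the actual diffAllocs implementation: diffResult and diffAllocIDs are hypothetical names, and both inputs are reduced to maps from alloc ID to a modify index.

```go
package main

import (
	"fmt"
	"sort"
)

// diffResult is a hypothetical, simplified version of the client's diff output.
type diffResult struct {
	added, removed, updated, ignored []string
}

// diffAllocIDs compares what the client holds (existing) with what the server
// reports (update). Both maps go from alloc ID to a modify index.
func diffAllocIDs(existing, update map[string]uint64) diffResult {
	var d diffResult
	for id, idx := range existing {
		newIdx, known := update[id]
		switch {
		case !known:
			// Only allocs the server no longer reports are removed.
			d.removed = append(d.removed, id)
		case newIdx > idx:
			d.updated = append(d.updated, id)
		default:
			d.ignored = append(d.ignored, id)
		}
	}
	for id := range update {
		if _, ok := existing[id]; !ok {
			d.added = append(d.added, id)
		}
	}
	// Sort for deterministic output; map iteration order is random in Go.
	for _, s := range [][]string{d.added, d.removed, d.updated, d.ignored} {
		sort.Strings(s)
	}
	return d
}

func main() {
	existing := map[string]uint64{"old-1": 5, "old-2": 6, "purged": 3}
	update := map[string]uint64{"old-1": 5, "old-2": 7, "new-1": 9}
	fmt.Printf("%+v\n", diffAllocIDs(existing, update))
}
```

In this model "old-1" is simply ignored on every pass: as long as the server keeps returning it, nothing on the client side ever drops it.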

Ideally, I would like to be able to control the garbage collection of alloc metadata, as well as alloc data, within the Nomad Agent. I do not want it to be possible for someone to cause unbounded memory growth using correct APIs. Hence my two questions:

What motivates the decision to keep an unbounded alloc history?
Is there a setting I am missing which would constrain the absolute history per Nomad Client or per Job?

If the answer is “avoid idempotency tokens”, that would be valuable information as well.

stswidwinski · October 27, 2022, 2:27pm

It seems that maybe the intention here was to clean up such allocs:

github.com

hashicorp/nomad/blob/75736b6cee588bbeb1ad504e0aa8e1e1fa287309/nomad/core_sched.go#L349


      
                // The allocation is eligible to be GC'd
                gcAllocIDs = append(gcAllocIDs, alloc.ID)
            }
        }

        return gcEval, gcAllocIDs, nil
    }

    // olderVersionTerminalAllocs returns terminal allocations whose job create index
    // is older than the job's create index
    func olderVersionTerminalAllocs(allocs []*structs.Allocation, job *structs.Job) []string {
        var ret []string
        for _, alloc := range allocs {
            if alloc.Job != nil && alloc.Job.CreateIndex < job.CreateIndex && alloc.TerminalStatus() {
                ret = append(ret, alloc.ID)
            }
        }
        return ret
    }

    // evalReap contacts the leader and issues a reap on the passed evals and

But submitting jobs with new versions doesn’t actually move the create index at all, so the allocs are not cleaned up.
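
Here is a small self-contained reproduction of that reasoning. The Job and Allocation types are pared-down, hypothetical stand-ins for the real structs (the Terminal field replaces the TerminalStatus() call), and the index values are made up; only the filter condition is copied from the excerpt above.

```go
package main

import "fmt"

// Job and Allocation are hypothetical, minimal stand-ins for the Nomad
// structs, holding only the fields the filter reads.
type Job struct {
	CreateIndex uint64
	Version     uint64
}

type Allocation struct {
	ID       string
	Job      *Job
	Terminal bool
}

// olderVersionTerminalAllocs applies the same condition as the excerpt:
// terminal allocs whose job create index is older than the job's.
func olderVersionTerminalAllocs(allocs []*Allocation, job *Job) []string {
	var ret []string
	for _, alloc := range allocs {
		if alloc.Job != nil && alloc.Job.CreateIndex < job.CreateIndex && alloc.Terminal {
			ret = append(ret, alloc.ID)
		}
	}
	return ret
}

// sampleAllocs models three updates of one job: Version advances while
// CreateIndex stays at 100, as observed in this thread.
func sampleAllocs() []*Allocation {
	return []*Allocation{
		{ID: "a1", Job: &Job{CreateIndex: 100, Version: 0}, Terminal: true},
		{ID: "a2", Job: &Job{CreateIndex: 100, Version: 1}, Terminal: true},
		{ID: "a3", Job: &Job{CreateIndex: 100, Version: 2}, Terminal: false},
	}
}

func main() {
	// Current job after in-place updates: same create index as every old alloc.
	updated := &Job{CreateIndex: 100, Version: 2}
	fmt.Println(olderVersionTerminalAllocs(sampleAllocs(), updated))

	// Hypothetical job with a newer create index, for comparison.
	recreated := &Job{CreateIndex: 200, Version: 0}
	fmt.Println(olderVersionTerminalAllocs(sampleAllocs(), recreated))
}
```

With an unchanged create index the strict less-than comparison never holds, so the first call selects nothing; only the second, hypothetical case picks up the two terminal allocs.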
