Is it expected to clean up the alloc directory immediately after the workload is completed/failed?
When testing with new job configurations, some of the jobs fail, and we cannot triage them due to a lack of alloc logs. Clicking the alloc files in the UI takes me to a 404 page. Losing access to all log files makes it difficult to triage the issue.
This makes things even worse when restart and reschedule attempts are set to 0.
So, is there a way to preserve the alloc dir after the alloc ends? Or is this a misconfiguration on the client?
Hi @krundru, the cleanup of allocation sandboxes is tunable via client configuration; look for the options starting with `gc_`.
However, considering your allocations are so recent, I’m wondering if it’s actually a problem with setting up the sandbox in the first place - e.g. failing to download an image or something like that. You can look at the Recent Events of the allocation to see what’s going on there, e.g.
```
➜ nomad alloc status 26 | grep -A5 "Recent Events"
Time                       Type            Description
2022-11-07T09:46:37-06:00  Alloc Unhealthy Unhealthy because of failed task
2022-11-07T09:46:33-06:00  Not Restarting  Exceeded allowed attempts 2 in interval 30m0s and mode is "fail"
2022-11-07T09:46:33-06:00  Driver Failure  Failed to pull `shoenig/simple-http:does-not-exist`: API error (404): manifest for shoenig/simple-http:does-not-exist not found: manifest unknown: manifest unknown
2022-11-07T09:46:32-06:00  Driver          Downloading image
```
 client Stanza - Agent Configuration | Nomad | HashiCorp Developer
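For reference, the relevant GC options live in the agent's `client` block. A minimal sketch with the documented default values (adjust to taste):

```hcl
client {
  # How often the client checks whether allocations need garbage collection.
  gc_interval = "1m"

  # Start GC'ing terminal allocations when disk usage exceeds this percentage.
  gc_disk_usage_threshold = 80

  # Start GC'ing terminal allocations when inode usage exceeds this percentage.
  gc_inode_usage_threshold = 70

  # Maximum number of allocations a client keeps before forcing GC.
  gc_max_allocs = 50
}
```

Note these control when terminal allocation sandboxes are cleaned up on the client, not whether the job itself is rescheduled.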
Thanks @seth.hoenig for responding to this issue.
A couple of things:
This is a real Nomad cluster with the client running on an AWS EC2 instance, and we didn’t specify any GC configuration for the client.
When I looked at disk usage, I found it 92% used, with only 650MB of free space left. Do you think this is the reason?
That is exactly the issue: by default, Nomad starts garbage collecting terminal allocations immediately once the client's disk usage is above 80%.
You’ll see a similar log entry to this when it’s happening:
```
client.gc: garbage collecting allocation: alloc_id=118a22a8-c186-546e-0f84-a1eb46a5d9d4 reason="disk usage of 83 is over gc threshold of 80"
```
The threshold can be adjusted via the client Stanza - Agent Configuration | Nomad | HashiCorp Developer
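For example, to delay GC until the disk is fuller, you could raise the threshold in the client block (a sketch; freeing disk space on the EC2 instance is the better long-term fix, since a full disk will cause other failures):

```hcl
client {
  # Only start GC'ing terminal allocations above 90% disk usage
  # instead of the default 80%.
  gc_disk_usage_threshold = 90
}
```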
Hope that helps.