Can host volumes be used as the destination for an artifact, such that the artifact is not downloaded if it already exists?

Let’s say I have a binary that adds two numbers, which are given as input via command-line arguments.

The binary is hosted on a public URL.

We can submit a job that has the URL in its artifact block, such as the following:

driver = "raw_exec"

artifact {
  # should be a permanent URL, from our artifact store
  source = "https://fleet-binaries.s3.amazonaws.com/out_10837808-a7dc-4567-ae7b-1d45ecf87a2d"
}

config {
  # should be a permanent name
  command = "local/out_10837808-a7dc-4567-ae7b-1d45ecf87a2d"

  # this will be tailor-made for different kinds of binaries that we define
  args = ["3", "4"]
}

If I invoke the same job with different args, the binary is downloaded again each time, into a separate new task allocation folder.

Is there a way to avoid downloading the binary multiple times for the same job with different arguments, if it already exists? Can it be done via host volumes?

If yes, can someone help me with what the config could look like?
If not, how can we clean up previously downloaded binaries?

All jobs are of batch type with the raw_exec driver…

Consider that you are basically trying to wedge parameterized-job behavior into a plain batch job. Create a parameterized job (and maybe adjust your binaries) that can take the numbers to add as a payload, as sketched below. Anyway, that’s not directly related to what you asked.
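To illustrate, a minimal sketch of such a parameterized job; the job name, the dispatch_payload file, and the idea that the binary reads its two operands from that file instead of argv are assumptions for illustration, not from the original post:

job "add-numbers" {
  type = "batch"

  # callers dispatch this job with a payload instead of submitting a new job
  parameterized {
    payload = "required"
  }

  group "calc" {
    task "add" {
      driver = "raw_exec"

      # Nomad writes the dispatch payload into the task directory
      dispatch_payload {
        file = "input.txt"
      }

      artifact {
        source = "https://fleet-binaries.s3.amazonaws.com/out_10837808-a7dc-4567-ae7b-1d45ecf87a2d"
      }

      config {
        command = "local/out_10837808-a7dc-4567-ae7b-1d45ecf87a2d"
        # hypothetical: the binary reads "3 4" from the payload file
        args    = ["local/input.txt"]
      }
    }
  }
}

Each run then becomes echo "3 4" | nomad job dispatch add-numbers - rather than a fresh job submission. Note this changes how you invoke the job, not how artifacts are downloaded.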

So, to answer your question: yes, it will download the artifact for each allocation, because as of yet there is no such thing as an artifact cache. This doesn’t happen with Docker images because the node keeps images around for a while, so you only get hit with the download once.

Can it be done with host volumes? Sort of. The changes alloc A makes are visible to alloc B, but you’d need to spin up a separate job to download the binaries to the host volume, and then create your batch jobs. There is no caching mechanism for this; the wiring would look roughly like the sketch below.
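To make that concrete, a rough sketch with made-up names and paths; note that the task-level mount needs a driver that actually supports volumes:

# on the Nomad client (agent config)
client {
  host_volume "binary-cache" {
    path      = "/opt/binary-cache"
    read_only = false
  }
}

# at the group level of the job
volume "binary-cache" {
  type   = "host"
  source = "binary-cache"
}

# inside a task, with a driver that supports volume mounts
volume_mount {
  volume      = "binary-cache"
  destination = "/cache"
}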

Alternatively, you can pull a stunt with a prestart task that runs a shell script which checks whether the binary exists and downloads it if it doesn’t; that blocks the main task from starting until the script is done. As far as I can tell this will work in theory, but it can (and probably will) break in some very interesting ways.

Even with a parameterized job, each invocation still creates new allocation folders, i.e. the binary is still downloaded again and again…

@benvanstaveren would it be possible for you to guide me on what the prestart solution could look like? Some sort of blueprint, that is…

I’ll give you the big picture; how to implement it I’ll leave as an exercise to the reader :wink: (I know, terrible, but…). First, check the docs at lifecycle Block - Job Specification | Nomad | HashiCorp Developer; that will give you the options for the lifecycle stanza. It’s important to note that in your case you are not running a sidecar: it’s a true “prestart” task.

Then, in your job, inside the task group, you declare two tasks. The first task (the main task, without a lifecycle stanza) is what you currently have. The second task will be the prestart task and will have something like:

lifecycle {
  hook    = "prestart"
  sidecar = false # the default; the task runs once to completion instead of alongside the main task
}

You will have to mount the host volume you’re intending to use for binary storage in both tasks, at some accessible point (e.g. local/bin). The prestart task should run whatever script or executable you need to check for the existence of the binary the main task needs, and should either download and save it, or just exit if the binary already exists.

Once the prestart task has finished, the main task will be started. If I recall correctly, the exec driver assumes that the binary you are starting already exists on the node, so this should work.

Hope this helps :slight_smile:

@benvanstaveren I kinda followed your steps, but it seems that mounting host volumes is not supported by the raw_exec task driver. I get the following error while trying to use it:

volumes: task driver "raw_exec" for "download_binary" does not support host volumes

But since the filesystem isolation level is none for the raw_exec task driver, I was able to use a path of my choice, together with the prestart lifecycle hook, to implement the two-task approach @benvanstaveren described, roughly as sketched below.
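Roughly, the working layout looks like this; the cache path /opt/binary-cache, the curl download, and the task names are placeholders for my actual setup:

group "calc" {
  # prestart task: make sure the binary is cached on the node
  task "download_binary" {
    driver = "raw_exec"

    lifecycle {
      hook = "prestart"
    }

    config {
      command = "/bin/sh"
      # note: no locking, so two allocations racing on the same file could conflict
      args = ["-c", <<-EOT
        BIN_DIR=/opt/binary-cache
        BIN="$BIN_DIR/out_10837808-a7dc-4567-ae7b-1d45ecf87a2d"
        mkdir -p "$BIN_DIR"
        if [ ! -x "$BIN" ]; then
          curl -fsSL -o "$BIN" https://fleet-binaries.s3.amazonaws.com/out_10837808-a7dc-4567-ae7b-1d45ecf87a2d
          chmod +x "$BIN"
        fi
      EOT
      ]
    }
  }

  # main task: starts only after the prestart task exits successfully
  task "add" {
    driver = "raw_exec"

    config {
      command = "/opt/binary-cache/out_10837808-a7dc-4567-ae7b-1d45ecf87a2d"
      args    = ["3", "4"]
    }
  }
}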

The reason I am forced to use raw_exec is that my Nomad client runs on macOS, where the exec driver is not supported.

TL;DR: I was able to make it work without host volumes, using a path of my choice instead.

So, thank you :slight_smile:


Forgot about the host volume thing on the raw_exec driver, my bad! I’m too used to Docker :smiley:

Glad to hear it worked for you though :slight_smile: