Completed batch job goes pending again after node goes down due to screen lock

madhur-df · March 27, 2024, 6:55am

Hi,

I have a single Nomad client, which is a macOS machine. I submitted a “batch” job that downloads a binary and runs it with some arguments.

Once the batch job finishes, it goes into the completed state. Some time later my Nomad client (the Mac) goes into screen lock, and the client's status changes to “disconnected”. It shows as disconnected because my job sets the following parameter: max_client_disconnect = "1h"

What happens next is that the job gets scheduled again, which I don’t want since it had already completed. Its status goes to pending.

Once I unlock my Mac again, i.e. the client becomes Ready, the pending job runs and completes successfully. In other words, the job is run a second time even though it had already completed successfully.

One can confirm this from the job's two allocations: one is from 25 minutes ago and the other from just a few seconds ago, and both are completed.

How can I stop this behavior? I am already using the following block for my group:

        max_client_disconnect = "1h"
        prevent_reschedule_on_lost = true

        reschedule {
            attempts  = 0
            unlimited = false
        }

        restart {
            attempts  = 0
            mode      = "fail"
        }

but it seems to have no effect whatsoever.
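
For context, here is a minimal sketch of where that block sits in a batch jobspec. The job, group and task names, the datacenter, the driver choice (raw_exec), the artifact URL and the arguments are all placeholders assumed for illustration, not the actual values from my job:

        job "example-batch" {
            datacenters = ["dc1"]   # placeholder datacenter
            type        = "batch"

            group "run" {
                # the group-level settings quoted above
                max_client_disconnect      = "1h"
                prevent_reschedule_on_lost = true

                reschedule {
                    attempts  = 0
                    unlimited = false
                }

                restart {
                    attempts = 0
                    mode     = "fail"
                }

                task "run-binary" {
                    driver = "raw_exec"   # assumed driver for a macOS client

                    artifact {
                        source = "https://example.com/mybinary"   # placeholder URL
                    }

                    config {
                        command = "local/mybinary"        # artifacts land under local/ by default
                        args    = ["--flag", "value"]     # placeholder arguments
                    }
                }
            }
        }

All four settings are at the group level, so they should apply to every allocation of that group.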

TLDR: a batch job that has already completed is run again if the Nomad client goes down (say, due to screen lock) for a while and then comes back up. I don’t want this.
