Hi, I feel like the documentation for the RecoverTask interface method is pretty scarce on the Nomad documentation site.
For example, I saw the following in hashicorp/nomad issue #10449, "When Nomad is restarted, the successful Job will also perform the Recover task":
RecoverTask is not called when a task is rescheduled.
The RecoverTask operation is what happens when the Nomad client tries to sync its local state store with the state of the running tasks for the task drivers. So for example, with a Docker task, the client will have the Docker ID and ask dockerd for a handle to a container with that ID.
Normally when the Nomad client stops, tasks will keep on running. After some time, the Nomad server will declare these tasks as “lost” and reschedule them.
Can we override this behavior with a custom reschedule stanza? What happens if the reschedule stanza is set to never reschedule: are the tasks still marked as lost, or will the server wait indefinitely for the client to come back online to recover the task?
Do you have an overview diagram of the whole lifecycle of a task and the factors that influence it? I feel like I can’t really test my custom Nomad driver, since I don’t know how the calls to the driver are made based on the lifecycle sync between server and client.
My custom Nomad driver runs really sticky jobs that can’t easily be moved to another client. However, I still want to provide failure recovery in case the client crashes.
I would really appreciate it if someone could provide more information on this topic.