API: alloc restart calls timing out

I am developing an internal mechanism to remotely restart allocs via the Nomad API. In testing, I have noticed that sending a POST to /v1/client/allocation/:alloc_id/restart eventually returns a “Timed out reading data from server” response. Via the GUI, I get this error when I try to restart an alloc:

Could Not Restart Allocation
rpc error: 2 errors occurred:
* Task not running
* Task not running

That alloc will eventually fail and will then start on another client. Are these 2 errors related? Is there a way to increase the timeout threshold of the restart API call? Thank you as always!
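For reference, here is a minimal sketch of the restart call using only Python's standard library, with the client-side timeout as an explicit parameter so it can be raised if the caller is what gives up. The helper names `build_restart_request` and `restart_alloc` are hypothetical, and the agent address (Nomad's default HTTP port is 4646) and optional ACL token are assumptions about your setup. Note that the `timeout` passed to `urlopen` applies to each blocking socket operation (connect, each read), not to the total duration of the request.

```python
import json
import urllib.request
from typing import Optional, Tuple


def build_restart_request(nomad_addr: str, alloc_id: str,
                          task_name: Optional[str] = None,
                          token: Optional[str] = None) -> urllib.request.Request:
    """Build a POST for /v1/client/allocation/:alloc_id/restart (hypothetical helper)."""
    url = f"{nomad_addr.rstrip('/')}/v1/client/allocation/{alloc_id}/restart"
    # With no TaskName the request targets the whole allocation; TaskName narrows it to one task.
    payload = {"TaskName": task_name} if task_name else {}
    headers = {"Content-Type": "application/json"}
    if token:
        headers["X-Nomad-Token"] = token  # ACL token, only needed when ACLs are enabled
    return urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"),
                                  headers=headers, method="POST")


def restart_alloc(nomad_addr: str, alloc_id: str, task_name: Optional[str] = None,
                  token: Optional[str] = None, timeout: float = 60.0) -> Tuple[int, str]:
    """Send the restart request; `timeout` is the client-side socket timeout in seconds."""
    req = build_restart_request(nomad_addr, alloc_id, task_name, token)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")
```

Usage would look like `restart_alloc("http://127.0.0.1:4646", "<alloc-id>", timeout=120)`. Splitting request construction from sending keeps the URL and body easy to inspect without touching a live cluster.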

Hi @c.k :wave:

I don’t think there’s any timeout on the Nomad side :thinking:

Would you be able to provide a sample of the job you have running? More specifically, do you have any kill_timeout set in your tasks?

Maybe check whether your HTTP client has a timeout setting. If you’re accessing the Nomad API through a proxy or load balancer, check whether that has a timeout set as well.
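One way to narrow down where the cutoff happens is to time the same POST against each hop (directly against the Nomad agent, then through the proxy or load balancer) and record what ended the request. This is a minimal sketch with a hypothetical helper `timed_post`; it assumes you pass the full restart URL yourself.

```python
import socket
import time
import urllib.error
import urllib.request
from typing import Tuple


def timed_post(url: str, timeout: float, data: bytes = b"{}") -> Tuple[str, float]:
    """POST `data` to `url` and return (outcome, elapsed_seconds).

    The outcome distinguishes an HTTP status (from Nomad or from a proxy in front
    of it), a client-side timeout, and a connection failure.
    """
    req = urllib.request.Request(url, data=data, method="POST",
                                 headers={"Content-Type": "application/json"})
    start = time.monotonic()
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            resp.read()
            outcome = f"http {resp.status}"
    except urllib.error.HTTPError as err:
        # Error status: something answered, e.g. Nomad or a proxy returning 502/504.
        outcome = f"http {err.code}"
    except (socket.timeout, TimeoutError):
        # Raised when a blocking read exceeds `timeout`, e.g. while waiting for headers.
        outcome = "client timeout"
    except urllib.error.URLError as err:
        # Connect-phase errors are wrapped; a wrapped timeout is still a client timeout.
        if isinstance(err.reason, (socket.timeout, TimeoutError)):
            outcome = "client timeout"
        else:
            outcome = f"connection error: {err.reason}"
    return outcome, time.monotonic() - start
```

With a generous `timeout` (say 120 seconds), if the direct call completes but the proxied one returns an error status or is cut off around 10-15 seconds, the proxy is the likelier source of the timeout. Keep in mind that each successful call really does restart the allocation.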

Hey! I don’t know if I can send you a sample; I will have to check. As for kill_timeout, I can see a few "KillTimeout": 5000000000 values, and some higher. The API timeouts/failures occur about 10-15 seconds after the command is sent, and the GUI shows the same timeout behavior.
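Nomad's JSON API expresses durations such as `KillTimeout` in nanoseconds, so `5000000000` is 5 seconds. To compare every task's kill timeout against the 10-15 second window, a sketch like the following could walk a job as returned by `GET /v1/job/:job_id`. The helper name `kill_timeouts_seconds` is hypothetical, and it assumes the job JSON has the usual `TaskGroups` → `Tasks` → `KillTimeout` layout.

```python
from typing import Any, Dict, List, Tuple

NS_PER_SECOND = 1_000_000_000  # JSON durations are nanoseconds


def kill_timeouts_seconds(job: Dict[str, Any]) -> List[Tuple[str, str, float]]:
    """Return (group, task, kill_timeout_seconds) for every task that sets KillTimeout."""
    results = []
    for group in job.get("TaskGroups") or []:
        for task in group.get("Tasks") or []:
            ns = task.get("KillTimeout")
            if ns is None:
                continue  # skip tasks without an explicit value in the JSON
            results.append((group.get("Name", "?"), task.get("Name", "?"), ns / NS_PER_SECOND))
    return results
```

Any task whose value approaches or exceeds the point where the call fails is worth looking at first, since a restart has to stop the task before starting it again.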