API: alloc restart calls timing out

c.k · February 2, 2022, 2:32pm

I am developing an internal mechanism to remotely restart allocs via the Nomad API. In my testing, I have noticed that a POST to /v1/client/allocation/:alloc_id/restart eventually returns a “Timed out reading data from server” response. Via the GUI, I get this error when I try to restart an alloc:

Could Not Restart Allocation
rpc error: 2 errors occurred: * Task not running * Task not running

That alloc will eventually fail and will then start on another client. Are these 2 errors related? Is there a way to increase the timeout threshold of the restart API call? Thank you as always!

lgfa29 · February 2, 2022, 7:25pm

Hi @c.k

I don’t think there’s any timeout on the Nomad side.

Would you be able to provide a sample of the job you have running? More specifically, do you have any kill_timeout set in your tasks?

Maybe check if your HTTP client has some timeout setting, or, if you’re accessing the Nomad API via a proxy or load balancer, if that has any timeout set as well.
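To rule out the client side, here is a minimal sketch of the restart call with an explicit, generous read timeout. It uses only the Python standard library. The agent address, the 60-second timeout, and the helper names `build_restart_request` and `restart_alloc` are assumptions for illustration, not anything taken from the setup in this thread; ACL tokens and TLS are left out.

```python
import json
import urllib.request

# Assumption: a local agent on the default HTTP port; change to your address.
NOMAD_ADDR = "http://127.0.0.1:4646"


def build_restart_request(alloc_id, task_name=None, addr=NOMAD_ADDR):
    """Build the POST request for /v1/client/allocation/:alloc_id/restart.

    task_name is optional; when omitted, an empty JSON object is sent.
    """
    body = {"TaskName": task_name} if task_name else {}
    return urllib.request.Request(
        url=f"{addr}/v1/client/allocation/{alloc_id}/restart",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def restart_alloc(alloc_id, task_name=None, timeout_s=60.0):
    """Send the restart request, waiting up to timeout_s seconds for a reply.

    Raises urllib.error.HTTPError on a non-2xx response and
    urllib.error.URLError (or a timeout error) if no response arrives in time.
    """
    req = build_restart_request(alloc_id, task_name)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return resp.status, resp.read().decode("utf-8")
```

If the call still fails after roughly the same delay with a much larger `timeout_s`, the limit is probably not in the HTTP client, and a proxy or load balancer in front of the API would be the next thing to check.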

c.k · February 2, 2022, 7:46pm

Hey! I don’t know if I can send you a sample; I will have to check. As for kill_timeout, I can see a few "KillTimeout": 5000000000 entries and some higher ones. The API timeouts/failures occur about 10-15 seconds after the command is sent. The same timeout behavior can be seen in the GUI as well.
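For reference, the `KillTimeout` values in the JSON job representation are durations in nanoseconds, so 5000000000 is 5 seconds. A small sketch for comparing those values against the 10-15 second window mentioned above; the task list and the helper names are hypothetical, not taken from the actual job.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def kill_timeout_seconds(kill_timeout_ns):
    """Convert a KillTimeout value in nanoseconds to seconds."""
    return kill_timeout_ns / NANOSECONDS_PER_SECOND


def slowest_kill_timeout(tasks):
    """Return the largest KillTimeout (in seconds) across a list of task dicts.

    Tasks without a KillTimeout key are treated as 0 here; that is a
    simplification for this sketch, not Nomad's actual default.
    """
    return max(
        (kill_timeout_seconds(t.get("KillTimeout", 0)) for t in tasks),
        default=0.0,
    )


# Hypothetical tasks, for illustration only.
example_tasks = [
    {"Name": "web", "KillTimeout": 5000000000},
    {"Name": "worker", "KillTimeout": 30000000000},
]
print(slowest_kill_timeout(example_tasks))
```

If the largest value is well above whatever read timeout the client or proxy enforces, a restart that has to wait for the task to stop could plausibly outlast the HTTP request.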
