I’m trying to figure out the correct way of delaying a task startup so services that it depends on have enough time to come up.
For example, there is a task app that depends on a task db. db takes quite a while to start up. Meanwhile, app restarts in a loop, unsuccessfully trying to connect to db.
So I need something to configure “don’t try to start app before time n has elapsed”.
Basically, I would already be happy with a not-so-perfect solution that includes a health check. I just want to prevent n nodes from immediately going wild trying to connect to a workload that takes a while to come up.
E.g. in my case I’m running a Java application compiled to a native binary. It takes just 0.5 seconds to start, while the database takes at least 5-10 seconds.
In that case I think service discovery is your friend. You can use the method described on the lifecycle page to create a pre-start script that runs in a loop and polls service discovery for the status of the db. Once db is healthy, the script can finish and unblock the main task (the app).
It might even be possible to do this with the template{} config, since template generation blocks if the key/service isn’t found. However, I’m not sure whether nomadService (Nomad SD) or service (Consul SD) takes service health into account when generating the output, so I would start by checking that first.
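A rough sketch of the pre-start polling loop, in case it helps. It assumes Consul service discovery, a service registered as "db", and a local Consul agent at http://127.0.0.1:8500; the function names, timeout and interval are made up for illustration. It asks Consul's health endpoint for instances with passing checks and returns once at least one shows up.

```python
import json
import time
import urllib.request
from typing import Callable


def consul_has_passing_instance(service: str,
                                consul_addr: str = "http://127.0.0.1:8500") -> bool:
    """Return True if Consul lists at least one instance of `service` with passing checks.

    The agent address is an assumption; adjust it to your setup.
    """
    url = f"{consul_addr}/v1/health/service/{service}?passing=true"
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return len(json.load(resp)) > 0
    except (OSError, ValueError):
        # Consul unreachable or an unexpected body: treat as "not ready yet".
        return False


def wait_until_healthy(check: Callable[[], bool],
                       timeout_s: float = 120.0,
                       interval_s: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> bool:
    """Call `check` until it returns True or `timeout_s` has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The script would end with something like `sys.exit(0 if wait_until_healthy(lambda: consul_has_passing_instance("db")) else 1)` and run as a separate task with `lifecycle { hook = "prestart" }` in the same group as app, so that app only starts after it exits successfully.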
Thanks for those hints, I will check them. However, regarding the template approach, I think it does not block.
The app workload has a template that does service discovery. The template content is written to a file as env, and when the db is not ready, the IP address of the db renders as blank. That causes the app to fail and retry a while later, until the db service resolves properly.
If the name of the db service is predictable in Consul DNS format, you could run a dig or nslookup on the db service name before starting the ‘app’ service.
Also, to prevent all instances of the ‘app’ service from connecting at once, you could add a small random delay after the dig (or nslookup) succeeds.
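Here is a minimal sketch of that idea, done in Python instead of dig/nslookup. It assumes the service resolves as db.service.consul (the default Consul DNS domain) and that the host's resolver forwards .consul queries to Consul; the function names, attempt count and jitter range are arbitrary picks for illustration.

```python
import random
import socket
import time
from typing import Callable


def resolves(hostname: str) -> bool:
    """Return True if `hostname` currently resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        return False


def wait_for_dns_then_jitter(hostname: str,
                             max_jitter_s: float = 5.0,
                             interval_s: float = 1.0,
                             attempts: int = 60,
                             resolver: Callable[[str], bool] = resolves,
                             sleep: Callable[[float], None] = time.sleep,
                             rng: Callable[[float, float], float] = random.uniform) -> bool:
    """Retry the lookup up to `attempts` times; once it succeeds, sleep a random
    0..max_jitter_s seconds so the app instances do not all connect at once."""
    for _ in range(attempts):
        if resolver(hostname):
            sleep(rng(0.0, max_jitter_s))
            return True
        sleep(interval_s)
    return False
```

Calling `wait_for_dns_then_jitter("db.service.consul")` before launching app gives roughly the dig-plus-random-sleep behaviour described above; a False return means the name never resolved and the wrapper should exit non-zero.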