Hi all! I’m currently experimenting with migrating an internal shared platform that hosts multiple PHP services, each isolated in its own container. It currently runs on a home-grown orchestration system that we’re looking to replace, and looks somewhat like this:
The containers just run the service in question and boot up when a request comes in.
For the new setup I plan to use Nomad for orchestration, Consul for service discovery and Traefik for proxying/load-balancing, roughly like this:
It’s more or less the same layout, just with Nomad in place of our homebrew orchestrator. I haven’t included Consul in the diagram above, but we plan to register services in Consul via Nomad so that Traefik picks up its routing from there. We’ll run Consul clients on the Nomad clients and a 3-server (“servers/leaders”) cluster for both Consul and Nomad, which is basically the suggested architecture for both.
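To make the "register via Nomad so Traefik picks it up" step concrete, here is a minimal sketch of a per-site job in the JSON form accepted by Nomad’s `POST /v1/jobs` endpoint. It assumes the Docker task driver, a datacenter called `dc1`, and Traefik’s Consul Catalog provider reading `traefik.*` tags from the registered service; the group/task names, image, hostname and resource figures are hypothetical placeholders.

```python
def build_job(service: str, image: str, hostname: str) -> dict:
    """Build a Nomad job payload (JSON API form) for one PHP site.

    Assumptions: Docker driver, datacenter "dc1", and Traefik's Consul
    Catalog provider turning the "traefik.*" tags into a router.
    """
    return {
        "Job": {
            "ID": service,
            "Name": service,
            "Type": "service",
            "Datacenters": ["dc1"],  # hypothetical datacenter name
            "TaskGroups": [{
                "Name": "web",  # hypothetical group name
                "Count": 1,
                # Dynamic host port mapped to port 80 inside the container.
                "Networks": [{"DynamicPorts": [{"Label": "http", "To": 80}]}],
                # Registered in Consul (the default provider); Traefik reads the tags.
                "Services": [{
                    "Name": service,
                    "PortLabel": "http",
                    "Tags": [
                        "traefik.enable=true",
                        f"traefik.http.routers.{service}.rule=Host(`{hostname}`)",
                    ],
                }],
                "Tasks": [{
                    "Name": "php",
                    "Driver": "docker",
                    "Config": {"image": image, "ports": ["http"]},
                    "Resources": {"CPU": 200, "MemoryMB": 256},  # placeholder sizing
                }],
            }],
        }
    }
```

Serialised with `json.dumps`, the returned dict is the request body for `POST /v1/jobs`; the same job could equally be written in HCL and submitted with `nomad job run`.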
One thing I’m struggling to wrap my head around with Nomad is how to shut down idle services.
Our current workflow looks something like this:
- Request comes in for
- If a container is running for it already, route request to it
- If a container isn’t running, start a container
- If we’re tight on resources, shut down the least used container
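The current workflow above is essentially an LRU cache of running containers. Here is a minimal sketch, assuming "least used" means least recently used and approximating "tight on resources" with a fixed container count; the `start`/`stop` hooks are hypothetical stand-ins for whatever actually launches and kills containers.

```python
from collections import OrderedDict
from typing import Callable


class ContainerPool:
    """Route to a running container, start one on demand, and evict the
    least recently used container when the pool is full."""

    def __init__(self, capacity: int,
                 start: Callable[[str], None], stop: Callable[[str], None]):
        self.capacity = capacity  # stand-in for "tight on resources"
        self.start = start        # hypothetical hook: launch container for a site
        self.stop = stop          # hypothetical hook: shut a container down
        # Ordered oldest use first, so the first key is the eviction candidate.
        self.running: "OrderedDict[str, None]" = OrderedDict()

    def handle_request(self, site: str) -> str:
        if site in self.running:
            # Already running: mark as most recently used and route to it.
            self.running.move_to_end(site)
            return site
        if len(self.running) >= self.capacity:
            # Full: shut down the least recently used container first.
            victim, _ = self.running.popitem(last=False)
            self.stop(victim)
        self.start(site)
        self.running[site] = None
        return site
```

With a capacity of 2, requests for `a`, `b`, `a`, `c` stop `b`, because `a` was touched more recently.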
I can mostly wrap my head around getting this into Nomad land and I believe it will look something like this:
- Request comes in for
- If a Job/Service is running for it already, route request to it
- If a Job/Service isn’t running, register a new Job with Nomad and wait for it to start
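The Nomad-side steps above could be sketched against Nomad’s HTTP API roughly as follows. This assumes a local agent at `127.0.0.1:4646` with no ACL token, and a `job` dict already in the JSON form accepted by `POST /v1/jobs`. It treats an allocation with `ClientStatus == "running"` as "started", which is not the same as the Consul health check passing (Traefik may only route once that check is healthy). The `call` parameter is only there so the HTTP layer can be swapped out in tests.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Optional

NOMAD_ADDR = "http://127.0.0.1:4646"  # assumption: local agent, no ACL token


def nomad(method: str, path: str, body: Optional[dict] = None):
    """Minimal Nomad HTTP API call; returns parsed JSON, or None on 404."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(NOMAD_ADDR + path, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read() or b"null")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def ensure_running(job: dict, timeout: float = 60.0, call=nomad) -> bool:
    """Register the job if it is missing or stopped, then wait for a running allocation."""
    job_id = job["Job"]["ID"]
    existing = call("GET", f"/v1/job/{job_id}")
    if existing is None or existing.get("Stop"):
        call("POST", "/v1/jobs", job)  # register (or re-register) the job
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        allocs = call("GET", f"/v1/job/{job_id}/allocations") or []
        if any(a.get("ClientStatus") == "running" for a in allocs):
            return True
        time.sleep(1)  # simple polling; blocking queries would be more efficient
    return False
```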
What I’m unsure about is this: if the Nomad cluster is low on resources, how do I shut down the least used Job to free them up? We can scale horizontally to an extent, but since we have heavy caching in front of these web apps, there will likely be a large number of Jobs registered and running that aren’t actively being used, and those could be stopped to make room for other services as requests for them come in. Shutting down idle services is our preferred approach and is what we do today.
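One way to approach the "free resources by stopping idle Jobs" question outside of Nomad itself is a small controller that picks victims and stops them through the API. A minimal sketch, assuming you already track each job’s last request time and memory reservation yourself (both inputs are hypothetical here), and that stopping means `DELETE /v1/job/:job_id` without `purge`, so the job stays in Nomad’s history and can be re-registered later.

```python
import urllib.request
from typing import Dict, List

NOMAD_ADDR = "http://127.0.0.1:4646"  # assumption: local agent, no ACL token


def pick_victims(last_used: Dict[str, float], memory_mb: Dict[str, int],
                 needed_mb: int) -> List[str]:
    """Least recently used job IDs whose combined memory covers needed_mb.

    last_used: job ID -> timestamp of its last request (assumed input).
    memory_mb: job ID -> memory reserved by that job (assumed input).
    Returns [] if stopping every job still would not free enough.
    """
    victims, freed = [], 0
    for job_id in sorted(last_used, key=last_used.get):  # oldest first
        if freed >= needed_mb:
            break
        victims.append(job_id)
        freed += memory_mb.get(job_id, 0)
    return victims if freed >= needed_mb else []


def stop_job(job_id: str) -> None:
    """Stop a job via DELETE /v1/job/:job_id (no purge, so it can be re-registered)."""
    req = urllib.request.Request(f"{NOMAD_ADDR}/v1/job/{job_id}", method="DELETE")
    with urllib.request.urlopen(req):
        pass
```

Keeping the selection logic separate from the API call makes it easy to test and to swap the "least used" signal later, for example from request counts instead of timestamps.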
Is this something that’s commonly done with Nomad, or is continually scaling horizontally more the done thing? One idea I had was to use metrics from Traefik to identify the least used service and stop it, but I’m unsure whether there’s a better or more common way!
I admit I’m very early on in my journey so feel free to tear this apart or point out if I’m doing something completely wrong. Hopefully this all makes sense and thanks in advance!