I’m looking for pointers, and ideally some folks with more experience than me, to help me figure out how I can use Nomad to optimise the energy usage and carbon intensity of a fleet of machines, using many of the same techniques I see in use by folks experimenting with Kubernetes to achieve the same goal.
There’s a really nice post by Bill Johnson here outlining how they’ve been experimenting with this - it’s one of the nicer ways in, I think.
To make this possible you generally need two things:
- an idea of the resource usage (i.e. the consumption of resources like CPU, memory, storage, etc)
- the carbon intensity of the compute (i.e. how clean / dirty the energy is)
Looking at resource consumption
I’ve been doing a bit of work in this field so far, to figure out how to get per-process energy usage figures on a host, by using tools like Scaphandre. This is a Rust binary that you can use to expose these figures in a form that can be scraped by something like Prometheus and graphed in Grafana.
Scaphandre is interesting in that if you run it on physical hardware, it can expose numbers to any virtual machines running on that host, so they can self-report their share of energy usage.
I also know that Nomad itself exposes some resource usage figures, which I think are complementary to Scaphandre. If I know a machine is drawing, say, 120 W of power, I have 4 jobs running on it, and I know roughly what share of resources is allocated to each, I think I can use the two tools to start figuring out how to attribute the machine’s energy draw to the jobs.
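To make that concrete, here is a minimal sketch of the attribution idea. It assumes the host's power draw is split in proportion to each job's CPU allocation, which is a simplification: it ignores memory, idle draw, and how much of the allocation a job is really using. The function name, job names, and figures are all made up for illustration; none of them come from Nomad or Scaphandre.

```python
def attribute_power(host_watts, allocated_cpu_mhz):
    """Split a host's power draw across jobs in proportion to allocated CPU.

    host_watts: measured power draw of the whole machine, in watts
                (for example, a figure reported by a tool like Scaphandre).
    allocated_cpu_mhz: hypothetical mapping of job name -> CPU allocated to it.
    """
    total = sum(allocated_cpu_mhz.values())
    if total <= 0:
        raise ValueError("no CPU allocated to any job")
    return {
        job: host_watts * mhz / total
        for job, mhz in allocated_cpu_mhz.items()
    }


# Illustrative numbers only: a 120 W machine running 4 jobs.
jobs = {"web": 2000, "api": 1000, "batch": 500, "cache": 500}
shares = attribute_power(120.0, jobs)

for job, watts in shares.items():
    print(f"{job}: {watts:.1f} W")
```

With per-process figures available, the proportional split could be replaced by measured values, and the allocation-based share kept as a fallback.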
Looking at carbon intensity
I’ve worked with a couple of friends to make it possible to expose some carbon intensity information about the power in use too, with two Go packages we’ve been working on.
The first is a lowish level wrapper around querying common sources to work out how clean electricity in use is, depending on where you are in the world, and what time of day it is:
The second is an exporter for Prometheus, so you can combine it with usage data as mentioned above.
You can see us trying to figure out how to make these ideas work with Nomad’s concepts below, but none of us are that experienced with using Nomad, so I suspect we’re missing a bunch of obvious stuff.
Using these sources to make it possible to understand the environmental impact of jobs and allocations
I think that if you have information like this, and you know how long jobs have been running, you can start to report on factors like carbon emissions attributable to a specific job, and optimise for this kind of stuff the way you see folks doing with tools like Cloud Carbon Footprint:
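The arithmetic behind that kind of reporting can be sketched roughly as follows, assuming you already have an average power figure for a job (in watts), how long it ran, and an average grid carbon intensity for that period (in grams of CO2e per kWh). The function name and the numbers are hypothetical, and a real report would want to account for intensity changing over the lifetime of the job.

```python
def job_emissions_grams(avg_watts, hours, intensity_g_per_kwh):
    """Estimate emissions attributable to a job, in grams of CO2e.

    avg_watts: average power attributed to the job over the period
    hours: how long the job ran
    intensity_g_per_kwh: average grid carbon intensity over the same period
    """
    energy_kwh = avg_watts * hours / 1000.0  # W x h -> kWh
    return energy_kwh * intensity_g_per_kwh


# Illustrative numbers only: a job drawing 30 W for a day
# on a grid averaging 250 gCO2e/kWh.
grams = job_emissions_grams(avg_watts=30.0, hours=24.0, intensity_g_per_kwh=250.0)
print(f"{grams:.0f} g CO2e")  # about 180 g
```

The same calculation applied per time interval, with that interval's intensity, would show when a job is cheapest to run in carbon terms.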
Most recently this blog post from Helio Exchange outlines an idea of the direction this field could go, and I think it’s pretty nifty:
Looking for climate-curious Nomad nerds to work with
Most of the examples I’ve pointed to in this post refer to using Kubernetes, and assume Docker, but one thing I really like about Nomad is that it can work with physical servers too, and operationally I find it a bit easier to wrap my head around.
I’m aware of this link below, and it’s been pretty helpful, but I’m looking for others. If you’ve been doing any work in this area, or you’re interested, would you share a link or leave a comment?