Trying to optimise climate impact from scheduling with Nomad

mrchrisadams · February 2, 2022, 7:51pm

Hi there.

I’m looking for pointers, and ideally some folks with more experience than me, to help me figure out how I can use Nomad to optimise the energy usage and carbon intensity of a fleet of machines, using many of the same techniques I see in use by folks experimenting with Kubernetes to achieve the same goal.

There’s a really nice post by Bill Johnson here outlining how they’ve been experimenting with this - it’s one of the nicer ways in, I think.

Some background

To make this possible you generally need two things:

an idea of the resource usage (i.e. the consumption of resources like CPU, memory, storage, etc)
the carbon intensity of the compute (i.e. how clean / dirty the energy is)

Looking at resource consumption

I’ve been doing a bit of work in this field so far, to figure out how to get per-process energy usage figures on a host, by using tools like Scaphandre. This is a Rust binary that you can use to expose these figures in a form that can be scraped by something like Prometheus and graphed in Grafana.

Scaphandre is interesting in that if you run it on physical hardware, it can expose numbers to any virtual machines running on that host, so they can self-report their share of energy usage.

I also know that Nomad itself exposes some resource usage figures, which I think are complementary to Scaphandre - if I know a machine is using say 120W of power, and I have 4 jobs running, and I know roughly what share of resources are allocated, I think I can use the two tools to start figuring out how I might attribute the energy draw in a machine to the jobs.
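As a rough sketch of the attribution idea above: split the measured power draw of a host across jobs in proportion to their allocated resources. This assumes CPU allocation is a fair proxy for energy share, which it often isn't (idle draw, memory and disk are all ignored). The `Allocation` type, its `cpu_mhz` field and the job names are made up for illustration, and are not something Nomad or Scaphandre give you in this shape.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Allocation:
    """Hypothetical view of one job's allocation on a host."""
    job: str
    cpu_mhz: float  # CPU allocated to the job (assumed proxy for energy share)


def attribute_power(host_watts: float, allocations: list[Allocation]) -> dict[str, float]:
    """Split a host's measured power draw across jobs by allocated CPU share.

    Job names are assumed to be unique on the host.
    """
    total = sum(a.cpu_mhz for a in allocations)
    if total <= 0:
        # Nothing allocated, so nothing to attribute.
        return {a.job: 0.0 for a in allocations}
    return {a.job: host_watts * a.cpu_mhz / total for a in allocations}


# Illustrative numbers only: a 120W host running 4 jobs.
allocs = [
    Allocation("web", 2000),
    Allocation("worker", 1000),
    Allocation("cache", 500),
    Allocation("cron", 500),
]
shares = attribute_power(120.0, allocs)
for job, watts in shares.items():
    print(f"{job}: {watts:.1f} W")
```

A more honest version would probably weight by measured usage rather than allocation, and decide what to do with the idle power draw of the machine.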

Looking at carbon intensity

I’ve worked with a couple of friends to make it possible to expose some carbon intensity information about the power in use too, with two Go packages we’ve been working on.

The first is a lowish level wrapper around querying common sources to work out how clean electricity in use is, depending on where you are in the world, and what time of day it is:

The second is an exporter for Prometheus, so you can combine it with usage data as mentioned above.
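For anyone who hasn't looked at what an exporter actually serves, here is a minimal sketch of the Prometheus text exposition format for a gauge. The metric name `grid_carbon_intensity_gco2_per_kwh`, the `region` label and the values are all hypothetical, and not necessarily what our exporter emits. In practice you'd use a Prometheus client library rather than building strings by hand.

```python
from __future__ import annotations


def escape_label(value: str) -> str:
    """Escape backslash, double quote and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, samples: dict[str, float],
                 label: str = "region") -> str:
    """Render one gauge metric, with one sample per label value."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for label_value, value in sorted(samples.items()):
        lines.append(f'{name}{{{label}="{escape_label(label_value)}"}} {value}')
    return "\n".join(lines) + "\n"


# Hypothetical metric name and made-up values, for illustration only.
text = render_gauge(
    "grid_carbon_intensity_gco2_per_kwh",
    "Carbon intensity of the local grid in gCO2e per kWh.",
    {"GB": 180.0, "DE": 320.5},
)
print(text, end="")
```

Prometheus scrapes text like this over HTTP, usually from a `/metrics` path, and turns each sample line into a point in a time series.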

You can see us trying to figure out how to make these ideas work with Nomad’s concepts below, but none of us are that experienced with using Nomad, so I suspect we’re missing a bunch of obvious stuff.

github.com/thegreenwebfoundation/grid-intensity-go

Add a prometheus exporter

opened 10:47AM - 20 Nov 20 UTC

closed 08:22AM - 11 Nov 21 UTC

mrchrisadams

If we're targeting schedulers like Kubernetes and Nomad, as @rossf7 suggested, one of the nicest ways to make it possible for a scheduler to take the carbon intensity of compute into account would be to expose the information in a form that Prometheus, one of the default monitoring tools in opsland, can use.

Ross's suggestion of creating an exporter makes a lot of sense, as it would largely entail us making a few time series that would expose stuff like:

- carbon intensity for the last 30 mins, as a high, medium, low figure (we'd likely need to figure these out on a per region basis if they're not exposed by the API)
- if it's exposed, some derived figure for the carbon intensity over the next N hrs or so, just at the high/medium/low figures
  - 30mins
  - 1hr
  - 3hrs
  - 6hrs
  - 12hrs
  - 24hrs

_(if we can expose the actual numbers that would be useful, but it really depends on the licensing of the data we use)_

### Why this would help

Exposing these metrics would allow us to somewhat mimic the way day-ahead figures for electricity work, and if we tracked them like this, we'd have actionable measurements for a scheduler to figure out when or where to place work based on an estimate of the carbon intensity.

For example, if you knew a job took 3hrs, and carbon intensity was low in one of your DCs but high in another one, you might shunt it to the greener one, while energy is green.

Conversely, if you knew that intensity was high right now, but dropping later, you'd be able to see that the average intensity was lower over the longer period, and you might schedule it to run after the peak.

Similarly, if we're capturing the time series you can work out a running total of something like the _social cost of compute_ - total up the CO2 emitted, and add a multiplier to account for the cost of carbon. This cost multiplier is likely something that changes from org to org, but it's likely something that would be useful to expose for management.

### Implementing it

From what I can see, Prometheus works by collecting metrics from nodes it monitors that are running clients. These clients expose metrics over a web interface at a specific port, as a data structure I haven't quite figured out yet.

If we can figure out the data structure, the client will take care of the serving part for us, and be relatively easy to add to any k8s or nomad cluster.

@rossf7 this is something you have some experience with I think, so I should defer to you here 👍

### Useful links

- More on exporters - https://prometheus.io/docs/instrumenting/exporters/
- Client libs we might use - https://prometheus.io/docs/instrumenting/clientlibs/
- The metric types are very similar to ones used by grafana and statsd - https://prometheus.io/docs/concepts/metric_types/
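To make the "run it after the peak" idea from the issue a bit more concrete, here is a small sketch that picks the start time with the lowest average carbon intensity for a job of known length. It assumes you already have an hourly forecast as a plain list of gCO2e/kWh figures. The function name and the forecast values are invented for illustration, and it is not tied to any particular data source or scheduler.

```python
from __future__ import annotations


def best_start(forecast: list[float], duration: int) -> tuple[int, float]:
    """Return (start_index, average_intensity) for the cleanest window.

    forecast: carbon intensity per hour, starting from the current hour.
    duration: job length in whole hours.
    """
    if duration <= 0 or duration > len(forecast):
        raise ValueError("duration must be between 1 and len(forecast)")

    # Sliding window sum over the forecast.
    window = sum(forecast[:duration])
    best_index, best_sum = 0, window
    for i in range(1, len(forecast) - duration + 1):
        window += forecast[i + duration - 1] - forecast[i - 1]
        if window < best_sum:
            best_index, best_sum = i, window
    return best_index, best_sum / duration


# Made-up hourly forecast: intensity is high now and drops later.
forecast = [300, 320, 280, 200, 150, 140, 160, 250]
start, average = best_start(forecast, duration=3)
print(f"start in {start}h, average {average:.0f} gCO2e/kWh")
```

The same comparison works across locations: run it once per region's forecast and pick the region and start time with the lowest average.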

Using these sources to make it possible to understand the environmental impact of jobs and allocations

I think that if you have information like this, and you know how long jobs have been running, you can start to report on factors like carbon emissions attributable to a specific job, and optimise for this kind of stuff the way you see folks doing with tools like Cloud Carbon Footprint.
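Here is a sketch of the arithmetic I have in mind: for each interval, energy (kWh) is power (W) times hours divided by 1000, and emissions are energy times the grid's carbon intensity for that interval. The `Interval` type and the sample numbers are assumptions for illustration. Real figures would come from joining the power and intensity time series.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Interval:
    """Hypothetical sample: one job's draw during one time interval."""
    hours: float           # length of the interval
    watts: float           # average power attributed to the job
    gco2_per_kwh: float    # grid carbon intensity during the interval


def job_emissions_grams(intervals: list[Interval]) -> float:
    """Sum gCO2e over all intervals: kWh used times intensity."""
    return sum(i.hours * i.watts / 1000.0 * i.gco2_per_kwh for i in intervals)


# Made-up numbers: a job drawing 60W for 3 hours while the grid gets cleaner.
run = [
    Interval(hours=1.0, watts=60.0, gco2_per_kwh=200.0),
    Interval(hours=1.0, watts=60.0, gco2_per_kwh=150.0),
    Interval(hours=1.0, watts=60.0, gco2_per_kwh=100.0),
]
total = job_emissions_grams(run)
print(f"{total:.1f} gCO2e")
```

Multiplying the total by a per-tonne carbon price would give the kind of "social cost of compute" figure mentioned in the issue above.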

Most recently this blog post from Helio Exchange outlines an idea of the direction this field could go, and I think it’s pretty nifty:

Looking for climate curious nomad nerds to work with

Most of the examples I’ve pointed to in this post refer to using Kubernetes, and assume Docker, but one thing I really like about Nomad is that it can work with physical servers too, and operationally I find it a bit easier to wrap my head around.

I’m aware of this link below, and it’s been pretty helpful, but I’m looking for others. If you’ve been doing any work in this area, or you’re interested, would you share a link or leave a comment?

mrchrisadams · February 11, 2022, 5:29pm

OK, I saw one response here, so it feels like it’s worth me sharing this here.

The BBC has blogged about doing the same track-and-optimise-carbon-emissions-of-compute trick using OpenStack:

I’m looking for folks who would be up for making it possible to do the same track-and-optimise-carbon-emissions-of-compute trick with Nomad, because I think similar metrics are exposed by Nomad already.

Has anyone experimented with this yet?

I’ve now got a bit of a budget to pay for freelance developer time to implement some of this - it’s part of funding from RIPE, and you can read about “Carbon Aware Internet” at the link below.

It’s not huge - maybe 5-10k EUR, to extend the libraries listed above.

I’d bumble through coding it all myself if I could, but I’m stretched to capacity, and my main languages are Python and JavaScript, so making a smallish binary, or at least a consumable library in a compiled language, would be beyond my skills.

If you’re a golang developer who knows their way around Nomad, and you’d be up for working with me on it, do please either leave a comment or DM me.

Cheers!

obierlaire · February 2, 2023, 6:25pm

I’m working on a project (carbonifer.io) whose goal is to provide carbon emission estimations based on Terraform files (not Nomad yet, but that’s in the plan)

It’s currently in development (it only supports GCP). The tool will also analyse existing infra and “pilot” your infra to resize, move or schedule depending on the energy currently used by the grid powering your cloud datacenters.

Don’t hesitate to give me feedback, and/or contribute…
