Hi there.
Based on a few earlier conversations I’ve had about Nomad, Google Borg, and now Kubernetes, I’ve been prompted to read this paper from the ACM - Designing Cluster Schedulers for Internet-Scale Services. I think one of the authors originally worked on Nomad too, and I had a couple of questions as a result of reading it:
https://queue.acm.org/detail.cfm?id=3199609
One key point detailed in the paper is how Nomad and Borg seem to use an event-based model for reconciliation, rather than constantly polling to check the state of the cluster:
Schedulers usually track the cluster state and maintain an internal finite-state machine for all the cluster objects they manage, such as clusters, nodes, jobs, and tasks. The two main ways of cluster state reconciliation are level- and edge-triggered mechanisms. The former is employed by schedulers such as Kubernetes, which periodically looks for unplaced work and tries to schedule that work. These kinds of schedulers often suffer from having a fixed baseline latency for reconciliation.
Edge-triggered scheduling is more common. Most schedulers, such as Mesos and Nomad, work on this model. Events are generated when something changes in the cluster infrastructure, such as a task failing, node failing, node joining, etc. Schedulers must react to these events, updating the finite state machine of the cluster objects and modifying the cluster state accordingly.
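To check my own reading of those two paragraphs, here is a rough Python sketch of how I picture the two models. Everything in it (the Cluster class, level_triggered_pass, edge_triggered_handler, the event tuples) is a made-up toy of mine, not anything from the Nomad or Kubernetes codebases:

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    desired: int                                  # tasks the job should have
    running: set = field(default_factory=set)     # task IDs currently alive
    next_id: int = 0

    def start_task(self) -> int:
        task_id = self.next_id
        self.running.add(task_id)
        self.next_id += 1
        return task_id

    def fail_task(self, task_id: int) -> tuple:
        """Kill a task and return the event an agent would emit for it."""
        self.running.discard(task_id)
        return ("task_failed", task_id)


def level_triggered_pass(cluster: Cluster) -> int:
    """Compare desired with actual state and close the whole gap."""
    missing = max(cluster.desired - len(cluster.running), 0)
    for _ in range(missing):
        cluster.start_task()
    return missing


def edge_triggered_handler(cluster: Cluster, event: tuple) -> None:
    """React to exactly one event; knows nothing about events it never saw."""
    kind, _task_id = event
    if kind == "task_failed":
        cluster.start_task()


def run_level() -> int:
    cluster = Cluster(desired=3)
    level_triggered_pass(cluster)      # initial placement: tasks 0, 1, 2
    cluster.fail_task(0)
    cluster.fail_task(1)               # the events are ignored in this model
    level_triggered_pass(cluster)      # the next periodic pass repairs both
    return len(cluster.running)


def run_edge(drop_second_event: bool) -> int:
    cluster = Cluster(desired=3)
    level_triggered_pass(cluster)      # reused here only for initial placement
    events = [cluster.fail_task(0), cluster.fail_task(1)]
    if drop_second_event:
        events.pop()                   # simulate a lost event
    for event in events:
        edge_triggered_handler(cluster, event)
    return len(cluster.running)
```

In this toy, run_level() and run_edge(False) both end with 3 running tasks, while run_edge(True) stays stuck at 2 because nothing ever tells the handler about the second failure. The level-triggered version only catches up when the next pass runs, which is how I understand the "fixed baseline latency" remark.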
I’m trying to form a not-too-detailed mental model of these differences: does Nomad still use this event-based, almost ‘callback’-like model, and does Kubernetes use (for want of a better term) ‘polling’ in this way?
If so, where ought I look to read a bit more about these?
I know about this page here, which explains the benefits of one over the other:
But the paper above outlines the tradeoffs made, and what steps are taken to mitigate the downsides:
While event-driven schedulers are faster and more responsive in practice, guaranteeing correctness can be harder since the schedulers have no room to drop or miss the processing of an event. Dropping cluster events will result in the cluster not converging to the right state; jobs might not be in their expected state or have the right number of tasks running. Schedulers usually deal with such problems by making the agents or the source of the cluster event resend the event until they get an acknowledgment from the consumer that the events have persisted.
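The resend-until-acknowledged idea in that last sentence is the part I picture like this. Again, this is only a sketch under my own assumptions: FlakyScheduler, deliver and send_until_acked are hypothetical names, the list stands in for durable storage, and the de-duplication by event ID is something I added because resends seem to need it, not something the paper describes:

```python
class FlakyScheduler:
    """Hypothetical consumer that loses the first `drop_first` deliveries."""

    def __init__(self, drop_first: int):
        self.drop_first = drop_first
        self.persisted = []        # stand-in for durable storage
        self.seen_ids = set()      # my assumption: dedupe resent events by ID

    def deliver(self, event_id: str, payload: str) -> bool:
        if self.drop_first > 0:
            self.drop_first -= 1
            return False           # delivery lost, so no acknowledgment
        if event_id not in self.seen_ids:
            self.seen_ids.add(event_id)
            self.persisted.append(payload)
        return True                # acknowledge only once the event is stored


def send_until_acked(scheduler: FlakyScheduler, event_id: str,
                     payload: str, max_attempts: int = 10) -> int:
    """Agent side: resend the same event until it is acknowledged."""
    for attempt in range(1, max_attempts + 1):
        if scheduler.deliver(event_id, payload):
            return attempt         # number of attempts it took
    raise RuntimeError(f"event {event_id} not acknowledged")


def demo() -> tuple:
    scheduler = FlakyScheduler(drop_first=2)
    attempts = send_until_acked(scheduler, "evt-1", "task_failed:0")
    # A duplicate resend of the same event must not be stored twice.
    send_until_acked(scheduler, "evt-1", "task_failed:0")
    return attempts, scheduler.persisted
```

Here the first two sends are lost, the third is acknowledged, and a later duplicate of the same event is acknowledged without being stored again. A real agent would presumably add backoff between attempts, which I have left out.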
Thanks in advance!