Migrating from Mesos

I posted this question on Reddit and was asked to post it here as well. This is the original link: Reddit - Dive into anything

Hi, my team would like to adopt Nomad and replace Mesos. We run Mesos, Marathon and DC/OS. We have customised schedulers that help us run custom on-demand jobs. The architecture of schedulers in Mesos is pretty straightforward and easy.

I would like to know where to look for writing custom schedulers in Nomad. We write custom schedulers in Python for Mesos. We use a Mesos framework (pymesos), and while it is scheduling, it runs in scheduler mode. Our requirement is not batch scheduling: we pass job details to the scheduler and it runs them on an appropriate client server with enough resources. The job itself extends the Mesos framework and runs in executor mode. The framework abstracts the communication with the Mesos master, so we just need to implement the start, stop, scale up and scale down methods in the scheduler and handle stop signals from the scheduler in the executor. It also helps us handle other cluster-level functions, like what to do when a client machine goes down or when communication with a client machine is lost.

I believe Nomad has its own way of doing things, and we don’t expect things to be the same. Can we do custom scheduling in Nomad, at least partially, like we do in Mesos?

We are moving towards Nomad because of the limitations of the Mesos community: most best practices are kept private within big organisations. Further, the community version of DC/OS has essentially no security features. Managing Nomad clusters also seems very easy. We also plan to use Vault, Consul and Terraform in future, so Nomad seems a very logical choice to get our hands on.

Thanks for reading.

I would like to know how to write custom schedulers in Nomad.


Hi @caring-nt :wave:

I am not familiar with how Mesos custom schedulers work, but Nomad doesn’t have the ability to externally customize the scheduler. It does provide two scheduling algorithms (binpacking and spread) and different types of workload scheduling (system, service and batch).

If you want more deeply customized scheduling logic, you would need to build a custom Nomad binary. You can read more details about Nomad scheduling here: Scheduling | Nomad | HashiCorp Developer.

The code for the scheduler lives in the scheduler package, and the main interface that needs to be implemented is the Scheduler.

But I would like to understand your use case better before you go down this rabbit hole :grinning_face_with_smiling_eyes:

This is what Nomad already does. You can define your task resource requirements and Nomad will find a client that is able to run it.
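
As a rough illustration of declaring resource requirements, the sketch below builds a job payload in Python and shows how it could be posted to Nomad's HTTP jobs endpoint. The helper names (`build_job`, `submit_job`), the job ID, the Docker image, the `dc1` datacenter and the agent address are all assumptions made for the example, not something from this thread.

```python
import json
import urllib.request

# Assumption: a local Nomad agent listening on the default HTTP port.
NOMAD_ADDR = "http://127.0.0.1:4646"


def build_job(job_id, image, command, args, cpu_mhz, memory_mb, job_type="batch"):
    """Build a JSON job payload; Nomad picks a client with enough free resources."""
    return {
        "Job": {
            "ID": job_id,
            "Name": job_id,
            "Type": job_type,
            "Datacenters": ["dc1"],  # assumption: the default datacenter name
            "TaskGroups": [
                {
                    "Name": "main",
                    "Count": 1,
                    "Tasks": [
                        {
                            "Name": "run",
                            "Driver": "docker",
                            "Config": {"image": image, "command": command, "args": args},
                            # CPU is in MHz, memory in MB.
                            "Resources": {"CPU": cpu_mhz, "MemoryMB": memory_mb},
                        }
                    ],
                }
            ],
        }
    }


def submit_job(payload, addr=NOMAD_ADDR):
    """POST the payload to the jobs endpoint (needs a reachable Nomad agent)."""
    req = urllib.request.Request(
        addr + "/v1/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Hypothetical job: only the payload is built here; nothing is sent.
payload = build_job("transform-42", "python:3.11-slim", "python", ["-c", "print('hi')"], 500, 256)
```

A control panel could call `submit_job(payload)` for each on-demand run; placement onto a suitable client is then left to Nomad.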

This part sounds like a Nomad task driver? Task drivers are what actually run workloads, so there’s the Docker task driver, the Java task driver etc. These can be developed as plugins in Nomad.

Could you provide more details of your requirements and, more specifically, if there was something that you haven’t been able to do with Nomad?

Thanks!

Thanks for the prompt response.

That will be good enough. By default, Mesos allows only Dominant Resource Fairness (DRF), so we may have to implement other algorithms ourselves. We wanted bin packing by default, so Nomad will be helpful.

So we have a set of data in a database, and our developers write logic to transform it and derive better knowledge from it. This is custom code. A developer starts the code from a control panel. The control panel sends the job details and resource requirements to the scheduler. The scheduler schedules the task on a client server and runs it as an executor. The scheduler provides proper logging and other run-related metrics, which our control panel captures.

This part looks interesting. I will look into that. Meanwhile, is there a blog post that walks through creating a simple plugin in any programming language? It would make things easier.

Nice. Yeah, so you will get binpacking for free with Nomad.

This is exactly how Nomad works, so you shouldn’t need to do any extra work :slightly_smiling_face:

Unfortunately, plugins are written in Go. In theory they could be written in any language that supports gRPC and protobufs, but we haven’t really tested it.

Here’s a session about Nomad plugins: https://www.youtube.com/watch?v=7xa4paf4QzE


Hi,
Sorry for the late response.

So in our case we need to have control over this. These are the use cases:

  • Custom resource selection based on internal logic or dynamic signals.
  • Capturing and recording stats whenever a task changes state.
  • Re-running failed tasks based on internal parameters.
  • Customizing the Start-Stop-Resume and Start-Fail-Restart logic.

The above list is not exhaustive, but I believe you get the point.
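
Part of the re-run and Start-Fail-Restart cases might be expressible as per-job restart and reschedule policies rather than scheduler code. The following is a minimal sketch, assuming Nomad's JSON job format, in which durations are given in nanoseconds. The helper name `with_retry_policy` and all of the attempt counts and delays are illustrative assumptions, and this does not cover retry decisions driven by internal parameters.

```python
NANOS_PER_SECOND = 1_000_000_000  # the JSON job format expresses durations in nanoseconds


def seconds(n):
    """Convert seconds to the nanosecond integers used in the job payload."""
    return int(n * NANOS_PER_SECOND)


def with_retry_policy(task_group, local_attempts=2, reschedule_attempts=3):
    """Return a copy of a task group dict with restart and reschedule policies attached."""
    group = dict(task_group)
    # Restart in place on the same client a limited number of times, then mark as failed.
    group["RestartPolicy"] = {
        "Attempts": local_attempts,
        "Interval": seconds(300),
        "Delay": seconds(15),
        "Mode": "fail",
    }
    # Once local restarts are exhausted, let Nomad reschedule the allocation,
    # possibly on another client, with an exponentially growing delay.
    group["ReschedulePolicy"] = {
        "Attempts": reschedule_attempts,
        "Interval": seconds(3600),
        "Delay": seconds(30),
        "DelayFunction": "exponential",
        "MaxDelay": seconds(600),
        "Unlimited": False,
    }
    return group


# Hypothetical task group used only to show the resulting shape.
original = {"Name": "main", "Count": 1, "Tasks": []}
group = with_retry_policy(original)
```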

Task drivers in Nomad, afaik, are per agent; this is akin to the Executor in Mesos. But what I couldn’t find in Nomad is a mechanism to plug in to the central scheduling to update its logic and fire signals on certain events.
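
This is not a scheduler hook, but for reacting to events one option may be an external process that watches Nomad's event stream endpoint (`/v1/event/stream`), which emits newline-delimited JSON frames. The sketch below only parses such frames and picks out allocation events. The function name `allocation_events` and the sample frames are hypothetical, and the payload field names reflect my understanding of the stream format, so they should be checked against the API documentation.

```python
import json


def allocation_events(lines):
    """Yield (event type, allocation ID, client status) from event stream lines."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        frame = json.loads(raw)
        # Heartbeat frames are empty JSON objects and carry no events.
        for event in frame.get("Events") or []:
            if event.get("Topic") != "Allocation":
                continue
            alloc = event.get("Payload", {}).get("Allocation", {})
            yield event.get("Type"), alloc.get("ID"), alloc.get("ClientStatus")


# Hypothetical sample frames, shaped like the stream's newline-delimited JSON.
sample = [
    "{}",
    json.dumps({"Index": 7, "Events": [
        {"Topic": "Allocation", "Type": "AllocationUpdated", "Key": "a1",
         "Payload": {"Allocation": {"ID": "a1", "ClientStatus": "failed"}}},
        {"Topic": "Node", "Type": "NodeRegistration", "Key": "n1", "Payload": {}},
    ]}),
]
events = list(allocation_events(sample))
```

A control panel could feed the lines of a streaming HTTP response into `allocation_events` and record each state change, or resubmit a job when it sees a failed allocation.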

A similar example in Kubernetes is like the following

They do not completely fit my needs, but they come pretty close to my existing system.