We are currently using Celery inside K8s and are seriously considering a move to Nomad (plus other HashiCorp products, of course). Celery in a cluster has a pathology: when we define several different queues and workers, some workers are fully busy while others often sit idle, consuming resources for nothing. The resulting task scheduling is both unfair and inefficient.
Celery knows nothing about the cluster’s resource state, so how could it distribute tasks fairly and efficiently? Moreover, Celery looks like yet another layer of queues (e.g. with RabbitMQ) and task distribution inside a system that can already do this (and that controls the whole cluster, much as an OS kernel does for a single machine).
We are getting to know Nomad’s concepts (jobs, task groups, tasks, the different schedulers…). We are wondering whether the batch scheduler would be a good fit to replace Celery purely for scheduling tasks (each task in one task group in one job?), given that our tasks are I/O bound and that each task may need to launch other tasks at runtime (dynamic task chaining).
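To make the "one task in one task group in one job" mapping concrete, here is a minimal sketch of the request body such a job might use, assuming Nomad's JSON job format as accepted by the HTTP API (`POST /v1/jobs`). The function name, job ID, Docker image, command, datacenter name, and resource figures are all hypothetical placeholders, not a tested setup.

```python
import json


def build_batch_job(job_id, image, command, args, cpu_mhz=100, memory_mb=128):
    """Build a JSON body describing one batch job with one task group and one task.

    All values passed in are illustrative; "dc1" is an assumed datacenter name.
    """
    return {
        "Job": {
            "ID": job_id,
            "Name": job_id,
            "Type": "batch",  # selects the batch scheduler
            "Datacenters": ["dc1"],
            "TaskGroups": [
                {
                    "Name": "main",
                    "Count": 1,
                    "Tasks": [
                        {
                            "Name": "run",
                            "Driver": "docker",
                            "Config": {
                                "image": image,
                                "command": command,
                                "args": list(args),
                            },
                            # Declared resources are what the scheduler
                            # uses when placing the task on a node.
                            "Resources": {"CPU": cpu_mhz, "MemoryMB": memory_mb},
                        }
                    ],
                }
            ],
        }
    }


if __name__ == "__main__":
    body = build_batch_job(
        "fetch-report-42",
        "example/worker:latest",
        "python",
        ["fetch.py", "--id", "42"],
    )
    print(json.dumps(body, indent=2))
```

Under this sketch, dynamic task chaining would mean a running task builds another body like this one and submits it itself. Whether that pattern holds up in practice is part of the question.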
What do you think?