Running ETL workflows with Nomad

acivitillo · May 29, 2020, 5:23am

My team and I run ETL jobs in Python on Dask. I was wondering what the Nomad way of running an ETL job would be. I think we’d still need a Dask scheduler and Dask workers running on Nomad. I especially like the idea of submitting ETL workflows with a command like ‘workflow run workflow_name’ and defining the workflows in HCL text files.

acivitillo · June 1, 2020, 5:45am

Answering my own question: the Nomad way would be to use dispatch jobs. A dispatch job works like a function: it is a job that takes a parameter. If you are doing ETL with Python, you can have one job template that runs a given Python script, and then dispatch it through the API with a parameter that says which script to run, for example wonderfuletl.py.
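To make the idea concrete, here is a minimal sketch of dispatching such a job over Nomad's HTTP API (POST /v1/job/<job_id>/dispatch). The job name etl-runner, the meta key script, and the agent address are assumptions for illustration; the job would need to be registered as a parameterized job that accepts that meta key.

```python
import base64
import json
import urllib.request

# Assumption: a Nomad agent reachable on the default local address.
NOMAD_ADDR = "http://127.0.0.1:4646"


def build_dispatch_request(job_id, meta, payload=b"", addr=NOMAD_ADDR):
    """Build a POST request that dispatches one instance of a parameterized job."""
    body = {"Meta": meta}  # meta keys and values must be strings
    if payload:
        # The dispatch endpoint expects the payload base64-encoded.
        body["Payload"] = base64.b64encode(payload).decode("ascii")
    return urllib.request.Request(
        url=f"{addr}/v1/job/{job_id}/dispatch",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def dispatch(job_id, meta, payload=b""):
    """Send the dispatch request and return Nomad's JSON response as a dict."""
    req = build_dispatch_request(job_id, meta, payload)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With a running agent, calling dispatch("etl-runner", {"script": "wonderfuletl.py"}) would start one run of the template for that script. If ACLs are enabled, a token header would also be needed, which this sketch leaves out.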

With this approach you have to parallelize Python functions manually, by splitting the work yourself and sending each piece as a separate dispatch job over the API. That is the main difference from Dask.

For the Extract part of ETL, Nomad could still be used, and it might be simpler than Dask. Say you are extracting data from an RDBMS table. You could split the extraction into several queries and send each query to a Nomad job with dispatch. These can run in parallel (the only limitation would be the RDBMS, not Nomad).
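As a rough sketch of the splitting step, assuming the table has a numeric id column: the helper below cuts an id range into contiguous chunks and turns each chunk into the meta parameters for one dispatch. The names table, id_from, and id_to are hypothetical meta keys. Sending bounds rather than raw SQL is a choice made here so that the task builds its own query.

```python
def split_id_ranges(min_id, max_id, chunks):
    """Split the inclusive range [min_id, max_id] into up to `chunks` contiguous ranges."""
    total = max_id - min_id + 1
    if total <= 0 or chunks <= 0:
        return []
    chunks = min(chunks, total)  # never create empty ranges
    size, extra = divmod(total, chunks)
    ranges = []
    start = min_id
    for i in range(chunks):
        # The first `extra` ranges take one additional id each.
        end = start + size - 1 + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end + 1
    return ranges


def meta_for_ranges(table, ranges):
    """One meta dict per dispatch; values are strings, as job meta requires."""
    return [
        {"table": table, "id_from": str(lo), "id_to": str(hi)}
        for lo, hi in ranges
    ]
```

Each dict would then go out as the meta of one dispatch call, and the task would run something like a SELECT bounded by id_from and id_to. How many chunks the database can serve at once is the real limit.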
