How do Nomad task driver plugins actually work?

I’ve written a custom plugin based on the skeleton example, and it all works fine.

However, I still don’t really understand how they actually work, what the lifecycle is, and I can’t really find any good description. I’ve tried reading the code, but so far I’ve failed to gain any real insight.

For example, it’s not clear to me how many plugins get instantiated. Do I end up with one plugin executable running for each task that is launched? It seems that multiple copies of my plugin are run, but I’m unsure exactly how they are managed and what’s responsible for what. If I knew there was going to be a one-to-one relationship between task driver executables and tasks, there are things I could do in the plugin directly. As it stands, I’m having to do them in a separate wrapper executable: the task driver launches the wrapper, which does some setup and then launches the actual executable that runs the task.

Is there a good description somewhere?

Thanks.

Hi @tomqwpl :wave:

That’s awesome! I would love to hear more about it if you don’t mind :grinning_face_with_smiling_eyes:

That’s a very good point, and we certainly need to improve our docs on this. Maybe this presentation can help you understand things better?

No. Nomad will start one plugin process in each client that has it installed. This process will then be responsible for managing all the tasks.
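To illustrate what "one process managing all the tasks" tends to look like inside a driver, here is a minimal Go sketch of an in-memory task registry keyed by task ID. It is only an illustration: `taskHandle`, `taskStore`, and the example task IDs are hypothetical names chosen for this sketch, not Nomad's actual types, and a real driver would keep far more state per task.

```go
package main

import (
	"fmt"
	"sync"
)

// taskHandle is a hypothetical stand-in for the per-task state a driver
// keeps (PID, start time, exit result, and so on).
type taskHandle struct {
	taskID string
	pid    int
}

// taskStore maps task IDs to handles. A single store lives inside the one
// long-running driver process on a client and covers every task it runs.
type taskStore struct {
	mu    sync.RWMutex
	tasks map[string]*taskHandle
}

func newTaskStore() *taskStore {
	return &taskStore{tasks: make(map[string]*taskHandle)}
}

// Set registers or replaces the handle for a task.
func (s *taskStore) Set(id string, h *taskHandle) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.tasks[id] = h
}

// Get looks up the handle for a task, if the driver is tracking it.
func (s *taskStore) Get(id string) (*taskHandle, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	h, ok := s.tasks[id]
	return h, ok
}

// Delete forgets a task, for example once it has been destroyed.
func (s *taskStore) Delete(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.tasks, id)
}

// Len reports how many tasks this driver process currently tracks.
func (s *taskStore) Len() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return len(s.tasks)
}

func main() {
	store := newTaskStore()
	// Two tasks from different allocations, both tracked by the same process.
	store.Set("alloc1/web", &taskHandle{taskID: "alloc1/web", pid: 4242})
	store.Set("alloc2/worker", &taskHandle{taskID: "alloc2/worker", pid: 4243})
	fmt.Println("tasks tracked by this driver process:", store.Len())
}
```

The point of the sketch is only that the mapping is many tasks to one driver process; anything you need per task has to hang off the handle rather than off the process itself.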

That’s odd :thinking:

Maybe Nomad was force-quit and didn’t have time to stop the plugin process, and so they accumulated over time? And are they actually your plugin process or maybe a child process that your plugin creates for each task?

Check out the video I linked and see if it helps you. If your plugin is open source, it would be nice to have a link as well (if possible).

Thanks for the link to the video, very useful. It confirms, as you have, how I expected the plugins to work: there is one instance of the plugin that manages all the tasks of that given type.
I’ll have to delve further into what I’m seeing. Perhaps I just got confused with the code, trying to follow through how the “exec” stuff works and how it tracks and reconnects to processes when necessary. There are also Nomad log messages where it appears to be starting the plugin multiple times, but again, I could be misinterpreting things.

I’ll have another play around and see if I can gain any further insight.

Thanks

I think my confusion actually comes from the “executor” framework, in that I think it then starts up another copy of my plugin. At the moment I’m unclear what value this executor framework gives me if all I want to do is launch a local executable (I’m not interested in containers and so on, I really just want a raw golang os/exec Cmd interface). We want to make some changes to the way the processes are launched, and so far I can’t work out how it hangs together.

So I think my question is really about how a task driver plugin and the executor framework used by the skeleton driver work together.

It looks like whenever the task driver launches an executable it does so by creating an “executor” plugin and this launches another copy of the plugin executable, and then the “executor” plugin ultimately launches the real “workload” executable… That instance of the plugin executable manages only one “workload” executable. If the task driver plugin has to be restarted, it reconnects to the “executor” plugins. I think I had originally envisaged that this reconnection would be to the “workload” executable, if you see what I mean.

I don’t yet understand the purpose behind all of this and was hoping to find some design docs around it. Right now I’m thinking it would be easier to do what we need by just using os/exec Cmd directly, but I’m sure there must be some reason it’s done this way that I’m not seeing.
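For reference, this is roughly what I mean by using os/exec directly: a minimal Go sketch, not code from my plugin. `runAndWait` is a hypothetical helper name, and the example assumes an `sh` binary is available on the PATH.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// runAndWait starts the command as a direct child of the calling process,
// blocks until it exits, and returns its exit code.
func runAndWait(name string, args ...string) (int, error) {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return -1, err // the process could not be started at all
	}
	err := cmd.Wait()
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		// The process ran and exited unsuccessfully; report its exit code.
		return exitErr.ExitCode(), nil
	}
	if err != nil {
		return -1, err // some other failure, such as an I/O error
	}
	return 0, nil
}

func main() {
	// Assumes a POSIX shell is installed; the command simply exits non-zero.
	code, err := runAndWait("sh", "-c", "exit 3")
	fmt.Println("exit code:", code, "error:", err)
}
```

The catch with this approach is that `cmd.Wait` only works in the process that called `cmd.Start`. If the task driver plugin is restarted, that handle is gone, so the driver would need some other way to find out how the workload exited.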

Any further suggestions on this?
I have just run a garbage collection on my Nomad client. I have no jobs, yet I have 9 copies of my task driver executable running.
This feels like it ought not to be the case to me.