skale-me / skale

High performance distributed data processing engine

Home Page:https://skale-me.github.io/skale

External modules in worker

auridevil opened this issue · comments

Worker functions are currently executed in an eval context, so they cannot import/require external modules.

Is there a plan to implement a way to bind external requires to workers? What are the main problems?

This feature is needed to extend workers without having to modify the engine, so it is an important one. Up to now, worker dependencies have only been managed internally.

The code to be executed in workers is first serialized by the master (JSON.stringify), then transmitted to each worker for execution (JSON.parse, then eval). A first possible solution would be to embed dependencies within the serialized code to make it self-contained, using something like browserify before sending. The limitation of this approach is that it is restricted to pure JS code (no native binaries).

This would not change the current model, and can be experimented with without changing the engine.
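To make the round trip concrete, here is a minimal sketch of the serialize/eval model described above. It is not the engine's actual code: `serializeTask`, `deserializeTask`, and `makeTask` are hypothetical names, and it assumes the function is turned into source text with `toString()` before `JSON.stringify`. It shows why anything bound outside the function body, such as a required module, is lost on the worker side.

```javascript
// Master side (hypothetical helper): turn a user function into a JSON payload.
// Assumption: the function is converted to source text with toString(),
// because JSON.stringify drops function values.
function serializeTask(fn) {
  return JSON.stringify({ src: fn.toString() });
}

// Worker side (hypothetical helper): rebuild the function from its source.
// Indirect eval runs the source in the global scope, so the rebuilt function
// cannot see the scope in which the user originally defined it.
function deserializeTask(payload) {
  const task = JSON.parse(payload);
  return (0, eval)('(' + task.src + ')');
}

// A self-contained function survives the round trip.
const taskPayload = serializeTask(function (x) { return x * 2; });
const doubled = deserializeTask(taskPayload)(21);

// A function that closes over a local binding does not.
function makeTask() {
  const localHelper = (x) => x * 2; // stands in for a require()'d module
  return function (x) { return localHelper(x); };
}

const rebuilt = deserializeTask(serializeTask(makeTask()));
let errorName = null;
try {
  rebuilt(21);
} catch (err) {
  errorName = err.name; // localHelper is not defined on the "worker" side
}
```

Bundling the dependency source into the payload, as suggested above, would be one way to make the second case self-contained again.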

Additionally, the API could be changed to let the user decide which dependencies are made available to workers prior to execution (at the moment, these are statically defined in the engine). There would be something like sc.require('my-depend') before the transform and action calls. Note that this doesn't solve how the code is deployed to workers (a prior npm install, and/or serialization as previously described). Both approaches are probably complementary.

In the same vein, the API could be extended to run a kind of prerequisite stage that performs 'npm install' and similar steps in workers prior to sc.require. Not my preferred option.
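A rough sketch of how the sc.require idea could work, under stated assumptions: `SketchContext`, `serialize`, `runOnWorker`, and the injected `modules` object are all hypothetical and not part of the skale API. The master ships only the dependency names with the function source; the worker resolves them locally, so the modules must already be installed there. The built-in `path` module is used so the sketch runs without any install step.

```javascript
// Hypothetical context: records the module names requested by the user.
class SketchContext {
  constructor() {
    this.deps = [];
  }

  // Declare a dependency that workers should load before running tasks.
  require(name) {
    this.deps.push(name);
    return this;
  }

  // Master side: ship the function source together with the dependency names.
  serialize(fn) {
    return JSON.stringify({ deps: this.deps, src: fn.toString() });
  }
}

// Worker side: load each dependency locally, then rebuild the function with
// a `modules` object in scope (an assumption of this sketch).
function runOnWorker(payload, arg) {
  const task = JSON.parse(payload);
  const modules = {};
  for (const name of task.deps) {
    modules[name] = require(name); // must already be installed on the worker
  }
  const fn = new Function('modules', 'return (' + task.src + ');')(modules);
  return fn(arg);
}

const sc = new SketchContext();
sc.require('path');

// `modules` is only defined once the function is rebuilt on the worker.
const depPayload = sc.serialize(function (file) {
  return modules.path.basename(file);
});
const base = runOnWorker(depPayload, '/data/input/part-0001.csv');
```

This keeps the payload small but moves the deployment question (npm install on each worker) outside the engine, which is the trade-off noted above.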

I'm running some experiments and will come back to this thread with real tests and code.

This issue is really an RFC. Comments and suggestions are welcome.