Support executing query stages in execution engines other than DataFusion

Question

andygrove opened this issue a year ago · comments

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
When the executor receives a task, it deserializes the physical plan, wraps it in a ShuffleWriterExec, and executes it with DataFusion.

I want the ability to override this behavior to execute the plan in execution engines other than DataFusion.

Describe the solution you'd like
In the executor, we call new_shuffle_writer to create the ShuffleWriterExec that wraps the plan to be executed. I am thinking about moving that method into a new ExecutionEngine trait and creating a DataFusionExecutor implementation of the trait that is used by default.

We can then add a field to ExecutorProcessConfig as follows:

execution_engine: Option<Arc<dyn ExecutionEngine>>

This will allow me to register custom execution engines from PyBallista, and execute distributed queries in Polars, Pandas, and cuDF.
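
To make the proposal concrete, here is a minimal sketch of how the trait, the default implementation, and the config field could fit together. It is not the Ballista API: PhysicalPlan, QueryStageExecutor, ShuffleWriterStage, resolve_engine, and the method signatures are all hypothetical stand-ins, chosen so the example compiles with only the standard library. Only the names ExecutionEngine, DataFusionExecutor, ExecutorProcessConfig, and execution_engine come from the text above.

```rust
use std::sync::Arc;

/// Stand-in for a deserialized physical plan. In Ballista this would be a
/// DataFusion plan; a plain description keeps the sketch dependency-free.
#[derive(Debug, Clone)]
pub struct PhysicalPlan {
    pub description: String,
}

/// Hypothetical handle for one query stage that is ready to run.
pub trait QueryStageExecutor {
    /// Runs the stage and reports what was executed.
    fn execute(&self) -> Result<String, String>;
}

/// The trait proposed above: each engine decides how to wrap and run a plan.
pub trait ExecutionEngine: Send + Sync {
    fn name(&self) -> &str;

    /// Takes over the role new_shuffle_writer plays today (signature assumed).
    fn create_query_stage_exec(
        &self,
        job_id: &str,
        stage_id: usize,
        plan: PhysicalPlan,
    ) -> Result<Box<dyn QueryStageExecutor>, String>;
}

/// Stand-in for the stage that wraps the plan in a ShuffleWriterExec.
struct ShuffleWriterStage {
    job_id: String,
    stage_id: usize,
    plan: PhysicalPlan,
}

impl QueryStageExecutor for ShuffleWriterStage {
    fn execute(&self) -> Result<String, String> {
        // A real implementation would run the plan and write shuffle files.
        Ok(format!(
            "datafusion ran {}/{}: {}",
            self.job_id, self.stage_id, self.plan.description
        ))
    }
}

/// Default engine, used when no custom engine is registered.
pub struct DataFusionExecutor;

impl ExecutionEngine for DataFusionExecutor {
    fn name(&self) -> &str {
        "datafusion"
    }

    fn create_query_stage_exec(
        &self,
        job_id: &str,
        stage_id: usize,
        plan: PhysicalPlan,
    ) -> Result<Box<dyn QueryStageExecutor>, String> {
        Ok(Box::new(ShuffleWriterStage {
            job_id: job_id.to_string(),
            stage_id,
            plan,
        }))
    }
}

/// Only the field discussed in this issue is shown.
pub struct ExecutorProcessConfig {
    pub execution_engine: Option<Arc<dyn ExecutionEngine>>,
}

/// Returns the registered engine, or the DataFusion default when none is set.
pub fn resolve_engine(config: &ExecutorProcessConfig) -> Arc<dyn ExecutionEngine> {
    match &config.execution_engine {
        Some(engine) => Arc::clone(engine),
        None => Arc::new(DataFusionExecutor),
    }
}
```

Under these assumptions, a custom engine would implement ExecutionEngine and be registered by setting execution_engine to Some(...) in the config; leaving it as None keeps the current DataFusion behavior.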

Describe alternatives you've considered
None

Additional context
None

Andy Grove · Answer 1 · Sat Feb 25 2023 08:06:46 GMT+0800 (China Standard Time)

@Dandandan @thinkharderdev @yahoNanJing @avantgardnerio @jdye64 fyi - let me know if you have any opinions on this approach. I am going to build a prototype of this over the next week. I am sure the design will evolve as I try and implement this.

Jeremy Dyer · Answer 2 · Sat Feb 25 2023 08:09:40 GMT+0800 (China Standard Time)

I have been thinking about this a lot today. I have had numerous ideas, and all seem to have fallen flat as I tried to fully implement them. I like the general idea, however, and am curious to see how it looks fully materialized. Interesting stuff for sure!

Andy Grove · Answer 3 · Sat Feb 25 2023 08:43:35 GMT+0800 (China Standard Time)

There needs to be a scheduler element to this as well so we can do the plan translation once rather than per task.
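
One way to picture "translate once rather than per task" is a cache keyed by job and stage that every task of the stage shares. The sketch below is only an illustration of that idea under assumed names: StagePlanCache, TranslatedPlan, and get_or_translate are hypothetical and do not exist in Ballista, and the plan is represented as a plain string.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical engine-specific plan produced by translation.
#[derive(Debug, PartialEq)]
pub struct TranslatedPlan(pub String);

/// Holds one translated plan per (job_id, stage_id) so tasks can share it.
pub struct StagePlanCache {
    plans: HashMap<(String, usize), Arc<TranslatedPlan>>,
    translations: usize,
}

impl StagePlanCache {
    pub fn new() -> Self {
        StagePlanCache {
            plans: HashMap::new(),
            translations: 0,
        }
    }

    /// Returns the cached plan for the stage, translating only on first use.
    pub fn get_or_translate(
        &mut self,
        job_id: &str,
        stage_id: usize,
        plan: &str,
        translate: impl FnOnce(&str) -> TranslatedPlan,
    ) -> Arc<TranslatedPlan> {
        let key = (job_id.to_string(), stage_id);
        if let Some(existing) = self.plans.get(&key) {
            return Arc::clone(existing);
        }
        // First task of this stage: translate and remember the result.
        self.translations += 1;
        let translated = Arc::new(translate(plan));
        self.plans.insert(key, Arc::clone(&translated));
        translated
    }

    /// Number of times a translation actually ran.
    pub fn translations(&self) -> usize {
        self.translations
    }
}
```

In this sketch the second and later tasks of a stage receive the same Arc as the first, so the translation closure runs once per stage regardless of the number of tasks.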

Ken, Wang · Answer 4 · Thu May 18 2023 10:25:10 GMT+0800 (China Standard Time)

@andygrove
I like this idea. Do you have any proposals for the other execution engines?

Andy Grove · Answer 5 · Tue Jan 16 2024 23:58:01 GMT+0800 (China Standard Time)

I am closing this for now because I think it is too ambitious given the current level of development in the project.