apache / datafusion-ballista

Apache DataFusion Ballista Distributed Query Engine

Home Page: https://datafusion.apache.org/ballista

Change the task assignment philosophy from executor first to task first

yahoNanJing opened this issue

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Currently, during task assignment, the scheduler first reserves executor slots and then assigns schedulable tasks to them. This executor-first process works poorly for tasks that have placement preferences. For example, a task that scans specific files may prefer to run on an executor close to those files. Such preferences are especially useful when the data source cache layer is enabled.

Describe the solution you'd like

Therefore, it would be better to change the task assignment philosophy: choose the schedulable tasks first, and then choose a preferred executor for each of them.
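To make the proposal concrete, here is a minimal sketch of what task-first assignment could look like. It is not Ballista's scheduler code: the `Task` struct, the `preferred_executors` field, the `assign_task_first` function, and the map of free slots are all hypothetical names introduced for illustration, and the fallback rule (any executor with a free slot) is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical task description: an id plus the executors it would prefer,
/// for example those that hold its input files locally or in a cache.
#[derive(Debug, Clone)]
pub struct Task {
    pub id: u32,
    pub preferred_executors: Vec<String>,
}

/// Task-first assignment (illustrative only): walk the schedulable tasks and,
/// for each one, pick a preferred executor that still has a free slot. If none
/// has capacity, fall back to any executor with a free slot. Tasks that find
/// no free slot at all are left unassigned.
///
/// Returns (task id, executor id) pairs and decrements `free_slots` in place.
pub fn assign_task_first(
    tasks: &[Task],
    free_slots: &mut HashMap<String, u32>,
) -> Vec<(u32, String)> {
    let mut assignments = Vec::new();
    for task in tasks {
        // First choice: a preferred executor that still has capacity.
        let preferred = task
            .preferred_executors
            .iter()
            .find(|e| free_slots.get(*e).copied().unwrap_or(0) > 0)
            .cloned();

        // Fallback (assumption): any executor with capacity. The smallest id
        // is taken so the result does not depend on hash map ordering.
        let chosen = preferred.or_else(|| {
            free_slots
                .iter()
                .filter(|(_, slots)| **slots > 0)
                .map(|(id, _)| id.clone())
                .min()
        });

        if let Some(executor) = chosen {
            if let Some(slots) = free_slots.get_mut(&executor) {
                *slots -= 1;
            }
            assignments.push((task.id, executor));
        }
    }
    assignments
}
```

With two executors of one slot each and two tasks that both prefer the same executor, the first task gets its preferred executor and the second falls back to the other one. An executor-first scheme would instead hand out the reserved slots without consulting the task's preference.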

Describe alternatives you've considered

Additional context