apache / datafusion-ballista

Apache DataFusion Ballista Distributed Query Engine

Home Page: https://datafusion.apache.org/ballista

[Improvement] Increase the throughput in push-staged TaskSchedulingPolicy

Ted-Jiang opened this issue · comments

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Our team uses the push-staged TaskSchedulingPolicy for our OLAP query engine. We mainly focus on quick-response queries (under 10 s) at a concurrency in the hundreds.

Now:

Cluster: 60 executors (200 slots each) and 1 scheduler.
Workload: 200 clients, each submitting queries that cost 0.5 s, one after another.

We got 1200 queries per minute 😢

We found the following issues in push-staged:

  • Should wait for the launch_multi_task result asynchronously #759
  • Try to avoid data lock racing #753
  • Avoid empty events and combine consecutive events
  • etc
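
To make the third item concrete, here is a minimal sketch of what "avoid empty events and combine consecutive events" could look like. The `Event` enum and the `coalesce` function are hypothetical names made up for illustration; they are not Ballista's actual scheduler event types, and the real change may look quite different.

```rust
// Hypothetical event type for illustration only (not Ballista's real events).
#[derive(Debug, Clone, PartialEq)]
enum Event {
    /// `n` task slots became free and can be offered to pending tasks.
    SlotsFreed(u32),
    /// A job with the given id was submitted.
    JobSubmitted(String),
}

/// Drop empty `SlotsFreed(0)` events and merge each run of `SlotsFreed`
/// events into a single event, so the event loop handles fewer events.
fn coalesce(events: Vec<Event>) -> Vec<Event> {
    let mut out: Vec<Event> = Vec::with_capacity(events.len());
    for ev in events {
        match ev {
            // An empty event carries no work, so skip it.
            Event::SlotsFreed(0) => continue,
            Event::SlotsFreed(n) => {
                if let Some(Event::SlotsFreed(prev)) = out.last_mut() {
                    // The previous kept event is also SlotsFreed: merge.
                    *prev += n;
                } else {
                    out.push(Event::SlotsFreed(n));
                }
            }
            // Any other event is kept as-is and ends the current run.
            other => out.push(other),
        }
    }
    out
}

fn main() {
    let events = vec![
        Event::SlotsFreed(2),
        Event::SlotsFreed(0),
        Event::SlotsFreed(3),
        Event::JobSubmitted("job-1".to_string()),
        Event::SlotsFreed(1),
    ];
    // Prints: [SlotsFreed(5), JobSubmitted("job-1"), SlotsFreed(1)]
    println!("{:?}", coalesce(events));
}
```

The idea is that under high concurrency many small events arrive back to back, so handling one merged event instead of several should reduce per-event overhead in the scheduler loop.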

After fixing these we got 16000 queries per minute! We will contribute these fixes back soon.
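
For context, a rough back-of-the-envelope check of these numbers. It assumes a closed loop (each of the 200 clients sends its next query only after the previous one returns, with no think time) and that 0.5 s is the full cost of a query when there is no scheduling overhead. Both are assumptions for this sketch, not measurements, and the function names are made up.

```rust
/// Closed-loop throughput in queries per minute when every client waits
/// for its previous query to finish before sending the next one.
fn queries_per_minute(clients: f64, latency_secs: f64) -> f64 {
    clients / latency_secs * 60.0
}

/// Mean end-to-end latency implied by an observed throughput
/// (Little's law: clients = throughput * latency).
fn implied_latency_secs(clients: f64, qpm: f64) -> f64 {
    clients * 60.0 / qpm
}

fn main() {
    let clients = 200.0;
    // Ceiling if every query really took only 0.5 s end to end.
    println!("ceiling: {:.0} queries/min", queries_per_minute(clients, 0.5));
    // Latency implied by the throughput before and after the fixes.
    println!("before:  {:.2} s/query", implied_latency_secs(clients, 1200.0));
    println!("after:   {:.2} s/query", implied_latency_secs(clients, 16000.0));
    println!("speedup: {:.1}x", 16000.0 / 1200.0);
}
```

Under these assumptions the ceiling would be 24000 queries per minute, the original 1200 queries per minute corresponds to roughly 10 s per query end to end, and 16000 queries per minute corresponds to roughly 0.75 s per query.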

Interesting, sounds like some promising improvements :)