flock-lab / flock

Flock: A Low-Cost Streaming Query Engine on FaaS Platforms

Home Page: https://flock-lab.github.io/flock/

Refactor!: Distributed Query Processing

gangliao opened this issue

  • I should partition the logical plan instead of the physical plan.
    • This is because the logical plan is small and physical plan generation is fast: roughly 0.2 to 2.4 ms per query in the timings below. #407
    • We can also put the logical plans in the cloud context.
q0 generates physical plan in 296 us
q1 generates physical plan in 236 us
q2 generates physical plan in 234 us
q3 generates physical plan in 908 us
q4 generates physical plan in 1720 us
q5 generates physical plan in 1808 us
q6 generates physical plan in 2401 us
q7 generates physical plan in 1263 us
q8 generates physical plan in 1350 us
q9 generates physical plan in 1860 us
q10 generates physical plan in 224 us
q11 generates physical plan in 598 us
q13 generates physical plan in 800 us
q12 generates physical plan in 622 us
  • On reflection, we should keep partitioning the physical plan, since Spark is implemented this way as well.
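
To put the timings listed above in perspective, here is a minimal sketch that parses those log lines and summarizes them. The log text is copied from this issue; the helper name `parse_timings` is hypothetical and only for illustration, not part of Flock.

```python
import re

# Log lines copied verbatim from the measurements in this issue.
LOG = """\
q0 generates physical plan in 296 us
q1 generates physical plan in 236 us
q2 generates physical plan in 234 us
q3 generates physical plan in 908 us
q4 generates physical plan in 1720 us
q5 generates physical plan in 1808 us
q6 generates physical plan in 2401 us
q7 generates physical plan in 1263 us
q8 generates physical plan in 1350 us
q9 generates physical plan in 1860 us
q10 generates physical plan in 224 us
q11 generates physical plan in 598 us
q13 generates physical plan in 800 us
q12 generates physical plan in 622 us
"""


def parse_timings(log: str) -> dict:
    """Map each query name to its plan generation time in microseconds.

    Hypothetical helper for this example only.
    """
    pattern = re.compile(r"^(q\d+) generates physical plan in (\d+) us$")
    timings = {}
    for line in log.splitlines():
        match = pattern.match(line.strip())
        if match:
            timings[match.group(1)] = int(match.group(2))
    return timings


timings = parse_timings(LOG)
fastest = min(timings, key=timings.get)
slowest = max(timings, key=timings.get)
total_us = sum(timings.values())

print(f"queries: {len(timings)}")
print(f"fastest: {fastest} ({timings[fastest]} us)")
print(f"slowest: {slowest} ({timings[slowest]} us)")
print(f"total:   {total_us} us")
```

For the 14 queries measured, generation time ranges from 224 us (q10) to 2401 us (q6), and all plans together take 14320 us, so planning cost alone does not rule out either partitioning strategy.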

[x] Ballista simply splits each repartition operator in the physical plan into two operators (ShuffleWriterExec and ShuffleReaderExec) for distributed processing.
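
The splitting idea described above can be sketched on a toy plan tree. This is an illustration under stated assumptions, not Ballista's or Flock's actual code: `PlanNode`, `split_into_stages`, and `render` are hypothetical names, the plan is reduced to operator names only, and partitioning details are omitted. Each repartition node is cut out; the subtree below it becomes its own stage topped by a shuffle writer, and the parent stage gets a shuffle reader that refers to that stage.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanNode:
    """Toy physical plan node (hypothetical; real plans carry schemas, partitioning, etc.)."""
    name: str
    children: List["PlanNode"] = field(default_factory=list)
    stage_id: Optional[int] = None  # set only on shuffle writer/reader nodes


def split_into_stages(root: PlanNode) -> List[PlanNode]:
    """Cut the plan at every repartition node and return one plan per stage.

    Stages are returned in dependency order: a stage only reads from
    stages that appear earlier in the list. The last entry is the final stage.
    """
    stages: List[PlanNode] = []

    def rewrite(node: PlanNode) -> PlanNode:
        # Rewrite children first so that nested repartitions become earlier stages.
        new_children = [rewrite(child) for child in node.children]
        if node.name == "RepartitionExec":
            stage_id = len(stages)
            # Producer side: the subtree below the repartition writes shuffle output.
            stages.append(PlanNode("ShuffleWriterExec", new_children, stage_id))
            # Consumer side: the parent stage reads that output instead.
            return PlanNode("ShuffleReaderExec", [], stage_id)
        return PlanNode(node.name, new_children, node.stage_id)

    final_stage = rewrite(root)
    stages.append(final_stage)
    return stages


def render(node: PlanNode, indent: int = 0) -> List[str]:
    """Return an indented, line-per-operator view of a plan."""
    label = node.name if node.stage_id is None else f"{node.name}(stage={node.stage_id})"
    lines = ["  " * indent + label]
    for child in node.children:
        lines.extend(render(child, indent + 1))
    return lines


# Example: a two-phase aggregation with one repartition between the phases.
plan = PlanNode("FinalAggregate", [
    PlanNode("RepartitionExec", [
        PlanNode("PartialAggregate", [PlanNode("Scan")]),
    ]),
])

stages = split_into_stages(plan)
for i, stage in enumerate(stages):
    print(f"-- stage {i}")
    print("\n".join(render(stage)))
```

In this example the single repartition yields two stages: stage 0 scans and partially aggregates before writing shuffle output, and stage 1 reads that output and finishes the aggregation. On a FaaS platform, each stage would presumably map to its own group of function invocations, but that mapping is outside the scope of this sketch.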