cloudfuse-io / buzz-rust

Serverless query engine

Optimize parquet chunk downloading strategy

rdettai opened this issue

The Parquet table downloads each column chunk individually. If a query uses a large proportion of the columns and the file contains a large number of row groups, this results in many small downloads.

A strategy could be implemented to group the downloads of column chunks when:

  • they are close to each other in the file
  • download parallelism is already high enough (having multiple downloads in parallel increases the total bandwidth, so grouping should not reduce the number of concurrent downloads too far)
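To make the idea concrete, here is a minimal sketch of one possible grouping strategy. It is not taken from the buzz-rust code base: `ByteRange`, `coalesce_ranges`, `max_gap` and `min_requests` are hypothetical names chosen for illustration. The sketch merges neighbouring column chunk ranges, smallest gap first, as long as the gap is at most `max_gap` bytes and the number of remaining requests does not fall below `min_requests`.

```rust
/// Byte range `[start, end)` of one column chunk inside the Parquet file.
/// Hypothetical type, used only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ByteRange {
    pub start: u64,
    pub end: u64,
}

/// Groups column chunk ranges into fewer, larger download requests.
///
/// Two neighbouring ranges are merged only if the gap between them is at most
/// `max_gap` bytes ("they are close") and the merge would not bring the number
/// of requests below `min_requests` ("parallelism is already high enough").
/// The smallest gaps are merged first, so the fewest unused bytes are fetched.
pub fn coalesce_ranges(
    mut ranges: Vec<ByteRange>,
    max_gap: u64,
    min_requests: usize,
) -> Vec<ByteRange> {
    if ranges.is_empty() {
        return ranges;
    }
    ranges.sort_by_key(|r| r.start);

    // (gap size, index i) for each pair ranges[i], ranges[i + 1] that is close enough.
    let mut gaps: Vec<(u64, usize)> = ranges
        .windows(2)
        .enumerate()
        .map(|(i, pair)| (pair[1].start.saturating_sub(pair[0].end), i))
        .filter(|(gap, _)| *gap <= max_gap)
        .collect();
    gaps.sort();

    // Each merge removes exactly one request, so this is the number of merges allowed.
    let budget = ranges.len().saturating_sub(min_requests);
    let mut merge_with_next = vec![false; ranges.len()];
    for (_, i) in gaps.into_iter().take(budget) {
        merge_with_next[i] = true;
    }

    // Walk the sorted ranges and extend the current request across selected gaps.
    let mut merged = Vec::new();
    let mut current = ranges[0];
    for i in 0..ranges.len() - 1 {
        if merge_with_next[i] {
            current.end = current.end.max(ranges[i + 1].end);
        } else {
            merged.push(current);
            current = ranges[i + 1];
        }
    }
    merged.push(current);
    merged
}
```

With four chunks at `[0, 100)`, `[100, 200)`, `[1000, 1100)` and `[1150, 1250)`, a `max_gap` of 64 and a `min_requests` of 2, this yields two requests, `[0, 200)` and `[1000, 1250)`. Good values for both thresholds would have to be measured against the actual storage backend.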