Optimize parquet chunk downloading strategy
rdettai opened this issue
The Parquet table downloads each column chunk individually. When a large proportion of the columns is used and the file contains many row groups, this results in many small downloads.
A strategy could be implemented to group the downloads of column chunks if:
- they are close to each other in the file
- download parallelism is already high enough (having multiple downloads in parallel increases the total bandwidth)
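
To make the two conditions concrete, here is a minimal sketch of one possible grouping rule, not the project's implementation. It assumes each column chunk is described by a `(start, end)` byte range with an exclusive end, and it introduces two hypothetical knobs: `max_gap` (how many unused bytes we accept between two chunks before they stop counting as "close") and `min_requests` (the number of separate downloads to keep so that parallelism stays high enough). The function name `coalesce_ranges` is also made up for illustration.

```python
from typing import List, Tuple

# A byte range inside the file: (start offset, end offset, end exclusive).
Range = Tuple[int, int]


def coalesce_ranges(ranges: List[Range], max_gap: int, min_requests: int) -> List[Range]:
    """Group nearby column chunk ranges into fewer, larger downloads.

    Hypothetical parameters (not from the original issue):
    - max_gap: largest number of wasted bytes accepted between two chunks.
    - min_requests: stop merging once this many downloads remain.
    Assumes the input ranges do not overlap, as is the case for column chunks.
    """
    if not ranges:
        return []
    ordered = sorted(ranges)

    # Gap between each chunk and its successor, smallest gaps first,
    # so the cheapest merges are applied before the expensive ones.
    gaps = sorted(
        (max(0, ordered[i + 1][0] - ordered[i][1]), i)
        for i in range(len(ordered) - 1)
    )

    merge_after = set()  # indices i such that chunk i is merged with chunk i + 1
    remaining = len(ordered)
    for gap, i in gaps:
        # Stop when chunks are too far apart or when merging further
        # would reduce parallelism below the requested level.
        if gap > max_gap or remaining <= min_requests:
            break
        merge_after.add(i)
        remaining -= 1

    # Rebuild the list of download ranges from the selected merges.
    merged: List[Range] = []
    start, end = ordered[0]
    for i in range(1, len(ordered)):
        next_start, next_end = ordered[i]
        if (i - 1) in merge_after:
            end = max(end, next_end)
        else:
            merged.append((start, end))
            start, end = next_start, next_end
    merged.append((start, end))
    return merged


chunks = [(0, 100), (110, 200), (5000, 5100), (5105, 5200)]
print(coalesce_ranges(chunks, max_gap=64, min_requests=2))
# → [(0, 200), (5000, 5200)]
```

Merging the smallest gaps first means that when `min_requests` cuts the merging short, the bytes downloaded but not needed are kept as low as possible. With `min_requests=4` in the example above, nothing is merged and the four chunks are still downloaded individually.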