Remote tables: filter without sort
backkem opened this issue
I was wondering: does the `datafusion_remote_tables` filter push-down not support sorting? It seems that using filters and limits in the absence of a sort order could lead to unexpected results.
I'd be happy to help address this if this is indeed the case.
Hey @backkem, that's a good question.
We abide by the `TableProvider` API set out by DataFusion, which doesn't take the `ORDER BY` clause into account:
seafowl/datafusion_remote_tables/src/provider.rs
Lines 120 to 126 in fdd7c49
Sorting itself is handled by DataFusion further down the data processing pipeline (i.e. once the data has been fetched) by a plan node above the scanning node in the plan AST.
While filtering and sorting commute in principle, the limit doesn't commute with sorting. DataFusion handles this by carefully deciding when to push the limit down into the scan (which is why it's an `Option<usize>`), though I forget where exactly that occurs.
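To make the commutation point concrete, here is a minimal sketch in plain Rust. It does not use DataFusion: `scan`, `sort_then_limit`, and the two `top_n_*` functions are hypothetical stand-ins, with `scan` only mimicking the shape of a provider that accepts a filter and an optional limit. It compares a plan that sorts before limiting against one that pushes the limit into the scan underneath the sort.

```rust
// Toy stand-in for a table scan. This is NOT DataFusion's API, just an
// illustration. `filter` plays the role of pushed-down filters and `limit`
// mirrors the optional limit argument discussed above.
fn scan(rows: &[i64], filter: impl Fn(&i64) -> bool, limit: Option<usize>) -> Vec<i64> {
    let filtered = rows.iter().copied().filter(|v| filter(v));
    match limit {
        // A pushed-down limit stops after the first `n` matching rows,
        // in whatever order the source happens to return them.
        Some(n) => filtered.take(n).collect(),
        None => filtered.collect(),
    }
}

// `ORDER BY value LIMIT n`, evaluated by a node above the scan.
fn sort_then_limit(mut rows: Vec<i64>, n: usize) -> Vec<i64> {
    rows.sort();
    rows.truncate(n);
    rows
}

// Limit kept above the sort: the scan returns every matching row.
fn top_n_correct(rows: &[i64], n: usize) -> Vec<i64> {
    sort_then_limit(scan(rows, |v| *v > 2, None), n)
}

// Limit pushed into the scan underneath the sort: rows are cut off
// before they have been ordered.
fn top_n_limit_pushed_too_early(rows: &[i64], n: usize) -> Vec<i64> {
    sort_then_limit(scan(rows, |v| *v > 2, Some(n)), n)
}
```

For the input `[9, 4, 1, 7, 3, 8]` and `n = 2`, `top_n_correct` yields `[3, 4]`, while `top_n_limit_pushed_too_early` yields `[4, 9]`. A limit can therefore only reach the scan safely when no sort sits between the two, and passing `None` lets a planner hold it back.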
Thank you for the feedback. I'll try to find some time to look into the directions mentioned in apache/datafusion#7871.
Closing as this was answered.
FYI: We created datafusion-contrib/datafusion-federation to explore the full query federation use-case.