[Bug] org.apache.paimon.trino.fileio.TrinoFileIO#listStatus lists paths recursively when using HDFS
yohengyang opened this issue · comments
In issue #54, when using HDFS, the semantics of org.apache.paimon.trino.fileio.TrinoFileIO#listStatus are problematic.
I understand the semantics of the org.apache.paimon.trino.fileio.TrinoFileIO#listStatus
method to be listing the first-level directories and files under the current path.
However, when using HDFS, the implementation of trinoFileSystem.listFiles recursively traverses subdirectories, which can lead to unnecessary data scanning.
https://github.com/trinodb/trino/blob/a9c5719705614b3849f2e1a22b2a545da125bd32/lib/trino-hdfs/src/main/java/io/trino/filesystem/hdfs/HdfsFileSystem.java#L228-L246
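To illustrate the first-level semantics described above, here is a minimal sketch in plain Java. The class DirectChildren and the method directChildren are hypothetical names for illustration only, not part of Trino or Paimon, and the paths are made up. It only shows which entries a first-level listing should return; because it filters a recursive result, it does not avoid the extra scanning.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class DirectChildren {
    // Hypothetical helper (not a Trino or Paimon API): given the file paths
    // returned by a recursive listing, keep only the first-level entries
    // (files and directories) directly under dir.
    public static Set<String> directChildren(String dir, List<String> recursiveFiles) {
        String prefix = dir.endsWith("/") ? dir : dir + "/";
        Set<String> children = new TreeSet<>();
        for (String file : recursiveFiles) {
            if (!file.startsWith(prefix)) {
                continue; // not under dir at all
            }
            String relative = file.substring(prefix.length());
            if (relative.isEmpty()) {
                continue; // the directory itself
            }
            int slash = relative.indexOf('/');
            // No slash: a file directly under dir.
            // Slash: the file is nested, so record only its first-level directory.
            children.add(slash < 0 ? prefix + relative : prefix + relative.substring(0, slash));
        }
        return children;
    }

    public static void main(String[] args) {
        // Made-up example paths, assumed to come from a recursive listing.
        List<String> files = List.of(
                "hdfs://nn/warehouse/t/schema/schema-0",
                "hdfs://nn/warehouse/t/snapshot/snapshot-1",
                "hdfs://nn/warehouse/t/README");
        System.out.println(directChildren("hdfs://nn/warehouse/t", files));
    }
}
```

With a recursive listing, every file under every subdirectory has to be visited before this filtering can happen, which is the unnecessary scanning mentioned above.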
A possible solution is to add a parameter to the trinoFileSystem.listFiles method to control the scanning depth, or to add an option to switch between recursive and non-recursive modes.
@leaves12138 @JingsongLi Could you please help look into this issue?
@yohengyang Thanks for reporting this issue!
We will fix this on the Paimon side (apache/paimon#3205) and implement this method in paimon-trino.
@yohengyang This pull request may resolve this problem: #66
If you have any other issues that go unanswered for a long time, you can contact me on DINGDING: "1ec-yosuk4tft1"