apache / paimon-trino

Trino Connector for Apache Paimon.

Home Page: https://paimon.apache.org/

Repository on GitHub: https://github.com/apache/paimon-trino

[Bug] org.apache.paimon.trino.fileio.TrinoFileIO#listStatus lists paths recursively when using HDFS

yohengyang opened this issue

As noted in issue #54, the semantics of org.apache.paimon.trino.fileio.TrinoFileIO#listStatus are broken when using HDFS.

My understanding is that org.apache.paimon.trino.fileio.TrinoFileIO#listStatus should list only the first-level files and directories under the given path, without descending into subdirectories. The current implementation is:
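To make the expected one-level semantics concrete, here is a small sketch on the local file system using java.nio.file (not Paimon or Trino code). Files.list returns only the direct children of a directory, while Files.walk descends into every subdirectory; listStatus is expected to behave like the former. The class and method names are illustrative only.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ListingSemantics {

    /** One-level listing: names of the direct children (files and directories) of dir. */
    static List<String> listOneLevel(Path dir) throws IOException {
        try (Stream<Path> children = Files.list(dir)) {
            return children.map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Recursive listing: every regular file at any depth below dir, as '/'-separated relative paths. */
    static List<String> listFilesRecursively(Path dir) throws IOException {
        try (Stream<Path> all = Files.walk(dir)) {
            return all.filter(Files::isRegularFile)
                    .map(p -> dir.relativize(p).toString().replace(File.separatorChar, '/'))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Layout: root/a.txt and root/sub/b.txt
        Path root = Files.createTempDirectory("listing-demo");
        Files.createFile(root.resolve("a.txt"));
        Path sub = Files.createDirectory(root.resolve("sub"));
        Files.createFile(sub.resolve("b.txt"));

        System.out.println(listOneLevel(root));         // [a.txt, sub]
        System.out.println(listFilesRecursively(root)); // [a.txt, sub/b.txt]
    }
}
```

The nested file sub/b.txt appears only in the recursive result; a one-level listing reports sub as a directory and stops there.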

```java
public FileStatus[] listStatus(Path path) throws IOException {
    List<FileStatus> fileStatusList = new ArrayList<>();
    Location location = Location.of(path.toString());
    if (trinoFileSystem.directoryExists(location).orElse(false)) {
        // Files: on HDFS, listFiles() walks every subdirectory recursively,
        // so files nested anywhere below `path` end up in the result too.
        FileIterator fileIterator = trinoFileSystem.listFiles(location);
        while (fileIterator.hasNext()) {
            FileEntry fileEntry = fileIterator.next();
            fileStatusList.add(
                    new TrinoFileStatus(
                            fileEntry.length(),
                            new Path(fileEntry.location().toString()),
                            fileEntry.lastModified().getEpochSecond()));
        }
        // Directories: the child directories of `path`.
        trinoFileSystem
                .listDirectories(Location.of(path.toString()))
                .forEach(
                        l ->
                                fileStatusList.add(
                                        new TrinoDirectoryFileStatus(new Path(l.toString()))));
    }
    return fileStatusList.toArray(new FileStatus[0]);
}
```

However, on HDFS, trinoFileSystem.listFiles traverses subdirectories recursively. As a result, listStatus returns files nested at any depth below the path instead of only its direct children, and it scans far more data than necessary:
https://github.com/trinodb/trino/blob/a9c5719705614b3849f2e1a22b2a545da125bd32/lib/trino-hdfs/src/main/java/io/trino/filesystem/hdfs/HdfsFileSystem.java#L228-L246
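For illustration only, the sketch below shows a client-side stopgap (not something proposed in this issue): filter a recursive listing down to the entries that sit directly inside the requested directory. It restores the one-level result, but the recursive traversal and its cost still happen, which is why a change to the listing call itself is preferable. The class name, the directChildren helper, and the sample hdfs://nn/... paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class DirectChildFilter {

    /**
     * Keeps only the paths that are direct children of dir, i.e. dir + "/" + name
     * where name contains no further '/'. Deeper entries are dropped.
     */
    static List<String> directChildren(String dir, List<String> recursiveListing) {
        String prefix = dir.endsWith("/") ? dir : dir + "/";
        List<String> result = new ArrayList<>();
        for (String path : recursiveListing) {
            if (path.startsWith(prefix)) {
                String rest = path.substring(prefix.length());
                if (!rest.isEmpty() && rest.indexOf('/') < 0) {
                    result.add(path);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // What a recursive listFiles() might return for hdfs://nn/warehouse/t (hypothetical paths)
        List<String> recursive = List.of(
                "hdfs://nn/warehouse/t/file-0",
                "hdfs://nn/warehouse/t/sub-a/file-1",
                "hdfs://nn/warehouse/t/sub-a/sub-b/file-2");
        System.out.println(directChildren("hdfs://nn/warehouse/t", recursive));
        // prints [hdfs://nn/warehouse/t/file-0]
    }
}
```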
A possible solution is to add a parameter to trinoFileSystem.listFiles that controls the traversal depth, or an option to switch between recursive and non-recursive modes.
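A rough sketch of the second option follows, assuming a hypothetical DepthAwareLister interface whose listFiles takes a recursive flag (Trino's TrinoFileSystem has no such parameter; Hadoop's own FileSystem.listFiles(Path, boolean recursive) does take one). An in-memory tree stands in for HDFS and counts how many directories each call visits, showing that the non-recursive mode touches only the requested directory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ListingModes {

    /** Hypothetical listing API with an explicit recursion switch. */
    interface DepthAwareLister {
        List<String> listFiles(String dir, boolean recursive);
    }

    /** In-memory directory tree used only to illustrate the two modes. */
    static class InMemoryLister implements DepthAwareLister {
        private final Map<String, List<String>> files = new TreeMap<>(); // dir -> file names
        private final Map<String, List<String>> dirs = new TreeMap<>();  // dir -> subdirectory names
        private int directoriesVisited;

        void addFile(String parent, String name) {
            files.computeIfAbsent(parent, k -> new ArrayList<>()).add(name);
        }

        void addDir(String parent, String name) {
            dirs.computeIfAbsent(parent, k -> new ArrayList<>()).add(name);
        }

        /** Number of directories read by the most recent listFiles call. */
        int directoriesVisited() {
            return directoriesVisited;
        }

        @Override
        public List<String> listFiles(String dir, boolean recursive) {
            directoriesVisited = 0;
            List<String> out = new ArrayList<>();
            collect(dir, recursive, out);
            return out;
        }

        private void collect(String dir, boolean recursive, List<String> out) {
            directoriesVisited++;
            for (String f : files.getOrDefault(dir, List.of())) {
                out.add(dir + "/" + f);
            }
            if (recursive) { // only descend when the caller asks for it
                for (String d : dirs.getOrDefault(dir, List.of())) {
                    collect(dir + "/" + d, true, out);
                }
            }
        }
    }

    public static void main(String[] args) {
        InMemoryLister fs = new InMemoryLister();
        fs.addFile("/t", "file-0");
        fs.addDir("/t", "sub-a");
        fs.addFile("/t/sub-a", "file-1");
        fs.addDir("/t/sub-a", "sub-b");
        fs.addFile("/t/sub-a/sub-b", "file-2");

        List<String> oneLevel = fs.listFiles("/t", false);
        System.out.println(oneLevel + " visited=" + fs.directoriesVisited());
        // [/t/file-0] visited=1
        List<String> all = fs.listFiles("/t", true);
        System.out.println(all + " visited=" + fs.directoriesVisited());
        // [/t/file-0, /t/sub-a/file-1, /t/sub-a/sub-b/file-2] visited=3
    }
}
```

With such a flag, listStatus could call the non-recursive mode and keep listing child directories separately, so deep table layouts no longer cost a full tree walk.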

@leaves12138 @JingsongLi Could you please help look into this issue?

@yohengyang Thanks for reporting this issue!

We will fix this on the Paimon side in apache/paimon#3205, and then implement this method in paimon-trino.

@yohengyang This pull request may resolve this problem: #66

If any other issue goes unanswered for a long time, you can contact me on DingTalk (DINGDING): "1ec-yosuk4tft1"