apache / iceberg-python

Apache PyIceberg

Home Page: https://py.iceberg.apache.org/

Type error in naming a protected attribute in class PyArrowFile(OutputFile, InputFile), breaking readability from HDFS using PyArrow

SebastianoMeneghin opened this issue · comments

Apache Iceberg version

0.6.0 (latest release)

Please describe the bug 🐞

The same error is also present in 0.5.X and in main.

The class PyArrowFile is defined in ../pyiceberg/io/pyarrow.py at line 183.
It inherits from OutputFile and InputFile.

Line 205 declares a protected attribute that is never used in the lines that follow:

line 205 _fs: FileSystem

However, the rest of the class frequently accesses the protected attribute _filesystem, which is never declared, either in PyArrowFile or in its parents.
I think this is causing some issues when using PyArrowFileIO to access files (I am trying it with HDFS as storage and SQLite as the catalog).

line 209
def __init__(self, location: str, path: str, fs: FileSystem, buffer_size: int = ONE_MEGABYTE):
    self._filesystem = fs
    self._path = path
    self._buffer_size = buffer_size
    super().__init__(location=location)

I see, so the issue here is that the __init__ function sets the self._filesystem variable:

def __init__(self, location: str, path: str, fs: FileSystem, buffer_size: int = ONE_MEGABYTE):
    self._filesystem = fs

but the class variable is called _fs:
_fs: FileSystem

This is probably a typo. Here's the original PR:
33f06fb#diff-8d5e63f2a87ead8cebe2fd8ac5dcf2198d229f01e16bb9e06e21f7277c328abdR140-R145

I think it's safe to rename the class variable from _fs to _filesystem.
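To illustrate the mismatch, here is a minimal sketch using hypothetical stand-in classes (FileSystem, MismatchedFile and RenamedFile are made up for the example and are not PyIceberg's or PyArrow's real classes). It relies only on standard Python behaviour: a bare class-level annotation records a name in __annotations__ but does not create an attribute, so the declared name and the name assigned in __init__ can drift apart without Python complaining.

```python
class FileSystem:
    """Hypothetical stand-in for pyarrow.fs.FileSystem."""


class MismatchedFile:
    # Bare annotation: recorded in __annotations__, but no attribute is created.
    _fs: FileSystem

    def __init__(self, fs: FileSystem) -> None:
        # The attribute actually assigned uses a different name.
        self._filesystem = fs


class RenamedFile:
    # Annotation renamed so it matches the attribute assigned in __init__.
    _filesystem: FileSystem

    def __init__(self, fs: FileSystem) -> None:
        self._filesystem = fs


fs = FileSystem()
mismatched = MismatchedFile(fs)
renamed = RenamedFile(fs)

print(hasattr(mismatched, "_fs"))                    # False
print(hasattr(mismatched, "_filesystem"))            # True
print("_fs" in MismatchedFile.__annotations__)       # True
print("_filesystem" in RenamedFile.__annotations__)  # True
```

With the rename, the declared attribute and the one the rest of the class reads are the same name, which is what readers and type checkers expect.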

Should I fix the problem or will you?

It should be a 1 line change, would you like to contribute?

Sure, I can!
If it makes sense for you, I will haha

Feel free to tag me in the PR for review :)

Here you can find the PR!
#686

Thanks!