xhochy / fletcher

Pandas ExtensionDType/Array backed by Apache Arrow

Home Page:https://fletcher.readthedocs.io/

Pandas 1.3.0 DataFrame repr fails due to IndexError

samgd opened this issue · comments

When a DataFrame containing a Fletcher array is printed, Pandas 1.3.0 passes an Ellipsis object to the array's __getitem__. This raises an IndexError, because __getitem__ does not support Ellipsis indexing and is not specified to.

Note that this only happens when the array is "large", so it is presumably interacting with the print options somewhere.

It displays okay on Pandas 1.2.5 (error not thrown).
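
To illustrate what accepting Ellipsis in __getitem__ could look like, here is a minimal sketch. ToyArray is a hypothetical stand-in written for this example. It is not Fletcher's actual class, and treating Ellipsis as "select everything" is an assumption about the desired behaviour rather than a confirmed fix.

```python
class ToyArray:
    """Hypothetical 1-D array wrapper, used only to illustrate Ellipsis handling."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __getitem__(self, item):
        # Assumption: for a 1-D array, `arr[...]` means "all elements",
        # so Ellipsis is normalised to a full slice before dispatching.
        if item is Ellipsis:
            item = slice(None)

        if isinstance(item, slice):
            return ToyArray(self._values[item])
        if isinstance(item, int):
            return self._values[item]
        raise IndexError(f"unsupported index type: {type(item).__name__}")


arr = ToyArray(range(1000))
whole = arr[...]       # no IndexError: Ellipsis selects the whole array
first_ten = arr[:10]   # ordinary slicing is unchanged
```

Without the Ellipsis branch, `arr[...]` would fall through to the final `raise IndexError`, which is the same kind of failure as the one reported here.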

Example:

import fletcher as fr
import numpy as np
import pandas as pd
import pyarrow as pa


array = pa.array(np.arange(1000))

# ok
print(array)

# ok
fl_array = fr.FletcherContinuousArray(array)
print(fl_array)

# ok
ser = pd.Series(fl_array)
print(ser)

# fails, IndexError
df = pd.DataFrame({"foo": fl_array})
print(df)

Version information:

$ python --version
Python 3.8.0
$ for lib in fletcher numpy pandas pyarrow; do python -c "import $lib; print('$lib', $lib.__version__)"; done
fletcher 0.7.2
numpy 1.20.3
pandas 1.3.0
pyarrow 4.0.1

I went to report this on the pandas GitHub as well and found it already reported there: pandas-dev/pandas#42430

This project has been archived, as development ceased around 2021.
With the support of Apache Arrow-backed extension arrays in pandas, the major goal of this project has been fulfilled.