Efficient column construction from tuple
wenleix opened this issue · comments
Column construction from list is optimized with native C++ code (for scalar types), e.g.
import torcharrow as ta
a = ta.Column([1, 2, 3])
This optimization is not applied to tuples, so construction from a tuple still has O(n^2) behavior:
import torcharrow as ta
a = ta.Column((1, 2, 3))
Both Pandas and PyArrow support construction from a tuple, so this is a feature we do want to keep:
>>> import pandas as pd
>>> a = pd.Series((1, 2, 3))
>>> a
0    1
1    2
2    3
dtype: int64
This is actually quite useful, since users sometimes create the data from a list of tuples using zip, e.g.
>>> a = [("a", 1), ("b", 2), ("c", 3)]
>>> list(zip(*a))
[('a', 'b', 'c'), (1, 2, 3)]
I guess the easiest way would be to convert the tuple to a list in Python. I'm not sure how the performance compares with handling the tuple in C++ directly.
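For illustration, the Python-side conversion could look roughly like the sketch below. The names tuple_to_list and make_column are hypothetical and not part of torcharrow; constructor is just a stand-in for whatever list-optimized entry point (e.g. ta.Column) would receive the data. The conversion is a single O(n) shallow copy, so it should be cheap compared with the current O(n^2) path, but that has not been measured here.

```python
from typing import Any, Callable


def tuple_to_list(data: Any) -> Any:
    """Return a list copy of `data` if it is a tuple; otherwise return it unchanged.

    Hypothetical helper, not a torcharrow API.
    """
    if isinstance(data, tuple):
        # One O(n) shallow copy; the elements themselves are not copied.
        return list(data)
    return data


def make_column(data: Any, constructor: Callable[[Any], Any]) -> Any:
    """Normalize tuple input, then hand it to a list-optimized constructor.

    `constructor` is an assumption standing in for something like ta.Column.
    """
    return constructor(tuple_to_list(data))


# The zip(*rows) pattern from the issue yields tuples, one per column.
rows = [("a", 1), ("b", 2), ("c", 3)]
names, values = zip(*rows)

# After normalization, each column is a plain list.
normalized_names = tuple_to_list(names)
normalized_values = tuple_to_list(values)
```

Lists and other inputs pass through untouched, so the existing fast path for lists would not pay for an extra copy.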
pybind11 exposes a py::tuple type on the C++ side, so this should probably be trivial for us to support in the same way we do for lists. I'll investigate.