Interface name (`IDataFrame/IColumn`) vs. factory method (`DataFrame/Column`)
wenleix opened this issue
Current Status
In TorchArrow, the interface names are `ta.IDataFrame`/`ta.IColumn`, while the factory methods are `ta.DataFrame`/`ta.Column`:
import torcharrow as ta
a = ta.Column([1, 2, 3])
assert isinstance(a, ta.IColumn)
assert isinstance(a, ta.velox_rt.numerical_column_cpu.NumericColumnCpu)
We also use `IColumn`/`IDataFrame` as the type hints for parameters, transformations, etc.
The caveat here is that users may assume at first glance that `ta.DataFrame`/`ta.Column` are class names, and only later find out that they have to use `ta.IDataFrame`/`ta.IColumn`.
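To make the confusion concrete, here is a minimal self-contained sketch that does not import TorchArrow. `IColumn`, `_NumericColumn`, and `Column` below are hypothetical stand-ins, and the sketch assumes the capitalized factory is a plain function rather than a class:

```python
from abc import ABC


class IColumn(ABC):
    """Hypothetical stand-in for the interface class."""


class _NumericColumn(IColumn):
    """Hypothetical concrete column type returned by the factory."""

    def __init__(self, data):
        self.data = list(data)


def Column(data):
    """Hypothetical capitalized factory: a plain function, not a class."""
    return _NumericColumn(data)


a = Column([1, 2, 3])
assert isinstance(a, IColumn)  # the interface name is what works

# Using the factory name where a class is expected fails, because
# isinstance() requires a type as its second argument.
try:
    isinstance(a, Column)
    raised = False
except TypeError:
    raised = True
```

Under that assumption, a user who reads `Column` as a class name only learns about the mistake when `isinstance` (or a type checker) rejects it.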
Proposed Change
We want to use `DataFrame`/`Column`/`NumericalColumn`/`StringColumn`/`ListColumn`... as the interface names, and `ta.dataframe`/`ta.column` as the factory methods:
import torcharrow as ta
a = ta.column([1, 2, 3])
assert isinstance(a, ta.Column)
This is similar to the PyArrow/PyTorch convention. In PyArrow, `pa.array` is the factory method and `pa.Array` is the interface name:
import pyarrow as pa
a = pa.array([1, 2, 3])
assert isinstance(a, pa.Array)
assert isinstance(a, pa.IntegerArray)
And in PyTorch, `torch.tensor` is the factory method and `torch.Tensor` is the interface name:
import torch
a = torch.tensor([1, 2, 3])
assert isinstance(a, torch.Tensor)
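As a rough illustration of the proposed layout, the sketch below uses capitalized classes as interface names and a lowercase function as the factory. It is self-contained and is not TorchArrow's implementation: the `to_pylist` method, the dispatch rule inside `column`, and the `total` helper are all assumptions made for the example.

```python
from abc import ABC, abstractmethod


class Column(ABC):
    """Interface name: usable in isinstance checks and type hints."""

    def __init__(self, values):
        self._values = list(values)

    @abstractmethod
    def to_pylist(self):
        """Hypothetical accessor, added only for this sketch."""


class NumericalColumn(Column):
    def to_pylist(self):
        return list(self._values)


class StringColumn(Column):
    def to_pylist(self):
        return list(self._values)


def column(data):
    """Lowercase factory: picks a concrete Column subclass (assumed rule)."""
    data = list(data)
    if data and all(isinstance(x, str) for x in data):
        return StringColumn(data)
    return NumericalColumn(data)


def total(col: Column) -> float:
    """The interface name doubles as the type hint."""
    return sum(col.to_pylist())


a = column([1, 2, 3])
assert isinstance(a, Column)
assert isinstance(a, NumericalColumn)
```

With this split, the name a user sees first (`Column`) is the same one that works in `isinstance` checks and annotations, which matches the `pa.array`/`pa.Array` and `torch.tensor`/`torch.Tensor` pairs.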