pytorch / torcharrow

High performance model preprocessing library on PyTorch

Home Page:https://pytorch.org/torcharrow/beta/index.html


Interface name (`IDataFrame/IColumn`) vs. factory method (`DataFrame/Column`)

wenleix opened this issue

Current Status

In TorchArrow, the interface names are ta.IDataFrame/ta.IColumn while the factory methods are ta.DataFrame/ta.Column:

import torcharrow as ta
a = ta.Column([1, 2, 3])
assert isinstance(a, ta.IColumn)
assert isinstance(a, ta.velox_rt.numerical_column_cpu.NumericalColumnCpu)

We also use IColumn/IDataFrame as the type hint for parameters, transformations, etc.

The caveat here is that users might assume ta.DataFrame / ta.Column are class names on first impression, and only later find that they have to use ta.IDataFrame / ta.IColumn.
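
To make the confusion concrete, here is a minimal sketch in plain Python. The `IColumn` class and `Column` function below are hypothetical stand-ins written for illustration, not TorchArrow's actual code, and the sketch assumes the capitalized factory is a plain function rather than a class:

```python
class IColumn:
    """Stand-in for the interface class (hypothetical, not TorchArrow code)."""

    def __init__(self, values):
        self.values = list(values)


def Column(values):
    """Stand-in for a capitalized factory function (assumed to be a function)."""
    return IColumn(values)


a = Column([1, 2, 3])
assert isinstance(a, IColumn)  # the interface name works as expected

# A user who reads `Column` as a class name may try this instead.
# isinstance() rejects a function as its second argument.
try:
    isinstance(a, Column)
    rejected = False
except TypeError:
    rejected = True
```

Under that assumption, the capitalized name looks like a type but cannot be used as one in `isinstance` checks.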

Proposed Change

We want to use DataFrame/Column/NumericalColumn/StringColumn/ListColumn... as the interface names, and ta.dataframe/ta.column as the factory methods:

import torcharrow as ta
a = ta.column([1, 2, 3])
assert isinstance(a, ta.Column)
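
A rough sketch of how the proposed naming could fit together, again in plain Python. Everything here (the class bodies, the `to_pylist` method, the type-inference rules in `column`, and the `length` helper) is an assumption made for illustration and does not reflect how TorchArrow implements it:

```python
from abc import ABC, abstractmethod
from typing import Sequence


class Column(ABC):
    """Interface name: used in type hints and isinstance checks."""

    @abstractmethod
    def to_pylist(self) -> list:
        """Return the column values as a plain Python list."""


class NumericalColumn(Column):
    def __init__(self, values: Sequence[float]) -> None:
        self._values = list(values)

    def to_pylist(self) -> list:
        return list(self._values)


class StringColumn(Column):
    def __init__(self, values: Sequence[str]) -> None:
        self._values = list(values)

    def to_pylist(self) -> list:
        return list(self._values)


def column(data: Sequence) -> Column:
    """Lowercase factory: picks a concrete Column subclass from the data."""
    if not data:
        raise ValueError("cannot infer a column type from empty data")
    if all(isinstance(v, str) for v in data):
        return StringColumn(data)
    # bool is a subclass of int, so exclude it explicitly.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in data):
        return NumericalColumn(data)
    raise TypeError("unsupported or mixed element types")


def length(col: Column) -> int:
    """Callers annotate with the interface name, not a concrete class."""
    return len(col.to_pylist())


a = column([1, 2, 3])
assert isinstance(a, Column)
assert isinstance(a, NumericalColumn)
```

With this split, the capitalized names are always types and the lowercase names are always callables that build instances, which matches the first impression described in the issue.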

This is similar to the PyArrow and PyTorch conventions. In PyArrow, pa.array is the factory method and pa.Array is the interface name:

import pyarrow as pa
a = pa.array([1, 2, 3])
assert isinstance(a, pa.Array)
assert isinstance(a, pa.IntegerArray)

In PyTorch, torch.tensor is the factory method and torch.Tensor is the interface name:

import torch
a = torch.tensor([1, 2, 3])
assert isinstance(a, torch.Tensor)

Resolved by the following PRs: