[BUG] Indexing into DataPanel changes custom column type
dhatcher8 opened this issue · comments
Bug Description
When indexing into a DataPanel to get a subset of rows, a column with a complex custom type is converted to a ListColumn in the resulting subset DataPanel.
To Reproduce
May be difficult to reproduce as it's only occurring for one custom column type that we have.
- Create a complex custom column type (ours subclasses mk.CellColumn, and each cell is a time series with categorical values)
- Create a DataPanel instance (dp) that has the above column and some data inside of it
- Index into the DataPanel (dp_subset = dp[0:1])
- The column type for that specific column in dp_subset has changed to a ListColumn
System Information
- OS: MacOS
Thanks for the issue. Depending on the implementation of cell.get, this might be expected behavior.
Consider this example:
import meerkat as mk
import torchaudio

class TimeSeriesCell(mk.AbstractCell):
    def __init__(self, path):
        self.path = path

    def get(self):
        # torchaudio.load returns (waveform, sample_rate); keep only the waveform tensor
        return torchaudio.load(self.path)[0]

cell = TimeSeriesCell(path="/Users/sabrieyuboglu/data/datasets/yesno/waves_yesno/0_0_0_0_1_1_1_1.wav")
dp = mk.DataPanel({
    "index": range(10),
    "cell": [cell] * 10,
})
When you index a cell column like dp[:5], this is a "materializing" index. This means that we will call cell.get() on each cell in the column. In this case, get loads the time series from disk and returns it as a torch tensor. Meerkat then infers what the new column type should be (in this case a torch TensorColumn). In your case, I imagine you might be returning a Python object that Meerkat doesn't have a special column for, so it just defaults to a ListColumn.
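To make the fallback concrete, here is a toy model of what happens on a materializing index. It does not use Meerkat at all: ToyCell, CategoricalSeries, infer_column_kind and materialize are hypothetical names invented for this illustration, and the inference rule is a deliberately simplified assumption, not Meerkat's actual logic.

```python
# Toy model only: every name here is a hypothetical stand-in, not Meerkat's API.
class ToyCell:
    def __init__(self, payload):
        self.payload = payload

    def get(self):
        # A real cell might load data from disk here.
        return self.payload


class CategoricalSeries:
    """A custom object that no specialized container knows about."""

    def __init__(self, labels):
        self.labels = labels


def infer_column_kind(values):
    # Pick a specialized container when every value has a recognized type,
    # otherwise fall back to a generic list container.
    if all(isinstance(v, (int, float)) for v in values):
        return "numeric"
    if all(isinstance(v, str) for v in values):
        return "text"
    return "list"  # analogous to ending up with a ListColumn


def materialize(cells):
    # Call get() on every cell, then infer the container from the results.
    values = [cell.get() for cell in cells]
    return infer_column_kind(values), values


kind_a, _ = materialize([ToyCell(1.0), ToyCell(2.5)])
kind_b, _ = materialize([ToyCell(CategoricalSeries(["yes", "no"]))])
```

Here kind_a comes out as the specialized "numeric" kind, while kind_b falls through to the generic "list" kind because get() returns an object the inference step doesn't recognize.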
Now, if you'd like to index the DataPanel without materializing the cells (i.e. keep the CellColumn as a CellColumn), you can do this with a "lazy" index: dp.lz[:5]
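The difference between the two kinds of index can be sketched without Meerkat. In the rough illustration below, ToyCell, ToyCellColumn and the helper _LazyIndexer are hypothetical classes written only for this comment. The lz attribute mirrors the name used above, but the implementation is an assumption about the general idea, not Meerkat's code, and it only handles slices.

```python
# Toy model only: hypothetical classes, not Meerkat's implementation.
class ToyCell:
    def __init__(self, path):
        self.path = path
        self.loads = 0  # counts how often get() was called

    def get(self):
        self.loads += 1
        return f"data from {self.path}"


class _LazyIndexer:
    def __init__(self, column):
        self._column = column

    def __getitem__(self, index):
        # Slice the cells without calling get(); the column type is preserved.
        return ToyCellColumn(self._column.cells[index])


class ToyCellColumn:
    def __init__(self, cells):
        self.cells = list(cells)

    @property
    def lz(self):
        return _LazyIndexer(self)

    def __getitem__(self, index):
        # Materializing index (slices only in this sketch): call get() on
        # each selected cell, so the result is no longer a ToyCellColumn.
        return [cell.get() for cell in self.cells[index]]


col = ToyCellColumn(ToyCell(f"file_{i}.wav") for i in range(4))

lazy_subset = col.lz[:2]  # still a ToyCellColumn, nothing loaded yet
loads_after_lazy = sum(cell.loads for cell in col.cells)

materialized = col[:2]  # plain list of loaded values
loads_after_materializing = sum(cell.loads for cell in col.cells)
```

In this sketch the lazy slice keeps the custom column type and triggers no get() calls, while the plain slice loads the first two cells and returns a different container.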
Let us know if this makes sense in your context. If not, you can share the CellColumn implementation and we can dive deeper.
Awesome, for our current situation lazy indexing should fulfill our needs. Thank you for the insight!