HazyResearch / meerkat

Creative interactive views of any dataset.

[BUG?] Getting the column names returns a list and not a set

jaksicf opened this issue

Heya!

I've been using your library and wanted to check whether two DataPanels have the same column names. While doing so, I realized that columns() returns a list of the column names, not a set.

https://github.com/robustness-gym/meerkat/blob/e3b437d47809ef8e856a5f732ac1e11a1176ba1f/meerkat/datapanel.py#L151

In my case that was a problem: the two DataPanels had the same column names but in a different order, which caused the comparison of columns() to fail. Tbh I did not expect that, as I regarded the order of the columns as an implementation detail. So I wanted to ask whether that was intended or just a bug. And if it's a bug, would you like a pull request that changes it to return a set?

Hi! Good q. The order of the columns is important for visualization. Users sometimes reorder columns so that the DataPanel is easier to read in a Jupyter Notebook. This is why columns returns a list and not a set.

If comparing columns with set(dp1.columns) == set(dp2.columns) isn't ideal for your use case, you can submit a PR for a method like DataPanel.column_equals(other: mk.DataPanel) that checks whether the column names and types are the same between two DataPanels.
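
For reference, here is a minimal sketch of the order-insensitive comparison suggested above. It uses plain Python lists as stand-ins for dp1.columns and dp2.columns so that it runs without meerkat installed. The helper name same_column_names and the example column names are hypothetical, not part of the library, and it only compares names (not column types, which the proposed column_equals would also check).

```python
from typing import Iterable


def same_column_names(cols_a: Iterable[str], cols_b: Iterable[str]) -> bool:
    """Return True if both iterables hold the same column names, ignoring order.

    Hypothetical helper, not a meerkat API. Assumes column names are unique
    within each DataPanel, since converting to a set drops duplicates.
    """
    return set(cols_a) == set(cols_b)


# Stand-ins for dp1.columns and dp2.columns (illustrative names only).
cols_1 = ["image", "label", "split"]
cols_2 = ["split", "image", "label"]

# Comparing the lists directly is order-sensitive.
order_sensitive = cols_1 == cols_2

# Comparing as sets ignores the order.
order_insensitive = same_column_names(cols_1, cols_2)
```

With real DataPanels, the call would presumably look like same_column_names(dp1.columns, dp2.columns), assuming columns yields the column names as described in this thread.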