kayjan / bigtree

Tree Implementation and Methods for Python, integrated with list, dictionary, pandas and polars DataFrame.

Home Page: https://bigtree.readthedocs.io

dataframe_to_tree doesn't work with path_col not the first column in dataframe

dkadish opened this issue

When path_col is passed to dataframe_to_tree, the function fails if path_col is not the first column in the DataFrame.

Example:

import pandas as pd
import bigtree as bt
print(bt.__version__)

'0.9.3'

d = pd.DataFrame([
    ['a/b/c', 'c/e/f'],
    ['a/c', 'c/g'],
    ['a', 'c/h']
],
    columns=['one','two'])

print(d)

tree = bt.dataframe_to_tree(d, 'one')
bt.print_tree(tree)

     one    two
0  a/b/c  c/e/f
1    a/c    c/g
2      a    c/h
a
├── b
│   └── c
└── c

tree = bt.dataframe_to_tree(d, 'two')
bt.print_tree(tree)

TreeError: Error: Path does not have same root node, expected c, received a. Check your input paths or verify that path separator sep is set correctly

# Move 'two' to first column
d = d.assign(three=d.one).drop(columns='one')

print(d)

tree = bt.dataframe_to_tree(d, 'two')
bt.print_tree(tree)

     two  three
0  c/e/f  a/b/c
1    c/g    a/c
2    c/h      a
c
├── e
│   └── f
├── g
└── h
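
For anyone stuck on an affected version, the reordering above can be generalized into a small helper. This is only a sketch: path_col_first is a hypothetical name, not part of bigtree or pandas, and it works on a plain list of column names so it can be checked without a DataFrame.

```python
def path_col_first(columns, path_col):
    """Return `columns` reordered so that `path_col` comes first.

    Hypothetical helper (not a bigtree or pandas function); the relative
    order of the remaining columns is preserved.
    """
    if path_col not in columns:
        raise KeyError(path_col)
    return [path_col] + [c for c in columns if c != path_col]


# Intended use with the DataFrame `d` from the example (assumes pandas):
#   d = d[path_col_first(list(d.columns), "two")]
#   tree = bt.dataframe_to_tree(d, "two")
```

Unlike the assign/drop approach, this keeps the original column names and only changes their order.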

The issue arises here, where the code does not pass the path_col parameter to the add_dataframe_to_tree_by_path function.

https://github.com/kayjan/bigtree/blob/9631361535a45909bd2579deb708b24f1eeae062/bigtree/tree/construct.py#L793C1-L798
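
To make the failure mode concrete, here is a toy model of the pattern described above. It is not bigtree's actual code: check_roots, build_buggy and build_fixed are hypothetical names, and rows are plain dicts standing in for DataFrame rows. It assumes the helper falls back to the first column when no path column is given, which is consistent with the error reported above.

```python
# Toy model of the bug pattern, NOT bigtree's actual implementation.
# All function names here are hypothetical.

def check_roots(rows, root, path_col=None):
    """Verify that every row's path starts at `root`.

    When `path_col` is None, the first column of each row is used,
    mimicking a "default to the first column" fallback.
    """
    for row in rows:
        col = path_col if path_col is not None else next(iter(row))
        received = row[col].split("/")[0]
        if received != root:
            raise ValueError(f"expected {root}, received {received}")


def build_buggy(rows, path_col):
    root = rows[0][path_col].split("/")[0]  # root read from the requested column
    check_roots(rows, root)                 # path_col not forwarded: first column is used
    return root


def build_fixed(rows, path_col):
    root = rows[0][path_col].split("/")[0]
    check_roots(rows, root, path_col=path_col)  # forward path_col to the helper
    return root


rows = [
    {"one": "a/b/c", "two": "c/e/f"},
    {"one": "a/c", "two": "c/g"},
    {"one": "a", "two": "c/h"},
]
```

In this model, build_buggy(rows, "one") succeeds only because "one" happens to be the first column, while build_buggy(rows, "two") raises with "expected c, received a"; build_fixed(rows, "two") succeeds.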

Hello, thanks for raising this issue and submitting the PR! The fix is included in v0.9.4; please upgrade bigtree to get the latest changes.