kayjan / bigtree

Tree Implementation and Methods for Python, integrated with list, dictionary, pandas and polars DataFrame.

Home Page: https://bigtree.readthedocs.io

Duplicate node with same path and allow_duplicates

clabnet opened this issue

Describe the issue
dataframe_to_tree_by_relation throws a "Duplicate node with same path" error on a large dataset that contains nodes with the same path, even with allow_duplicates=True.

Environment
  • Platform: Windows 11
  • Python version: 3.11.5
  • bigtree version: 0.12.4

To Reproduce

from bigtree import dataframe_to_tree_by_relation, tree_to_dot, tree_to_pillow, tree_to_dataframe
root = dataframe_to_tree_by_relation(df, child_col="item", parent_col="parent", allow_duplicates = True)
root.show(attr_list=["parent"])
tree_to_dataframe(
   root,
   name_col="item",
   parent_col="parent",
   path_col="path",
   )

Steps or code to reproduce the behaviour:
df is a BOM (bill of materials) with 6525 rows. I shared the dataset via Pastebin; because of the site's size limit, I had to split it into two files.

Expected behaviour
A tree expanded with 6525 rows

Screenshots

---------------------------------------------------------------------------
TreeError                                 Traceback (most recent call last)
Cell In[8], line 3
      1 from bigtree import dataframe_to_tree_by_relation, tree_to_dot, tree_to_pillow, tree_to_dataframe
----> 3 root = dataframe_to_tree_by_relation(df, child_col="item", parent_col="parent", allow_duplicates = True)
      5 root.show(attr_list=["parent"])
      7 tree_to_dataframe(
      8    root,
      9    name_col="item",
     10    parent_col="parent",
     11    path_col="path",
     12    )

File /opt/conda/lib/python3.11/site-packages/bigtree/tree/construct.py:978, in dataframe_to_tree_by_relation(data, child_col, parent_col, attribute_cols, allow_duplicates, node_type)
    976     row = list(root_row.to_dict(orient="index").values())[0]
    977     root_node.set_attrs(retrieve_attr(row))
--> 978 recursive_create_child(root_node)
    979 return root_node

File /opt/conda/lib/python3.11/site-packages/bigtree/tree/construct.py:972, in dataframe_to_tree_by_relation.<locals>.recursive_create_child(parent_node)
    970 child_node = node_type(**retrieve_attr(row))
    971 child_node.parent = parent_node
--> 972 recursive_create_child(child_node)

File /opt/conda/lib/python3.11/site-packages/bigtree/tree/construct.py:972, in dataframe_to_tree_by_relation.<locals>.recursive_create_child(parent_node)
    970 child_node = node_type(**retrieve_attr(row))
    971 child_node.parent = parent_node
--> 972 recursive_create_child(child_node)

File /opt/conda/lib/python3.11/site-packages/bigtree/tree/construct.py:972, in dataframe_to_tree_by_relation.<locals>.recursive_create_child(parent_node)
    970 child_node = node_type(**retrieve_attr(row))
    971 child_node.parent = parent_node
--> 972 recursive_create_child(child_node)

File /opt/conda/lib/python3.11/site-packages/bigtree/tree/construct.py:971, in dataframe_to_tree_by_relation.<locals>.recursive_create_child(parent_node)
    969 for row in child_rows.to_dict(orient="index").values():
    970     child_node = node_type(**retrieve_attr(row))
--> 971     child_node.parent = parent_node
    972     recursive_create_child(child_node)

File /opt/conda/lib/python3.11/site-packages/bigtree/node/basenode.py:188, in BaseNode.parent(self, new_parent)
    185 current_child_idx = None
    187 # Assign new parent - rollback if error
--> 188 self.__pre_assign_parent(new_parent)
    189 try:
    190     # Remove self from old parent
    191     if current_parent is not None:

File /opt/conda/lib/python3.11/site-packages/bigtree/node/node.py:169, in Node._BaseNode__pre_assign_parent(self, new_parent)
    164 if new_parent is not None:
    165     if any(
    166         child.node_name == self.node_name and child is not self
    167         for child in new_parent.children
    168     ):
--> 169         raise TreeError(
    170             f"Duplicate node with same path\n"
    171             f"There exist a node with same path {new_parent.path_name}{new_parent.sep}{self.node_name}"
    172         )

TreeError: Duplicate node with same path
There exist a node with same path /H-FUQF/FUQF.ALB.22.100/2999-12353-01-/2922-04964-01-/2922P04964-01-

Additional context
Please be patient with me. Thanks.

Hi, thanks for your question.

When a tree is created from parent-child relations, the allow_duplicates parameter permits a "duplicated child", in the sense that the same child name can be attached to multiple parents.

For example,

import pandas as pd
from bigtree import dataframe_to_tree_by_relation

relation_data = pd.DataFrame([
    ["a", None],  # root a
    ["b", "a"],   # a/b
    ["c", "a"],   # a/c
    ["b", "c"],   # a/c/b - note that b now exists in two locations, a/b and a/c/b
    ["d", "b"],   # d is a child of b - but which b?
])

# With allow_duplicates=False, the following call raises an error
root = dataframe_to_tree_by_relation(relation_data, allow_duplicates=True)
root.show()
"""
a
├── b
│   └── d
└── c
    └── b
        └── d
"""

As shown above, allow_duplicates lets Node d be attached under each parent Node b (a/b and a/c/b).

In your case, the error occurs because the node already exists: once a/b/d has been created, another Node d cannot be added under the same Node b, and this is unrelated to the allow_duplicates parameter. Your data appears to contain duplicated parent-child relations, so the same child node is created a second time under the same parent node. Deduplicating your data should resolve the error.
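
To make the distinction concrete, here is a rough toy model of the sibling-name check that appears in the traceback. ToyNode and its attach method are hypothetical names used only for illustration; this is not bigtree's Node class, and it raises ValueError rather than TreeError. It only assumes that a parent refuses a second child with the same name.

```python
class ToyNode:
    """Hypothetical stand-in for a tree node; not bigtree's Node class."""

    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        self.parent = None
        if parent is not None:
            self.attach(parent)

    def attach(self, new_parent):
        # Mirrors the check in the traceback: a parent refuses a second
        # child whose name matches an existing child.
        if any(c.name == self.name and c is not self for c in new_parent.children):
            raise ValueError(
                f"Duplicate node with same path: {new_parent.name}/{self.name}"
            )
        new_parent.children.append(self)
        self.parent = new_parent


a = ToyNode("a")
b = ToyNode("b", parent=a)    # a/b
c = ToyNode("c", parent=a)    # a/c
b2 = ToyNode("b", parent=c)   # a/c/b: same name, different parent, accepted

try:
    ToyNode("b", parent=a)    # a second a/b: same name, same parent, rejected
    rejected = False
except ValueError:
    rejected = True
```

In this model the same name under two different parents is fine, while assigning the same name to the same parent twice fails, which is what a repeated (item, parent) row leads to. Deduplicating the relation rows first, as suggested above, avoids that second assignment: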

import pandas as pd
from bigtree import dataframe_to_tree_by_relation

df = pd.read_csv("sample.csv")
df = df.drop_duplicates(subset=["item", "parent"])
root = dataframe_to_tree_by_relation(df, child_col="item", parent_col="parent", allow_duplicates=True)
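
Before dropping rows, it may be worth checking which parent-child pairs are actually repeated. The sketch below uses a small made-up DataFrame in place of the real BOM (the values are illustrative; only the column names "item" and "parent" come from the issue) and relies on pandas' DataFrame.duplicated and DataFrame.drop_duplicates.

```python
import pandas as pd

# Hypothetical stand-in for the BOM; only the column names come from the issue.
df = pd.DataFrame(
    {
        "item": ["a", "b", "c", "b", "b"],
        "parent": [None, "a", "a", "c", "a"],
    }
)

# keep=False flags every row of a repeated (item, parent) pair,
# not only the later copies, so both occurrences can be inspected.
repeated = df[df.duplicated(subset=["item", "parent"], keep=False)]
print(repeated)

# Keep the first occurrence of each (item, parent) pair.
deduplicated = df.drop_duplicates(subset=["item", "parent"])
```

Here the pair ("b", "a") appears twice and is flagged, while ("b", "c") is kept because it has a different parent. If the repeated rows differ in other columns (for example, quantities), those differences would need to be reconciled before dropping them.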