Duplicate node with same path and allow_duplicates
clabnet opened this issue
Describe the issue
dataframe_to_tree_by_relation throws an error when used on a large dataset that contains nodes with the same path, even with allow_duplicates=True.
Environment
Describe your environment.
- Platform: Windows 11
- Python version: 3.11.5
- bigtree version: 0.12.4
To Reproduce
from bigtree import dataframe_to_tree_by_relation, tree_to_dot, tree_to_pillow, tree_to_dataframe
root = dataframe_to_tree_by_relation(df, child_col="item", parent_col="parent", allow_duplicates=True)
root.show(attr_list=["parent"])
tree_to_dataframe(
    root,
    name_col="item",
    parent_col="parent",
    path_col="path",
)
Steps or code to reproduce the behaviour:
df is a BOM (bill of materials) of 6525 rows. I shared the dataset via Pastebin; to stay within the site's size limit, I had to split it into two files.
Expected behaviour
A tree expanded with 6525 rows
Screenshots
---------------------------------------------------------------------------
TreeError Traceback (most recent call last)
Cell In[8], line 3
1 from bigtree import dataframe_to_tree_by_relation, tree_to_dot, tree_to_pillow, tree_to_dataframe
----> 3 root = dataframe_to_tree_by_relation(df, child_col="item", parent_col="parent", allow_duplicates = True)
5 root.show(attr_list=["parent"])
7 tree_to_dataframe(
8 root,
9 name_col="item",
10 parent_col="parent",
11 path_col="path",
12 )
File /opt/conda/lib/python3.11/site-packages/bigtree/tree/construct.py:978, in dataframe_to_tree_by_relation(data, child_col, parent_col, attribute_cols, allow_duplicates, node_type)
976 row = list(root_row.to_dict(orient="index").values())[0]
977 root_node.set_attrs(retrieve_attr(row))
--> 978 recursive_create_child(root_node)
979 return root_node
File /opt/conda/lib/python3.11/site-packages/bigtree/tree/construct.py:972, in dataframe_to_tree_by_relation.<locals>.recursive_create_child(parent_node)
970 child_node = node_type(**retrieve_attr(row))
971 child_node.parent = parent_node
--> 972 recursive_create_child(child_node)
File /opt/conda/lib/python3.11/site-packages/bigtree/tree/construct.py:972, in dataframe_to_tree_by_relation.<locals>.recursive_create_child(parent_node)
970 child_node = node_type(**retrieve_attr(row))
971 child_node.parent = parent_node
--> 972 recursive_create_child(child_node)
File /opt/conda/lib/python3.11/site-packages/bigtree/tree/construct.py:972, in dataframe_to_tree_by_relation.<locals>.recursive_create_child(parent_node)
970 child_node = node_type(**retrieve_attr(row))
971 child_node.parent = parent_node
--> 972 recursive_create_child(child_node)
File /opt/conda/lib/python3.11/site-packages/bigtree/tree/construct.py:971, in dataframe_to_tree_by_relation.<locals>.recursive_create_child(parent_node)
969 for row in child_rows.to_dict(orient="index").values():
970 child_node = node_type(**retrieve_attr(row))
--> 971 child_node.parent = parent_node
972 recursive_create_child(child_node)
File /opt/conda/lib/python3.11/site-packages/bigtree/node/basenode.py:188, in BaseNode.parent(self, new_parent)
185 current_child_idx = None
187 # Assign new parent - rollback if error
--> 188 self.__pre_assign_parent(new_parent)
189 try:
190 # Remove self from old parent
191 if current_parent is not None:
File /opt/conda/lib/python3.11/site-packages/bigtree/node/node.py:169, in Node._BaseNode__pre_assign_parent(self, new_parent)
164 if new_parent is not None:
165 if any(
166 child.node_name == self.node_name and child is not self
167 for child in new_parent.children
168 ):
--> 169 raise TreeError(
170 f"Duplicate node with same path\n"
171 f"There exist a node with same path {new_parent.path_name}{new_parent.sep}{self.node_name}"
172 )
TreeError: Duplicate node with same path
There exist a node with same path /H-FUQF/FUQF.ALB.22.100/2999-12353-01-/2922-04964-01-/2922P04964-01-
Additional context
Please be patient with me. Thanks.
Hi, thanks for your question.
When a tree is created from parent-child relations, the allow_duplicates parameter permits a "duplicated child" in the sense that the same child can be tagged to multiple parents.
For example,
import pandas as pd
from bigtree import dataframe_to_tree_by_relation
relation_data = pd.DataFrame([
    ["a", None],  # root a
    ["b", "a"],   # a/b
    ["c", "a"],   # a/c
    ["b", "c"],   # a/c/b - note that b now exists in two locations, a/b and a/c/b
    ["d", "b"],   # d is a child of b - but which b?
])
# Running the following code with allow_duplicates=False will raise an error
root = dataframe_to_tree_by_relation(relation_data, allow_duplicates=True)
root.show()
"""
a
├── b
│   └── d
└── c
    └── b
        └── d
"""
As shown above, allow_duplicates allows Node d to be tagged to more than one parent Node b (the one at a/b and the one at a/c/b).
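To make the distinction concrete, here is a minimal pure-Python sketch of a sibling-name check similar to the one visible in the traceback above (node.py, lines 164-172). It is not bigtree's implementation: TinyNode, attach_to, and DuplicatePathError are hypothetical names used only for illustration. It shows that the same name under different parents is accepted, while the same name twice under one parent is rejected.

```python
class DuplicatePathError(Exception):
    """Stand-in for bigtree's TreeError (hypothetical, for illustration only)."""


class TinyNode:
    """Hypothetical minimal node; not bigtree's Node class."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = None
        self.children = []
        if parent is not None:
            self.attach_to(parent)

    @property
    def path(self):
        # Build "/a/c/b"-style paths by walking up to the root.
        if self.parent is None:
            return f"/{self.name}"
        return f"{self.parent.path}/{self.name}"

    def attach_to(self, new_parent):
        # Reject a second sibling with the same name: it would share a path.
        if any(child.name == self.name for child in new_parent.children):
            raise DuplicatePathError(
                f"Duplicate node with same path {new_parent.path}/{self.name}"
            )
        self.parent = new_parent
        new_parent.children.append(self)


a = TinyNode("a")
b1 = TinyNode("b", parent=a)   # /a/b
c = TinyNode("c", parent=a)    # /a/c
b2 = TinyNode("b", parent=c)   # /a/c/b: same name, different parent, accepted
d1 = TinyNode("d", parent=b1)  # /a/b/d

try:
    TinyNode("d", parent=b1)   # second /a/b/d: same name, same parent
    duplicate_error = None
except DuplicatePathError as err:
    duplicate_error = str(err)

print(duplicate_error)
```

In this sketch, no flag comparable to allow_duplicates is involved in the check at all; the rejection depends only on whether a sibling with the same name already exists under that parent.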
In your case, the error occurs because the node already exists: once a/b/d has been created, a second Node d cannot be added under the same Node b. This is unrelated to the allow_duplicates parameter. Your data appears to contain duplicated parent-child relations, so the same child node is created a second time under the same parent node. Deduplicating the data should resolve it:
import pandas as pd
from bigtree import dataframe_to_tree_by_relation
df = pd.read_csv("sample.csv")
df = df.drop_duplicates(subset=["item", "parent"])
root = dataframe_to_tree_by_relation(df, child_col="item", parent_col="parent", allow_duplicates=True)
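Before dropping rows, it may help to see which relations are actually repeated. The following is a small sketch using a toy DataFrame as a stand-in for the BOM; only the column names "item" and "parent" come from the issue, and the data itself is made up for illustration.

```python
import pandas as pd

# Toy stand-in for the BOM; the column names mirror the issue ("item", "parent").
df = pd.DataFrame(
    [
        ["a", None],
        ["b", "a"],
        ["c", "a"],
        ["b", "c"],  # same item under a different parent: fine
        ["d", "b"],
        ["d", "b"],  # exact repeat of the previous relation: triggers the error
    ],
    columns=["item", "parent"],
)

# keep=False marks every row of a repeated (item, parent) pair, not just the extras.
repeated = df[df.duplicated(subset=["item", "parent"], keep=False)]

# Count how many times each repeated relation occurs.
counts = repeated.groupby(["parent", "item"]).size()
print(counts)

# Keep one row per (item, parent) pair.
deduplicated = df.drop_duplicates(subset=["item", "parent"])
```

If the repeated rows carry meaning in your data (for example, quantities), you may prefer to aggregate them rather than drop them.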