kayjan / bigtree

Tree Implementation and Methods for Python, integrated with list, dictionary, pandas and polars DataFrame.

Home Page: https://bigtree.readthedocs.io

Better way to export children

lgehr opened this issue

Hey,

I have found that the current way to export the children of a node (when exporting a whole tree) is rather cumbersome and feels a little hacky.
I only checked this with tree_to_dataframe, since that is what I use, but I expect the other export functions to behave the same way.

The current way to export the children of a node into a DataFrame is to explicitly set them in attr_dict.

df = tree_to_dataframe(root, attr_dict={"children": "children"})

This is:

  1. Not well documented (you have to check the source, or guess, to find out what the attribute name is)
  2. A pain to use if you want all_attrs=True, because using attr_dict overwrites all_attrs.

My idea would be to add a children_col attribute to the export functions. This is what I do for my current project.

>>> root = Node("a", age=90)
>>> b = Node("b", age=65, parent=root)
>>> df = tree_to_dataframe(root, children_col="children")
>>> print(df)
   path name               children
0    /a    a  (Node(/a/b, age=65),)
1  /a/b    b                     ()

The only problem with this approach is that we now export internal structures.
Maybe we can do something like children.map(lambda c: c.node_name) before export.
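
To make the name-mapping idea concrete, here is a minimal sketch that does not touch the library at all. It assumes a hypothetical stand-in class `SimpleNode` (not bigtree's `Node`) and a hypothetical helper `rows_with_child_names`; it only illustrates that replacing child objects with their names keeps internal structures out of the exported rows.

```python
from typing import Dict, List, Optional


class SimpleNode:
    """Hypothetical stand-in for a tree node; this is not bigtree's Node."""

    def __init__(self, node_name: str, parent: Optional["SimpleNode"] = None) -> None:
        self.node_name = node_name
        self.parent = parent
        self.children: List["SimpleNode"] = []
        # Link the new node into its parent's list of children.
        if parent is not None:
            parent.children.append(self)

    @property
    def path(self) -> str:
        # Build a "/a/b"-style path by walking up to the root.
        prefix = self.parent.path if self.parent is not None else ""
        return f"{prefix}/{self.node_name}"


def rows_with_child_names(
    root: SimpleNode, children_col: str = "children"
) -> List[Dict[str, object]]:
    """Walk the tree in pre-order and emit one plain-data row per node."""
    rows: List[Dict[str, object]] = []
    stack = [root]
    while stack:
        node = stack.pop()
        rows.append(
            {
                "path": node.path,
                "name": node.node_name,
                # Export child names instead of the child objects themselves.
                children_col: tuple(child.node_name for child in node.children),
            }
        )
        # Push children in reverse so they are visited left to right.
        stack.extend(reversed(node.children))
    return rows


root = SimpleNode("a")
b = SimpleNode("b", parent=root)
rows = rows_with_child_names(root)
for row in rows:
    print(row)
```

The rows contain only strings and tuples of strings, so they could be handed to a DataFrame constructor without leaking node objects. Whether names are the right thing to export is a separate question.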

Another idea would be to export children when all_attrs=True is set.
Currently all_attrs=True exports path and name, but not parent or children,
which seems inconsistent anyway. But that may be a separate issue.

I could write up a PR for this.
Let me know what you think and thank you for this project.

Hi, thanks for raising this. You bring up a few separate concerns, so let me address them one by one.

  1. Regarding documentation
    The tree_to_dataframe method is documented here, with an example of how to use the attr_dict parameter. If you meant that it is hard to tell which attributes your tree has, you can either use tree.show(all_attrs=True) to print the whole tree with all of its attributes to the console, or tree.describe() to view all public and private attributes of a tree node. If you meant that the term "attribute" is not documented, it is general tree / coding terminology and is hopefully clear from the documentation and its example code.
    Let me know if there is anything else the documentation should cover 😄

  2. Regarding all_attrs and attr_dict
    Actually, all_attrs overrides attr_dict (not the other way around), which is the intended behaviour. I would expect users to want either specific attributes (using attr_dict) or all attributes (using all_attrs=True). If both parameters are passed at the same time, exporting all attributes takes precedence over the specific selection.

  3. Regarding children_col
    This is actually a good idea, and you rightly pointed out the problem of exporting internal structures. However, using children.map(lambda c: c.node_name) assumes that the children's names are what users want, which might not be the case. A possible alternative is to define a CustomNode as follows:

from bigtree import Node, clone_tree, list_to_tree, tree_to_dataframe
 
class CustomNode(Node):
    @property
    def children_names(self):
        return ",".join(child.node_name for child in self.children)
 
# Create a tree for demonstration
path_list = ["a/b", "a/c", "a/b/d", "a/b/e", "a/c/f", "a/b/e/g", "a/b/e/h"]
tree = list_to_tree(path_list)
tree.show()
# a
# ├── b
# │   ├── d
# │   └── e
# │       ├── g
# │       └── h
# └── c
#     └── f
 
# Clone your tree to CustomNode to get the property `children_names`
custom_tree = clone_tree(tree, CustomNode)
tree_to_dataframe(custom_tree, attr_dict={"children_names": "children"})
#        path name children
# 0        /a    a      b,c
# 1      /a/b    b      d,e
# 2    /a/b/d    d        
# 3    /a/b/e    e      g,h
# 4  /a/b/e/g    g        
# 5  /a/b/e/h    h        
# 6      /a/c    c        f
# 7    /a/c/f    f        

  4. Regarding all_attrs=True exporting path and name, but not parent and children
    When exporting to a DataFrame, path is exported through the path_col parameter (compulsory), name through the name_col parameter (compulsory), and parent through the parent_col parameter (optional). Exporting the path, name, and parent columns is therefore unrelated to the all_attrs parameter.
    As for children, it is a nested structure, and once it is exported to a DataFrame the Nodes can no longer be accessed or modified directly, so I don't see much value in exporting children. If you need it, an alternative is to handle it as in (3) above.
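
The column rules described in (2) and (4) can be summarised in a small sketch. This is not bigtree's implementation: `export_row` is a hypothetical function written only to mirror the behaviour described above, namely that path and name are always exported, parent is exported only when parent_col is given, and all_attrs takes precedence over attr_dict.

```python
from typing import Any, Dict, Optional


def export_row(
    path: str,
    name: str,
    parent: Optional[str],
    attrs: Dict[str, Any],
    path_col: str = "path",
    name_col: str = "name",
    parent_col: Optional[str] = None,
    attr_dict: Optional[Dict[str, str]] = None,
    all_attrs: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper that builds one exported row for a single node."""
    # Path and name are always present, independent of all_attrs.
    row: Dict[str, Any] = {path_col: path, name_col: name}
    # Parent is only exported when a column name is supplied for it.
    if parent_col is not None:
        row[parent_col] = parent
    if all_attrs:
        # all_attrs wins: every attribute is exported under its own name.
        row.update(attrs)
    elif attr_dict:
        # Otherwise export only the selected attributes, renamed to columns.
        for attr_name, col_name in attr_dict.items():
            row[col_name] = attrs.get(attr_name)
    return row


attrs = {"age": 65, "height": 170}
selected = export_row("/a/b", "b", "a", attrs, attr_dict={"age": "Age"})
everything = export_row("/a/b", "b", "a", attrs, attr_dict={"age": "Age"}, all_attrs=True)
with_parent = export_row("/a/b", "b", "a", attrs, parent_col="parent")
print(selected)
print(everything)
print(with_parent)
```

In this sketch, passing both attr_dict and all_attrs=True silently ignores the renaming in attr_dict, which matches the precedence described in (2).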

Hope this clarifies and works for you!

Hey!
Thanks for the detailed answer. I think you are right that the custom node is the best way to handle my use case.
I also missed that path_col and name_col have default values and are therefore exported regardless of all_attrs.