kayjan / bigtree

Tree Implementation and Methods for Python, integrated with list, dictionary, pandas and polars DataFrame.

Home Page: https://bigtree.readthedocs.io

Join forces with anytree

lverweijen opened this issue

Would it be an idea to join forces with anytree?
I see a lot of overlap between both projects and perhaps a merger can combine the best of both worlds.
Since both have a MIT license, it might not even be that difficult.

One difference is that bigtree depends on pandas, whereas anytree is python-pure.
Maybe pandas can be made optional or even dropped if there is no performance difference.

Projects:

@lverweijen Bigtree is really cool and maintained. Feature-wise it outshines anytree. At this point it may be a bit too late, as I see anytree as a subset of bigtree.

@Abdur-rahmaanJ bigtree has more features (I think), but anytree's code / design looks more fleshed out.
I was thinking of maybe adding to anytree what's missing there, but already present in bigtree.

Something I would like anytree to have is more options for importing / exporting to different formats.
Since the apis are similar enough, it shouldn't be that hard to port features back and forth, but I would hope for a collaboration.

edit: I originally wrote "anytree has more features". I meant "bigtree" has more. Although, they both have features lacking from the other.

@lverweijen Can you make a list of features / imports needed / todos?

  • x
  • y
  • z

Thanks ^^.

Here are some differences:

  • anytree has NodeMixin, bigtree has BaseNode. NodeMixin is slightly more flexible, because it doesn't require a name attribute.
  • anytree has SymlinkNode. It's like a shallow copy of a Node.
  • anytree has separator as class attribute. bigtree has sep on root. I prefer class attribute, because of O(1) lookup time.
  • anytree is python only, bigtree has a dependency on pandas. Having it as an optional dependency is preferable.
  • anytree has resolver. bigtree's find_paths comes close, but doesn't support wildcards.
  • anytree has ZigZagGroupIter. Not sure if it's needed.
  • bigtree has type annotations in the codebase. These are missing from anytree.
  • bigtree has BinaryNode. Basically a node with at most 2 children.
  • bigtree has DAGNode. A node with multiple parents.
  • bigtree has import / export to / from list, dict, nested dict and dataframe. anytree only supports json and dicts.
  • bigtree has bulk modification functions (shift_nodes/copy_nodes). Not sure what benefits they have over modifying nodes directly.
  • bigtree has workflows. They seem a bit too specific to include in the library itself. Maybe include them in documentation/examples instead.

There are a few ways to continue from here:

  1. Add one project to the other. So either anytree should consume bigtree or bigtree should consume anytree.
  2. Start a new project that is the successor to both, to which both source packages can contribute. A discussion can be started about each individual difference.
  3. The projects should gradually grow towards each other, copying features until they are exactly the same and maybe merge over time.

Hello, thanks for your comprehensive comparison! To address your points,

  • BaseNode in bigtree does not require a name attribute, so I would think it is similar to NodeMixin, as it is easily extendable. Similarly, I don't see a need for SymlinkNode, because users can just copy or extend Node for their use case, unless I have misunderstood the usage and purpose of SymlinkNode. Examples of how to extend Node can be found in the documentation.
  • separator/sep should be consistent for the whole tree, so it should not be implemented as a class attribute for each node. If you do this on anytree, you will notice the issue:
from anytree import Node
a = Node("a")
b = Node("b", parent=a)
b.separator = "-"
b
# Node('-a-b', separator='-')
a
# Node('/a')
  • Making pandas an optional library has been raised as an issue previously; you can refer to the issue here.
  • Resolver is interesting! This can be a future enhancement 👍
  • I didn't see a need for ZigZagGroupIter, but this can be a possible future enhancement as well!

Moving forward, I'd be happy to keep enhancing and fixing bigtree; do continue to raise issues as well. Thanks for your support of bigtree!

separator/sep should be consistent for the whole tree, which should not be implemented as a class attribute for each node. If you do this on anytree, you will notice the issue

You are right about that. If using a class attribute, it can perhaps be prevented by using type(self).separator instead.

If you use type(self).separator, you are using the separator of the anytree.node.node.Node class, which is always /. You solve the separator discrepancy, but you sacrifice customizability; users cannot choose their own separator, since it references the default Node class.
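
To make the type(self).separator behaviour concrete, here is a minimal sketch. ClassSepNode is a hypothetical toy class written for this illustration, not anytree's actual implementation; it only shows that an instance-level assignment no longer changes how paths are rendered once the lookup goes through the class.

```python
class ClassSepNode:
    """Hypothetical toy node; NOT anytree's real Node class."""

    separator = "/"  # class attribute shared by every instance

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    @property
    def path(self):
        # Read the separator from the class, ignoring instance attributes.
        sep = type(self).separator
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return sep + sep.join(reversed(names))


a = ClassSepNode("a")
b = ClassSepNode("b", parent=a)
b.separator = "-"  # only creates an instance attribute on b

print(a.path)  # /a
print(b.path)  # /a/b
```

In this toy version the two paths stay consistent, but a different separator would have to come from the class itself (or a subclass) rather than from a per-tree setting.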

Given these concerns, I would advise against using a class attribute and would instead keep the separator/sep synced, consistent, and customizable for the whole tree, which is why I chose to implement it as a property that references the root node's sep.
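
The root-based approach can be sketched roughly as follows. RootSepNode, its _sep attribute, and path_name as written here are hypothetical names for this illustration and are not taken from bigtree's source; the sketch only assumes that every node delegates sep to its root.

```python
class RootSepNode:
    """Hypothetical toy node; NOT bigtree's real implementation."""

    def __init__(self, name, parent=None, sep="/"):
        self.name = name
        self.parent = parent
        self._sep = sep  # only the root's value is ever read

    @property
    def root(self):
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    @property
    def sep(self):
        # Every node reads the separator stored on the root.
        return self.root._sep

    @sep.setter
    def sep(self, value):
        # Setting sep on any node updates the root, so the tree stays in sync.
        self.root._sep = value

    @property
    def path_name(self):
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return self.sep + self.sep.join(reversed(names))


a = RootSepNode("a")
b = RootSepNode("b", parent=a)
b.sep = "-"  # applies to the whole tree

print(a.path_name)  # -a
print(b.path_name)  # -a-b
```

The trade-off in this sketch is that each sep lookup walks up to the root, so it costs time proportional to the node's depth rather than O(1).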

The following enhancements have been made available in bigtree v0.10.0:

  • find_relative_path: Similar to Resolver from anytree, able to find relative path with ./../* notations
  • zigzaggroup_iter: Similar to ZigZagGroupIter from anytree
  • zigzag_iter: Zig Zag iteration, not present in anytree
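
For readers unfamiliar with zig-zag traversal, the idea behind the last two items can be sketched in plain Python. ToyNode, zigzag_groups, and zigzag are hypothetical names for this illustration, not bigtree's functions, and the choice to start left-to-right at the root is an assumption; the library's ordering may differ.

```python
class ToyNode:
    """Hypothetical toy node used only for this illustration."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []


def zigzag_groups(root):
    """Yield one list of node names per level, alternating direction."""
    level = [root]
    left_to_right = True
    while level:
        names = [node.name for node in level]
        yield names if left_to_right else names[::-1]
        # Collect the next level in natural (left-to-right) order.
        level = [child for node in level for child in node.children]
        left_to_right = not left_to_right


def zigzag(root):
    """Flatten the per-level groups into a single zig-zag sequence."""
    for group in zigzag_groups(root):
        yield from group


tree = ToyNode("a", [
    ToyNode("b", [ToyNode("d"), ToyNode("e")]),
    ToyNode("c", [ToyNode("f")]),
])

print(list(zigzag_groups(tree)))  # [['a'], ['c', 'b'], ['d', 'e', 'f']]
print(list(zigzag(tree)))         # ['a', 'c', 'b', 'd', 'e', 'f']
```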

Do upgrade bigtree with pip install --upgrade bigtree to get the latest changes! 😄

I will be closing this ticket as well. If there are any new feature requests, enhancements, or bug fixes, do raise another issue.

Update: Pandas is now an optional library in v0.12.0!

Thanks for all your support and suggestions in making bigtree better 😄