kayjan / bigtree

Tree Implementation and Methods for Python, integrated with list, dictionary, pandas and polars DataFrame.

Home Page: https://bigtree.readthedocs.io

v0.17.0 dataframe_to_tree_by_relation TypeError str and float

nextgenusfs opened this issue

Describe the issue
A clear and concise description of what the bug is.

Environment
Describe your environment.

  • Platform: centos7
  • Python version: 3.10.14
  • bigtree version: 0.17.0

I'm seeing a change in the behavior of dataframe_to_tree_by_relation. This code worked in previous versions (e.g. it runs without error in v0.16.4):

# rd is a pandas dataframe
>>> rd.shape
(412, 7)
>>> rd.columns
Index(['assembly', 'illumina', 'ont', 'parent', 'seqread', 'strain', 'talias'], dtype='object')
>>> root = dataframe_to_tree_by_relation(rd, parent_col="parent", child_col="strain", attribute_cols=["talias", "seqread", "assembly"])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/envs/vtools/lib/python3.10/site-packages/bigtree/tree/construct.py", line 947, in dataframe_to_tree_by_relation
    f"Unable to determine root node\nPossible root nodes: {sorted(root_names)}"
TypeError: '<' not supported between instances of 'str' and 'float'

I'm not sure which column of the DataFrame this error refers to.
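For context, the traceback points at the sorted(root_names) call rather than at a specific column. The snippet below is a minimal sketch, independent of bigtree, of how sorting a list that mixes str and float values raises this kind of TypeError. The names try_sort, clean_names and mixed_names are hypothetical and the values are made up for illustration.

```python
def try_sort(values):
    """Return the sorted values, or the TypeError message if they cannot be ordered."""
    try:
        return sorted(values)
    except TypeError as exc:
        return f"TypeError: {exc}"


# Made-up names: one list holds only strings, the other also holds a NaN (a float).
clean_names = ["strain_b", "strain_a"]
mixed_names = ["strain_b", float("nan"), "strain_a"]

print(try_sort(clean_names))  # sorts normally
print(try_sort(mixed_names))  # reports that '<' is not supported between str and float
```

A missing value in a pandas object column is typically stored as a float NaN, so a single empty cell can be enough to produce the mixed list.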

Ah, there were some changes and fixes to the DataFrame operations in the latest version. Could you share what rd contains so that I can reproduce the error?

Edit: Can you check whether your child_col (strain) has a string datatype? It could be that some strain values are str and some are float (for example, NaN for a missing value).
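One way to run the check suggested above is to count the Python types found in each column. This is only a sketch under assumptions: type_counts is a hypothetical helper, and the columns dictionary is stand-in data rather than the real contents of rd.

```python
from collections import Counter


def type_counts(values):
    """Count how many values of each Python type appear in a column."""
    return Counter(type(v).__name__ for v in values)


# Hypothetical stand-in data; the real values would come from the DataFrame.
columns = {
    "parent": ["root", "root", float("nan")],
    "strain": ["a", "b", "root"],
}

for name, values in columns.items():
    counts = type_counts(values)
    status = "mixed types" if len(counts) > 1 else "ok"
    print(name, dict(counts), status)
```

With a pandas DataFrame, the same helper could be applied to rd["parent"].tolist() and rd["strain"].tolist(). Since the failing call sorts the candidate root names, the parent column may be worth checking as well as the child column.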

Hi, thanks for raising this issue! The fix is implemented in v0.17.1. Please upgrade bigtree with the command pip install --upgrade bigtree.