ms609 / Quartet

R package to calculate the similarity of two trees based on the number of shared four-taxon subtrees (or splits)

Home Page: https://ms609.github.io/Quartet/

Unifurcating root causes `QuartetDistance`, `TripletDistance` error

mmore500 opened this issue

Trees with unifurcating roots cause `QuartetDistance` and `TripletDistance` to abort with an error. Here's a minimal working example of the issue:

> library('Quartet')
Loading required package: TreeTools
Loading required package: ape
> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Pop!_OS 22.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Quartet_1.2.5   TreeTools_1.9.1 ape_5.7-1      

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.10       magrittr_2.0.3    bit_4.0.5         viridisLite_0.4.1
 [5] xtable_1.8-4      colorspace_2.1-0  R.cache_0.16.0    lattice_0.20-45  
 [9] R6_2.5.1          rlang_1.1.0       fastmap_1.1.1     fastmatch_1.1-3  
[13] Ternary_2.1.3     tools_4.1.2       rbibutils_2.2.13  parallel_4.1.2   
[17] grid_4.1.2        nlme_3.1-162      R.oo_1.25.0       cli_3.6.1        
[21] ellipsis_0.3.2    htmltools_0.5.5   bit64_4.0.5       digest_0.6.31    
[25] lifecycle_1.0.3   shiny_1.7.4       later_1.3.0       promises_1.2.0.1 
[29] R.utils_2.12.2    Rdpack_2.4        mime_0.12         sp_1.6-0         
[33] compiler_4.1.2    R.methodsS3_1.8.2 httpuv_1.6.9     
> QuartetDistance("newick1.tree.txt", "newick2.tree.txt")
Error: Leaves don't agree: a tip in tree 2 didn't exist in first tree! Aborting.
> QuartetDistance("newick1.tree.txt", "newick1.tree.txt")
[1] 0
> QuartetDistance("newick2.tree.txt", "newick2.tree.txt")
[1] 0

newick1.tree.txt

((8,3),(5,6));

newick2.tree.txt

(((3,8),(5,6)));

Looks like this is an upstream issue with the tqdist library this package wraps, as I am able to reproduce it directly with tqdist 1.0.2. I've contacted the author of tqdist, who wrote back and is looking into the bug. Just wanted to leave this note here for now in case anyone else runs into this issue. I'll provide an update when I hear further from Christian.

In case it's useful, here are some more details about the bug I shared with Christian.

The bug can be reproduced from these two example files.

tree1.newick

(((3,8),((5,6))));

and

tree2.newick

((3,8),((5,6)));

tree1 and tree2 are identical, except tree1 has a tacked-on root unifurcation.

Comparing the trees gives:
$ quartet_dist -v tree1.newick tree2.newick
Leaves doesn't agree! Aborting! ( didn't exist in second tree)
The two trees do not have the same set of leaves.
Aborting.

even though comparing each to itself works as it should:

$ quartet_dist -v tree2.newick tree2.newick
4 1 0 0 1 1 0 0

$ quartet_dist -v tree1.newick tree1.newick
5 5 0 0 5 1 0 0

Interestingly, non-root unifurcations don't seem to cause any issues, as far as I can tell. Not every root unifurcation triggers the bug, either. For example, if tree2 is "(((3,(5)),6,8));", comparison with tree1 works fine. However, comparing "(((3,(5)),6,8));" to "((3,(5)),6,8);" triggers the bug. The same issue arises with triplet_dist, too.
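
Until the upstream behaviour is resolved, one possible stopgap is to strip unifurcations before the trees reach tqDist. The sketch below is an assumption rather than a confirmed fix: it has not been tested against this bug, and `CollapseSinglesToFile` is a hypothetical helper name, not part of Quartet or tqDist. It relies on `read.tree()`, `has.singles()`, `collapse.singles()` and `write.tree()` from ape, and assumes that ape parses the singleton nodes in these files. Removing unifurcating nodes should not change which quartets or triplets a tree displays.

```r
library(ape)

# Hypothetical helper (not part of Quartet or tqDist): read a Newick file,
# collapse any unifurcating ("singleton") nodes, and write the cleaned tree
# to a temporary file. Returns the path of that temporary file.
CollapseSinglesToFile <- function(path) {
  tree <- read.tree(path)
  if (has.singles(tree)) {
    tree <- collapse.singles(tree)
  }
  out <- tempfile(fileext = ".newick")
  write.tree(tree, file = out)
  out
}

# Recreate the two example files from the top of this report
file1 <- tempfile(fileext = ".newick")
file2 <- tempfile(fileext = ".newick")
writeLines("((8,3),(5,6));", file1)
writeLines("(((3,8),(5,6)));", file2)

clean1 <- CollapseSinglesToFile(file1)
clean2 <- CollapseSinglesToFile(file2)

# With Quartet installed, the cleaned files can then be compared
# (left commented out because the result has not been verified here):
# Quartet::QuartetDistance(clean1, clean2)
```

Writing to temporary files keeps the originals untouched, so the failing inputs remain available for reproducing the bug upstream.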

Thanks for the report; do keep me posted and I'll propagate any updates to tqDist to the Quartet package.