`read.tree` seems to round branch lengths
ajrominger opened this issue · comments
Thanks so much for ape! I don't know where I'd be without it!
Recently I've been working ape::phylo
objects and msprime. I am running into an issue with branch lengths seeming to be rounded to 6 decimal places by ape::read.tree
. When greater than 6 decimal places are needed, is there any work-around to the default behavior of ape::read.tree
?
Here's an example:
y <- read.tree(text = '(((((t6:2.45112382300675335500272922218,t8:2.45112382300675335500272922218):9.79658819007364911612967262045,t3:12.2477120130804024711324018426):0.274339041983090226040076231584,t4:12.5220510550634926971724780742):18.933448771541357302794494899,t10:31.4554998266048499999669729732):121.547048353778492923993326258,(((t5:1.18211007899886055838578613475,t1:1.18211007899886055838578613475):2.39764699703194139601691858843,(t7:0.98689255753362914447279763408,t2:0.98689255753362914447279763408):2.5928645184971728099299070891):16.3437354032475887777309253579,t9:19.9234924792783907321336300811):133.07905570110494863911299035);')
# note the branch length for tip "t6" is 2.45112382300675335500272922218
# now look at that branch in the `edge.length` element:
y$edge.length[y$edge[, 2] == which(y$tip.label == 't6')]
## 2.451124
Thanks for any thoughts on this and thanks again for ape!
This seems to be just how your machine decides to display digits. I think the branch lengths are actually properly input but displayed depending on your options. You can change it using options(digits)
. For example:
options(digits = 20)
y$edge.length[y$edge[, 2] == which(y$tip.label == 't6')]
## [1] 2.451123823006753355
Thanks for the appreciation (always appreciated ;) )
read.tree
does not truncate digits: they are all read in the limits defined by R.
First, R prints a number of digits according to this option:
R> options("digits")
$digits
[1] 7
This can be overriden by for instance print(, digits = )
. Try print(y$edge.length, digits = 22)
.
Then, real numbers in R are 64-bit floating-point numbers, so you cannot get that many digits stored in a number (16 digits is the typical number):
R> identical(2.45112382300675335500272922218, 2.4511238230067533)
[1] TRUE
R> identical(2.45112382300675335500272922218, 2.451123823006753)
[1] FALSE
The digits after '33' are truncated because they cannot be stored in a 64-bit reals. Maybe msprime uses 128-bit reals (known as long double in C/C++)?
Thanks so much for the prompt reply and info! This all makes a lot of sense. We're digging deeper into the msprime side of things, but now the R side is fully transparent to us, which is a huge help!
Thanks again!