Segfault on typical dataset (fasttree is fine)
GabeAl opened this issue
time OMP_NUM_THREADS=16 ./FastTreeMP -gamma -nt -gtr -out ft.tre -log ft.log veryfasttree-3.2.0/full_fasttree.msa
(works)
time ./VeryFastTree -gamma -nt -gtr -out vft.tre -log vft.log -double-precision -threads 16 full_fasttree.msa
VeryFastTree Version 3.2.0 (OpenMP, AVX2) with AVX2 using threads(16) level 1 deterministic
Alignment: full_fasttree.msa
Nucleotide distances: Jukes-Cantor Joins: balanced Support: SH-like 1000
Search: Normal +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.80
ML Model: Generalized Time-Reversible, CAT approximation with 20 rate categories
Read 2182 sequences, 43799 positions
Ignored unknown character D (seen 1 times)
Ignored unknown character K (seen 6 times)
Ignored unknown character M (seen 7 times)
Ignored unknown character R (seen 6 times)
Ignored unknown character S (seen 3 times)
Ignored unknown character W (seen 3 times)
Ignored unknown character X (seen 7429 times)
Ignored unknown character Y (seen 20 times)
Segmentation fault (core dumped)
real 0m18.116s
user 2m12.019s
sys 0m2.617s
Here's the run that works perfectly with the same file with FastTree:
time OMP_NUM_THREADS=16 ./FastTreeMP -gamma -nt -gtr -out ft.tre -log ft.log veryfasttree-3.2.0/full_fasttree.msa
FastTree Version 2.1.11 Double precision (No SSE3), OpenMP (16 threads)
Alignment: veryfasttree-3.2.0/full_fasttree.msa
Nucleotide distances: Jukes-Cantor Joins: balanced Support: SH-like 1000
Search: Normal +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.80
ML Model: Generalized Time-Reversible, CAT approximation with 20 rate categories
Ignored unknown character D (seen 1 times)
Ignored unknown character K (seen 6 times)
Ignored unknown character M (seen 7 times)
Ignored unknown character R (seen 6 times)
Ignored unknown character S (seen 3 times)
Ignored unknown character W (seen 3 times)
Ignored unknown character X (seen 7432 times)
Ignored unknown character Y (seen 20 times)
Initial topology in 97.51 seconds
Refining topology: 44 rounds ME-NNIs, 2 rounds ME-SPRs, 22 rounds ML-NNIs
Total branch-length 12.394 after 524.45 sec
ML-NNI round 1: LogLk = -2276056.662 NNIs 542 max delta 742.67 Time 644.77
GTR Frequencies: 0.2911 0.1816 0.2013 0.3260
GTR rates(ac ag at cg ct gt) 1.1381 2.5808 1.1716 0.7971 4.3736 1.0000
Switched to using 20 rate categories (CAT approximation)
Rate categories were divided by 0.975 so that average rate = 1.0
CAT-based log-likelihoods may not be comparable across runs
ML-NNI round 2: LogLk = -1996267.527 NNIs 393 max delta 207.57 Time 1150.80
ML-NNI round 3: LogLk = -1995588.989 NNIs 246 max delta 216.92 Time 1221.96
ML-NNI round 4: LogLk = -1995349.767 NNIs 199 max delta 58.39 Time 1275.52
ML-NNI round 5: LogLk = -1995316.009 NNIs 126 max delta 14.77 Time 1311.64
ML-NNI round 6: LogLk = -1995311.201 NNIs 94 max delta 3.32 Time 1340.93
ML-NNI round 7: LogLk = -1995310.246 NNIs 68 max delta 0.00 Time 1362.39
Turning off heuristics for final round of ML NNIs (converged)
ML-NNI round 8: LogLk = -1995271.566 NNIs 136 max delta 15.65 Time 1452.30 (final)
Optimize all lengths: LogLk = -1995269.296 Time 1497.71
Gamma(20) LogLk = -2038701.830 alpha = 0.699 rescaling lengths by 1.261
Total time: 1874.27 seconds Unique: 2182/2182 Bad splits: 0/2179
real 31m15.230s
user 67m9.726s
sys 2m41.664s
One immediate difference is that the two programs don't seem to count unknown characters the same way. FastTree reports 7432 X's while VeryFastTree reports 7429. Go figure! 😄
After some investigation, I have identified the segfault as an initialization problem introduced in the last update.
As for the differing character counts, this appears to be a minor thread synchronization issue. It only affects the log, and I am working on a fix.
As a temporary workaround, you may try using an earlier version. That version will still have the log problem, but the results should be correct.
I will keep you updated on the progress of the fixes for both issues. Thank you for your patience.
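For readers wondering how a thread synchronization issue can make a logged count come up a few short (7429 vs. 7432 X's), here is a minimal sketch of one race-free way to tally unknown characters across threads. It is not VeryFastTree's code: the names `Tally`, `is_known_nt`, and `count_unknown`, and the choice of `ACGT-` as the known alphabet, are assumptions made for illustration. Each thread counts into a private array and merges once under a lock. If threads instead incremented a shared counter without synchronization, concurrent updates could be lost and the total would undercount.

```cpp
#include <array>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical illustration, not taken from VeryFastTree or FastTree.
using Tally = std::array<long, 256>;

// Assumed alphabet for this sketch: A, C, G, T and the gap symbol.
inline bool is_known_nt(unsigned char c) {
    switch (c) {
        case 'A': case 'C': case 'G': case 'T': case '-':
            return true;
        default:
            return false;
    }
}

// Counts characters outside the assumed alphabet, splitting the
// sequences across n_threads worker threads.
Tally count_unknown(const std::vector<std::string>& seqs, unsigned n_threads) {
    if (n_threads == 0) n_threads = 1;
    Tally total{};            // shared result, only touched under merge_lock
    std::mutex merge_lock;
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&, t] {
            Tally local{};    // private to this thread, so no race here
            for (std::size_t i = t; i < seqs.size(); i += n_threads) {
                for (unsigned char c : seqs[i]) {
                    if (!is_known_nt(c)) ++local[c];
                }
            }
            // Merge once per thread while holding the lock.
            std::lock_guard<std::mutex> guard(merge_lock);
            for (std::size_t k = 0; k < local.size(); ++k) total[k] += local[k];
        });
    }
    for (auto& w : workers) w.join();
    return total;
}
```

With this structure, `count_unknown(seqs, 1)` and `count_unknown(seqs, 16)` should return identical tallies for the same input, which is the property the two logs above fail to share.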
Solved in version 3.2.1