chronos() problems with convergence -- sometimes getting stuck entirely

Question

amizeranschi opened this issue 5 months ago

Hello and many thanks for creating this useful package.

I am running ape::chronos() using the code below and frequently see warning messages such as:

In nlminb(current.ages, f.ages, g.ages, control = list(eval.max = 1000,  ... :
  NA/NaN function evaluation

Also, the function often gets stuck after printing the following:

Setting initial dates...
Fitting in progress... get a first set of estimates

I am attaching my R environment for reproducibility. The code I am running is listed below and the ape package version is 5.7-1.8. Any advice for getting chronos() to converge with this input tree would be much appreciated.

test-chronos.RData.txt

options(warn = 1)
iter = 1
calib_start_time = Sys.time()
repeat
{
  iter_start_time = Sys.time()
  chronogram_no_outgroup = ape::chronos(rooted_tree_no_outgroup, calibration = ape::makeChronosCalib(rooted_tree_no_outgroup, node = calib_node_APU_ZV, age.min = 14, age.max = 16), model = calib_method, control = ape::chronos.control(nb.rate.cat = 5, dual.iter.max = 200, iter.max = 1e5, eval.max = 1e5, epsilon = 1e-4))
  if(attr(chronogram_no_outgroup, "convergence"))
  {
    calib_end_time = Sys.time()
    calib_runtime = round(difftime(calib_end_time, calib_start_time, units = "secs"))
    message(paste0("Time calibration converged successfully after ", iter, " iterations. Total runtime in seconds: ", calib_runtime))
    break
  } else
  {
    iter_end_time = Sys.time()
    iter_runtime = round(difftime(iter_end_time, iter_start_time, units = "secs"))
    message(paste0("Time calibration didn't converge in attempt number ", iter, ". Runtime for last iteration in seconds: ", iter_runtime))
    iter = iter + 1
  }
}

Emmanuel Paradis · Answer 1 · Mon Feb 26 2024 13:34:10 GMT+0800 (China Standard Time)

Hello,

I had a quick look at your data:

R> is.ultrametric(rooted_tree_no_outgroup)
[1] TRUE

Is this expected? You are trying to date a tree that is already ultrametric, but I don't think that's a problem. Rather:

R> table(rooted_tree_no_outgroup$edge.length == 0)

FALSE  TRUE 
  578   440

This is quite a lot of branches with length zero, and they all seem to be terminal ones:

R> table(rooted_tree_no_outgroup$edge.length == 0, rooted_tree_no_outgroup$edge[, 2] <= 692)
       
        FALSE TRUE
  FALSE   326  252
  TRUE      0  440

You might try to drop some of the tips of your tree before running chronos.
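As a rough illustration of the check above without hard-coding the number of tips, the sketch below lists the tips attached by zero-length terminal branches. The helper name zero_length_tips and the toy tree are made up for this example; it only assumes the usual "phylo" layout ($edge, $edge.length, $tip.label, with tips numbered 1 to the number of tips).

```r
# Sketch: find tips attached by zero-length terminal branches.
# Works on any "phylo"-like list with $edge, $edge.length and $tip.label.
# zero_length_tips() is a hypothetical helper, not part of ape.
zero_length_tips <- function(tree, tol = 0) {
  n_tip <- length(tree$tip.label)
  is_terminal <- tree$edge[, 2] <= n_tip   # tips are numbered 1..n_tip
  is_zero <- tree$edge.length <= tol
  tree$tip.label[tree$edge[is_terminal & is_zero, 2]]
}

# Tiny hand-built example: 3 tips (1..3), internal nodes 4 and 5
toy <- list(
  edge = matrix(c(4, 5,
                  5, 1,
                  5, 2,
                  4, 3), ncol = 2, byrow = TRUE),
  edge.length = c(0.1, 0, 0.2, 0),
  tip.label = c("a", "b", "c")
)
zero_length_tips(toy)

# On the real tree, a chosen subset of these labels could then be removed:
# candidates <- zero_length_tips(rooted_tree_no_outgroup)
# pruned <- ape::drop.tip(rooted_tree_no_outgroup, some_of_the_candidates)
```

Which of the candidate tips to drop is a separate decision; dropping all of them may remove every copy of a sequence, so keeping one representative per group is probably safer.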

Best,
E.

amizeranschi · Answer 2 · Tue Feb 27 2024 16:35:47 GMT+0800 (China Standard Time)

Hi @emmanuelparadis

Thanks a lot for your reply. Yes, we're originally creating an ultrametric tree via phangorn::pml_bb, then the goal is to calibrate the branch lengths with chronos.

Thanks for pointing out that the original tree contained a lot of redundant tips. After removing identical sequences (via pegas::haplotype), the number of tips dropped from 692 to 274 and the resulting tree has 273 internal nodes.

Unfortunately, even after this significant reduction in complexity, chronos still doesn't converge. Setting iter.max and eval.max too low makes the function stop at the maximum evaluation limit. Setting epsilon too large results in false convergence, while setting it smaller makes chronos run more iterations before eventually stopping at the maximum evaluation limit. I am using model = "clock" because this makes the most sense for our data, so, if I'm interpreting it correctly, changing lambda in chronos should have no effect in this scenario.

I was thinking of trying some kind of crude optimization of hyperparameters such as iter.max, eval.max and epsilon, by running batches of 10 or 20 chronos runs and slightly changing those parameters according to the most frequent value of attr(chronogram_no_outgroup, "message") in each batch. For example, if false convergence appears most frequently, then multiply epsilon by a factor of 2 or 5.
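The batch-and-adjust idea described above could be sketched roughly as follows. Everything here is hypothetical: the function name adjust_control, the multiplication factor, and the message substrings being matched are assumptions, and should be checked against the messages actually returned in attr(..., "message").

```r
# Sketch of the "most frequent message" heuristic (hypothetical rules).
# messages: character vector of attr(fit, "message") from one batch of runs
# ctrl:     named list of chronos.control() settings used for that batch
adjust_control <- function(messages, ctrl, factor = 2) {
  top <- names(which.max(table(messages)))   # most frequent message
  if (grepl("false convergence", top, fixed = TRUE)) {
    # assumed rule from the post: loosen epsilon
    ctrl$epsilon <- ctrl$epsilon * factor
  } else if (grepl("limit reached", top, fixed = TRUE)) {
    # assumed rule: allow more iterations and function evaluations
    ctrl$iter.max <- ctrl$iter.max * factor
    ctrl$eval.max <- ctrl$eval.max * factor
  }
  ctrl
}

msgs <- c("false convergence (8)", "false convergence (8)",
          "function evaluation limit reached without convergence (9)")
ctrl <- list(epsilon = 1e-4, iter.max = 1e5, eval.max = 1e5)
new_ctrl <- adjust_control(msgs, ctrl)

# The adjusted list could then be passed back to chronos, e.g.
# control = do.call(ape::chronos.control, new_ctrl)
```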

However, the main issue I'm running into is that chronos often gets stuck right at the beginning, after printing:

Setting initial dates...
Fitting in progress... get a first set of estimates

This happened in a recent test using the loop from the previous message, which got stuck after 23 iterations. Even after several hours, chronos wasn't showing any sign of progress. Is there any way to handle this kind of occurrence automatically, for example with tryCatch(), or simply to break out of it somehow?

I'm attaching an updated RData with the pruned tree. Is there anything else about rooted_tree_no_outgroup that may cause chronos to fail to converge?

test-chronos-pruned.RData.txt
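One possible way to break out of a call that never returns is base R's setTimeLimit() combined with tryCatch(). The wrapper below is a minimal sketch; the name with_time_limit is made up for this example. R only checks time limits at points where a user interrupt could occur, so whether this actually stops a stalled chronos() call depends on where it is stuck, and that has not been verified here.

```r
# Sketch: run fun() but give up after `seconds` of elapsed time.
# with_time_limit() is a hypothetical helper built on base R only.
with_time_limit <- function(fun, seconds) {
  setTimeLimit(elapsed = seconds, transient = TRUE)
  # always remove the limit again, whatever happens
  on.exit(setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE))
  tryCatch(
    list(value = fun(), error = NULL),
    error = function(e) list(value = NULL, error = conditionMessage(e))
  )
}

# Demonstration with a stand-in for a slow call
slow <- with_time_limit(function() { Sys.sleep(5); "done" }, seconds = 0.5)
fast <- with_time_limit(function() "done", seconds = 5)

# Inside the repeat loop, the chronos call could be wrapped the same way:
# res <- with_time_limit(function() ape::chronos(...), seconds = 600)
# if (is.null(res$value)) { iter <- iter + 1; next }
```

If the limit turns out not to interrupt the fit, running each attempt in a separate R process and killing it from outside is another option, at the cost of more setup.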

amizeranschi · Answer 3 · Mon Mar 04 2024 13:22:42 GMT+0800 (China Standard Time)

Hi @emmanuelparadis

Using the test-chronos-pruned.RData.txt environment that I recently posted, the following loop reproduces the problem (chronos getting stuck at "get a first set of estimates") very consistently for me.

Please let me know if you can reproduce this as well and what you think might be causing it.

iter = 1
calib_start_time = Sys.time()
repeat
{
  iter_start_time = Sys.time()
  chronogram_no_outgroup = ape::chronos(rooted_tree_no_outgroup, calibration = ape::makeChronosCalib(rooted_tree_no_outgroup, node = calibration_node, age.min = 14, age.max = 16), model = calib_method, 
                                        control = ape::chronos.control(dual.iter.max = 200, iter.max = 1e6, eval.max = 1e6, epsilon = 1e-2))
  if(attr(chronogram_no_outgroup, "convergence"))
  {
    calib_end_time = Sys.time()
    calib_runtime = round(difftime(calib_end_time, calib_start_time, units = "secs"))
    message(paste0("Time calibration converged successfully after ", iter, " iterations. Total runtime in seconds: ", calib_runtime))
    break
  } else
  {
    iter_end_time = Sys.time()
    iter_runtime = round(difftime(iter_end_time, iter_start_time, units = "secs"))
    message(paste0("Time calibration didn't converge in attempt number ", iter, ". Runtime for last iteration in seconds: ", iter_runtime))
    iter = iter + 1
    next
  }
}

Emmanuel Paradis · Answer 4 · Fri Mar 08 2024 11:17:45 GMT+0800 (China Standard Time)

Hi @amizeranschi,

I had to make a small modification to your script above (saved in "test_chronos.R"):

R> load("test-chronos.RData")
R> source("test_chronos.R")
Error in eval(ei, envir) : object 'calibration_node' not found
R> calibration_node <- calib_node_AUP

After that, it ran but did not converge:

R> source("test_chronos.R")

Setting initial dates...
Fitting in progress... get a first set of estimates
         (Penalised) log-lik = -10.32234 
Optimising rates... dates... -10.32234 
Optimising rates... dates... -10.2714 
Optimising rates... dates... -10.20227 
Optimising rates... dates... -10.13906 
Optimising rates... dates... -10.05171 
Optimising rates... dates... -9.969221 
Optimising rates... dates... -9.9415 
Optimising rates... dates... -9.834385 

log-Lik = -9.832885 
PHIIC = 675.67 
Time calibration didn't converge in attempt number 1. Runtime for last iteration in seconds: 60

Setting initial dates...
...[etc]...

I stopped it after 20 iterations. Maybe you can add a list to your script to save each chronogram (and maybe an option to break after a fixed number of repeats). The last one had:

R> attr(chronogram_no_outgroup, "message")
[1] "false convergence (8)"

which might be better than nothing at all. Note that this diagnostic is from nlminb(); chronos() has different diagnostics for convergence.
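The suggestion to keep every chronogram in a list and stop after a fixed number of repeats could look roughly like the sketch below. The function name collect_runs and the stand-in fit are made up for illustration; with real data, fit_once would wrap the ape::chronos() call from the script.

```r
# Sketch: repeat a fit at most max_runs times, keeping every result.
# collect_runs() is a hypothetical helper; fit_once is any zero-argument
# function returning an object with a "convergence" attribute.
collect_runs <- function(fit_once, max_runs = 20) {
  runs <- list()
  for (i in seq_len(max_runs)) {
    runs[[i]] <- fit_once()
    if (isTRUE(attr(runs[[i]], "convergence"))) break   # stop early on success
  }
  runs
}

# Stand-in for chronos(): "converges" on the third call
counter <- 0
fake_fit <- function() {
  counter <<- counter + 1
  structure(list(), convergence = counter >= 3,
            message = "false convergence (8)")
}
runs <- collect_runs(fake_fit, max_runs = 10)
length(runs)
vapply(runs, attr, character(1), which = "message")

# With the real data (names taken from the script above):
# runs <- collect_runs(function() ape::chronos(rooted_tree_no_outgroup, ...))
```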

As for why it's stuck for you, I'm not sure. I don't remember us making any changes to chronos recently. Just in case, you can install the very latest version from R-universe:

https://emmanuelparadis.r-universe.dev/ape#

Cheers,

Emmanuel

amizeranschi · Answer 5 · Mon Mar 11 2024 23:01:10 GMT+0800 (China Standard Time)

Thank you for the suggestions. After double-checking the default value of the tol parameter, and considering that our calibration dates (14 to 16) are expressed in millions of years with a substitution rate on the order of 0.001 per site per million years, we set tol to 1e-4. This considerably improved the convergence of chronos, in the sense that it no longer seems to get stuck during the first iteration.

We will set up a loop of 10 or 20 chronos runs and retain the chronogram with the maximum log-likelihood for the rest of the analysis.
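A minimal sketch of that selection step is shown below. The helper name pick_best and the fake runs are made up; the attribute name "ploglik" is an assumption about where chronos stores the (penalised) log-likelihood and should be confirmed with attributes() on a real chronogram.

```r
# Sketch: keep the run with the highest score (e.g. log-likelihood).
# pick_best() is a hypothetical helper; score() must return one number.
pick_best <- function(runs, score) {
  scores <- vapply(runs, score, numeric(1))
  if (all(is.na(scores))) return(NULL)   # nothing usable
  runs[[which.max(scores)]]              # which.max() skips NA values
}

# Stand-ins for chronograms, carrying only a log-likelihood attribute
fake_runs <- lapply(c(-10.3, -9.8, -10.1),
                    function(ll) structure(list(), ploglik = ll))
best <- pick_best(fake_runs, function(x) attr(x, "ploglik"))
attr(best, "ploglik")

# With real fits, using the tol value mentioned above:
# ctrl <- ape::chronos.control(tol = 1e-4)
# runs <- lapply(1:20, function(i) ape::chronos(rooted_tree_no_outgroup, ..., control = ctrl))
# best <- pick_best(runs, function(x) attr(x, "ploglik"))  # attribute name assumed
```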