khabbazian / l1ou

Detection of evolutionary shifts in Ornstein-Uhlenbeck models

Error of phylolm_CR in convergence estimation

kfuku52 opened this issue

Hi, I got an error in estimate_convergent_regimes(). It happens only when the alpha parameters are optimized, that is, with estimate_convergent_regimes(..., fixed.alpha=FALSE). I would appreciate any comments or suggestions.

Error message:

Error in phylolm_CR(Y ~ preds - 1, phy = tr, model = "OUfixedRoot", sc = shift.configuration, : The starting value is not within the bounds of the parameter.
Traceback:

1. estimate_convergent_regimes(fit_ind, criterion = "AICc", method = "backward", 
 .     fixed.alpha = FALSE)
2. estimate_convergent_regimes_surface(model, opt = opt)
3. cmp_model_score_CR(tr, Y, regimes, model$alpha, opt = opt)
4. cmp_AICc_CR(tree, Y, conv.regimes = regimes, alpha = alpha, opt = opt)
5. phylolm_interface_CR(tree, matrix(Y[, i]), conv.regimes, alpha = alpha[[i]], 
 .     opt = opt)
6. phylolm_CR(Y ~ preds - 1, phy = tr, model = "OUfixedRoot", sc = shift.configuration, 
 .     cr = conv.regimes, starting.value = alpha, upper.bound = opt$alpha.upper.bound, 
 .     lower.bound = alpha/100)
7. stop("The starting value is not within the bounds of the parameter.")

Code to reproduce the error:

tree_text="(((((((Mus_musculus_ENSMUSG00000089661:12.89838925,Mus_musculus_ENSMUSG00000095538:12.89838925)n2:7.989028152,Rattus_norvegicus_ENSRNOG00000001499:20.8874174)n4:61.25338149,Oryctolagus_cuniculus_ENSOCUG00000022977:82.14079889)n6:7.68238853,((Homo_sapiens_ENSG00000268975:13.90337806,Homo_sapiens_ENSG00000261857:13.90337806)n9:15.53816876,Macaca_mulatta_ENSMMUG00000013632:29.44154682)n11:60.3816406)n12:6.63920175,(Sus_scrofa_ENSSSCG00000002997:61.96598852,Bos_taurus_ENSBTAG00000009078:61.96598852)n15:34.49640065)n16:255.2960941,Xenopus_tropicalis_ENSXETG00000002588:351.7584832)n18:723.0614697,(((((((Rattus_norvegicus_ENSRNOG00000028721:20.8874174,Mus_musculus_ENSMUSG00000027416:20.8874174)n21:61.25338149,Oryctolagus_cuniculus_ENSOCUG00000001424:82.14079889)n23:7.68238853,(Macaca_mulatta_ENSMMUG00000014494:29.44154682,Homo_sapiens_ENSG00000125879:29.44154682)n26:60.3816406)n27:6.63920175,((Sus_scrofa_ENSSSCG00000007082:61.96598852,Bos_taurus_ENSBTAG00000000516:61.96598852)n30:15.78896789,Canis_lupus_ENSCAFG00000030197:77.75495641)n32:18.70743276)n33:62.13519841,Monodelphis_domestica_ENSMODG00000005398:158.5975876)n35:153.3063338,Gallus_gallus_ENSGALG00000008732:311.9039214)n37:39.85456185,Xenopus_tropicalis_ENSXETG00000003548:351.7584832)n39:723.0614697)n40;"
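library(ape)   # provides read.tree()
library(l1ou)  # provides adjust_data(), estimate_shift_configuration(), estimate_convergent_regimes()
tree = read.tree(text=tree_text)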

trait_text="brain,heart,kidney,liver,ovary,testis
Mus_musculus_ENSMUSG00000089661,2.1769406578477,0.202036051693925,0.63488272196565,0.638510496834611,1.31250675784555,1.45913097497683
Mus_musculus_ENSMUSG00000095538,0.103377188952908,0.0392406207305018,0.341134950053078,0.267010753389883,1.20089284030784,0.0804824077175943
Rattus_norvegicus_ENSRNOG00000001499,1.21464073013422,0.367395910873769,0.52235098372207,0.286406906103611,2.01061005264727,2.54677844995076
Oryctolagus_cuniculus_ENSOCUG00000022977,1.50193008898029,0.419327680366938,0.37886749408802,0.495140302749664,6.93602946184919,2.18609877961886
Homo_sapiens_ENSG00000268975,0,0.266988677902243,0.0531581474038556,0.194942717189735,0,0.0411595822452399
Homo_sapiens_ENSG00000261857,0.738193035867579,1.12596946521021,0.571242331193359,0.647248115527252,1.91034890419106,1.41385417335533
Macaca_mulatta_ENSMMUG00000013632,1.58742742928768,1.18179700137321,1.51345681691155,0.969525697852313,1.82109626650635,2.23218724717014
Sus_scrofa_ENSSSCG00000002997,1.14184323490317,1.06198735908003,0.707240258247191,0.736592284442087,0.65577709045319,0.814151034319946
Bos_taurus_ENSBTAG00000009078,4.09085164303143,2.40058424595315,2.9874525565945,2.85228369236132,3.12337006791316,2.90337478427231
Xenopus_tropicalis_ENSXETG00000002588,0.862939940626679,0.360229223073727,0.327912837312163,0.912571974051781,0.16464914584949,0.570870724658469
Rattus_norvegicus_ENSRNOG00000028721,0.793865212543779,0.126495504481227,0.189564135427474,0.0205186446505775,0.404758033920202,0.622541958944324
Mus_musculus_ENSMUSG00000027416,0.0539642463275866,0.0133687235135755,0.0119717717558467,0.026844934044235,1.89059101437825,0
Oryctolagus_cuniculus_ENSOCUG00000001424,0,0.443929837825391,1.09544972717812,0,0,0.550209062958829
Macaca_mulatta_ENSMMUG00000014494,0.0960701959183422,0.0365536296958946,0.0460032798433054,0,0.0805454117819435,3.56218496133299
Homo_sapiens_ENSG00000125879,0,0,0.00201377894184635,0,0.0816434192670939,0.0899198236644605
Sus_scrofa_ENSSSCG00000007082,1.02235798411815,0.745458667732567,0.754893805042313,0.632903316261303,0.0960255936389955,0.452407969797526
Bos_taurus_ENSBTAG00000000516,3.4163923290383,0.723236705462928,0,0.226080404205593,2.80622903364123,3.21201931204057
Canis_lupus_ENSCAFG00000030197,0.586061750953316,0.202319615318614,0.112868933178583,0.031461327458019,0.00608259828675725,0
Monodelphis_domestica_ENSMODG00000005398,2.00670644992751,0.345354276333881,0.228500546652594,0.310639667267954,2.17171017530506,2.98230624273514
Gallus_gallus_ENSGALG00000008732,1.99335564029219,0.448788658290314,0.140094253253896,0.123957568649032,3.10671747036773,0.383263265231511
Xenopus_tropicalis_ENSXETG00000003548,3.55950533293704,0.774471483114139,0.561099010697649,0.0882024773033921,0.200575784207915,0.963046405029511"
trait_matrix = read.table(text=trait_text, sep=",")

adj_data = adjust_data(tree=tree, Y=trait_matrix, normalize = FALSE)
fit_ind = estimate_shift_configuration(tree=adj_data$tree, Y=adj_data$Y, criterion="AICc", root.model="OUrandomRoot", nCores=1, rescale=TRUE)

# no error, fixed.alpha=TRUE
fit_conv1 = estimate_convergent_regimes(fit_ind, criterion="AICc", method="backward", fixed.alpha=TRUE)

# error, fixed.alpha=FALSE
fit_conv2 = estimate_convergent_regimes(fit_ind, criterion="AICc", method="backward", fixed.alpha=FALSE)

Many thanks.
Kenji

Mohammad, I wonder if that comes from the default starting value for alpha, on line 1185 of R/shift_configuration.R:

s = ifelse(is.na(opt$alpha.starting.value), max(0.5, l), opt$alpha.starting.value)

Would replacing max(0.5, l) by max(0.5*u, l) on that line fix the error?

Why:

This line is in a function that is called when fit_ind is created, and it sets the starting value for alpha in the options opt of the fitted object. These options are reused in the function that raises the error.

By default, the starting value is NA, so this line changes it to max(0.5, l), where l is the lower bound (0 by default). But this value could very well be greater than the upper bound, u, which is set on the line before. By default, u depends on the tree height. A starting value of 0.5 makes sense for a tree of height 1. In general, though, a starting value of u/5 would be better adapted to the tree, because u itself is scaled to the tree.
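
To make the concern concrete, here is a minimal base-R sketch comparing the two starting-value rules. The helper names (start_current, start_proposed, in_bounds) and the numeric bounds are hypothetical and chosen only for illustration; only the expressions max(0.5, l) and max(0.5*u, l) come from the comment above.

```r
# Hypothetical helpers; l and u are the lower and upper bounds on alpha.
start_current  <- function(l, u) max(0.5, l)       # rule quoted from the source line
start_proposed <- function(l, u) max(0.5 * u, l)   # rule suggested in this comment

# TRUE when the starting value s lies inside [l, u].
in_bounds <- function(s, l, u) s >= l && s <= u

# Illustrative bounds only: u is assumed to be small here, as could
# happen when it is derived from the tree rather than fixed by the user.
l <- 0
u <- 0.01

s_cur  <- start_current(l, u)    # 0.5, which is above u
s_prop <- start_proposed(l, u)   # half of u, which is inside [l, u]

in_bounds(s_cur, l, u)    # FALSE
in_bounds(s_prop, l, u)   # TRUE
```

Any rule that scales the start with u stays inside the bounds whenever l <= u, which is the property the fixed value 0.5 lacks.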

I fixed the error. It came from estimate_convergent_regimes when it calls phylolm_CR (the modified phylolm function).

The problem was that some of the alphas estimated by estimate_shift_configuration were slightly larger than the original alpha upper bound (I guess because of numerical rounding in the optimization step). In phylolm_CR, the starting value of alpha is the estimated alpha, so the starting value ended up larger than the upper bound, which triggers the error.
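
The failure mode can be reproduced in isolation. The sketch below is not the change made in l1ou; it is a hypothetical illustration, with made-up numbers, of how an estimate that overshoots the upper bound by a tiny amount fails the bounds check, and of one possible guard (clamping the starting value). The helper clamp_start is an assumption, not a package function.

```r
# Hypothetical helper: pull a starting value back inside [lower, upper].
clamp_start <- function(start, lower, upper) min(max(start, lower), upper)

upper_bound <- 0.05                  # illustrative upper bound on alpha
alpha_hat   <- upper_bound + 1e-8    # estimate that overshoots slightly
lower_bound <- alpha_hat / 100       # mirrors lower.bound = alpha/100 in the traceback

# The condition behind "The starting value is not within the bounds":
overshoots <- alpha_hat > upper_bound    # TRUE

# One possible guard: clamp before using the estimate as a starting value.
start <- clamp_start(alpha_hat, lower_bound, upper_bound)
start_ok <- start >= lower_bound && start <= upper_bound   # TRUE
```

Clamping hides the fact that the optimizer hit its bound, so in practice it would be paired with a message to the user.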

I also added a warning message in estimate_shift_configuration to alert the user when this happens. I believe that in that case one should increase the upper bound on alpha and redo the analysis.
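
A check of this kind could look like the base-R sketch below. It is only an assumption about the general shape of such a warning, not the code added to l1ou; the function name warn_if_at_bound, the tolerance, and the example values are all hypothetical.

```r
# Hypothetical check: warn when any estimated alpha reaches the upper bound.
warn_if_at_bound <- function(alpha, upper, tol = 1e-6) {
  # Treat values within a relative tolerance of the bound as "at the bound".
  at_bound <- alpha >= upper * (1 - tol)
  if (any(at_bound)) {
    warning("estimated alpha reached the upper bound for trait(s) ",
            paste(which(at_bound), collapse = ", "),
            "; consider increasing the upper bound and refitting")
  }
  invisible(at_bound)
}

# Illustrative per-trait estimates; the second one sits on the bound.
alphas <- c(0.012, 0.0500000001, 0.031)
hit <- suppressWarnings(warn_if_at_bound(alphas, upper = 0.05))
```

Returning the logical vector invisibly lets a caller decide per trait whether to refit with a larger bound.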

The updated version worked well, with the warning you described. I will check the other datasets that ran into the same error and will get back to you if there is a problem. Thank you very much!