Understanding the resulting table of the differential variability test for residual over-dispersion

Question

SaraMonts opened this issue 3 years ago

Hello!

First of all, thank you for this useful tool! I have applied it to my data, where I am comparing two conditions (control and treatment). I am having difficulty understanding some of the columns in the residual over-dispersion table produced by the differential variability test in the function "BASiCS_TestDE". More specifically, I don't understand how the values in the columns "ResDispOverall", "ResDispFC", "ProbDiffResDisp" and "MeanOverall" are calculated. Can you provide a little more information on these columns?

Thank you very much,
Sara

Alan O'Callaghan · Answer 1 · Tue Nov 23 2021 00:53:43 GMT+0800 (China Standard Time)

Hi!

ResDispOverall measures the average of the residual over-dispersion parameters between both conditions.
ResDispFC shouldn't be included; thanks for flagging this. I would ignore it.
ProbDiffResDisp is the probability that ResDispDistance is greater than the threshold supplied to BASiCS_TestDE (the EpsilonR argument). It's calculated as a tail posterior probability: the fraction of the posterior distribution of the difference in residual over-dispersion between condition 1 and condition 2 that is greater than the given difference threshold.
MeanOverall is the average of the mean expression parameters between the two conditions.
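
To make those definitions concrete, here is a minimal sketch in R using simulated posterior draws for a single gene. Nothing in it is BASiCS code or BASiCS output: the variable names, the simulated values, the use of posterior medians as point estimates, the plain unweighted averages, the sign of the distance and the two-sided (absolute value) comparison are all assumptions made for illustration.

```r
set.seed(1)
n_draws <- 1000

# Simulated posterior draws of the residual over-dispersion parameter for
# ONE gene in each condition (made-up numbers, not BASiCS output).
eps_control   <- rnorm(n_draws, mean = 0.2, sd = 0.3)
eps_treatment <- rnorm(n_draws, mean = 1.0, sd = 0.3)

# Hypothetical point estimates of mean expression in each condition.
mu_control   <- 12.5
mu_treatment <- 14.0

# Hypothetical value standing in for the EpsilonR threshold.
epsilon_r <- 0.4

# Point estimates per condition (posterior medians: an assumption).
est_control   <- median(eps_control)
est_treatment <- median(eps_treatment)

# "Overall" columns: average of the two conditions (unweighted: an assumption).
res_disp_overall <- mean(c(est_control, est_treatment))
mean_overall     <- mean(c(mu_control, mu_treatment))

# Difference between conditions (sign convention: an assumption).
res_disp_distance <- est_control - est_treatment

# Tail posterior probability: fraction of posterior draws whose difference
# exceeds the threshold (two-sided comparison: an assumption).
diff_draws         <- eps_control - eps_treatment
prob_diff_res_disp <- mean(abs(diff_draws) > epsilon_r)

print(c(ResDispOverall  = res_disp_overall,
        ResDispDistance = res_disp_distance,
        ProbDiffResDisp = prob_diff_res_disp,
        MeanOverall     = mean_overall))
```

The point is only that ProbDiffResDisp is a proportion of posterior draws, so it always lies between 0 and 1 and grows as the two posterior distributions move further apart relative to the threshold.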

Does that all make sense, or would you like more detail?

SaraMonts · Answer 2 · Tue Nov 23 2021 14:58:31 GMT+0800 (China Standard Time)

Hi Alan!

Thanks for the quick reply. Your explanation is very clear! Just one last small question, if I may: are the results subset by EFDR_R? If so, how is the EFDR calculated? I'm sorry for my low level in statistics...

Thank you,
Sara

Alan O'Callaghan · Answer 3 · Mon Dec 06 2021 05:38:16 GMT+0800 (China Standard Time)

Sorry, I missed your reply. No worries, there's quite a lot involved here.

The EFDR is calculated as per equation 16 in the 2nd BASiCS paper here: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0930-3

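Alternatively, the code is here:

# Expected false discovery rate (EFDR) at a given evidence threshold.
# Prob: per-gene posterior probabilities; EviThreshold: probability cutoff.
.EFDR <- function(EviThreshold, Prob) {
  return(sum((1 - Prob) * I(Prob > EviThreshold)) / sum(I(Prob > EviThreshold)))
}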

For a given posterior probability threshold, the EFDR is the sum of (1 - Prob) over all genes whose posterior probability of differential expression exceeds the evidence threshold, divided by the number of genes whose posterior probability exceeds that threshold.
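
Written as a formula, that reads as follows. The notation here is illustrative and may not match the symbols used in equation 16 of the paper: p_i stands for the posterior probability for gene i (Prob in the code), alpha for the evidence threshold (EviThreshold), and the indicator is 1 when its condition holds and 0 otherwise.

```latex
\mathrm{EFDR}(\alpha) =
  \frac{\sum_{i} (1 - p_i)\, \mathbb{I}(p_i > \alpha)}
       {\sum_{i} \mathbb{I}(p_i > \alpha)}
```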

Or put differently, the denominator is the number of genes that would be DE at the given threshold. The numerator is the sum of (1 - Prob) for those same genes.
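
A small worked example may help. The sketch below re-implements the calculation under a hypothetical name (efdr) and applies it to five made-up posterior probabilities. It also scans a few thresholds to show how the EFDR falls as the threshold gets stricter. How BASiCS itself picks the threshold for a target EFDR is not covered in this thread, so the scan is only an illustration.

```r
# Hypothetical re-implementation of the EFDR calculation, for illustration.
efdr <- function(threshold, prob) {
  selected <- prob > threshold   # genes that would be called at this threshold
  # Returns NaN (0 / 0) when no gene exceeds the threshold.
  sum(1 - prob[selected]) / sum(selected)
}

# Made-up posterior probabilities for five genes.
prob <- c(0.99, 0.95, 0.90, 0.60, 0.30)

# At a threshold of 0.8, three genes are selected:
# numerator = 0.01 + 0.05 + 0.10 = 0.16, denominator = 3,
# so the EFDR is 0.16 / 3, roughly 0.053.
efdr_at_08 <- efdr(0.8, prob)
print(efdr_at_08)

# Stricter thresholds select fewer genes, each with a higher probability,
# so the EFDR decreases.
thresholds <- c(0.5, 0.8, 0.92, 0.97)
efdr_by_threshold <- sapply(thresholds, efdr, prob = prob)
print(data.frame(threshold = thresholds, efdr = efdr_by_threshold))
```

In other words, the EFDR is the average of (1 - Prob) among the genes that pass the threshold: an estimate of the share of called genes that are false positives.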

Very much non-intuitive I grant you, but probably no more so than p-values!

SaraMonts · Answer 4 · Tue Dec 14 2021 19:40:04 GMT+0800 (China Standard Time)

Thanks a lot Alan for the explanation! I understand it better now.

Sara

Alan O'Callaghan · Answer 5 · Thu Jan 20 2022 17:02:31 GMT+0800 (China Standard Time)

Great! Do let me know if you've more questions :)