catavallejos / BASiCS

BASiCS: Bayesian Analysis of Single-Cell Sequencing Data. This is an unstable experimental version. Please see http://bioconductor.org/packages/BASiCS/ for the official release version

Understanding the resulting table of the differential variability test for residual over-dispersion

SaraMonts opened this issue

Hello!

First of all, thank you for this useful tool! I have applied it to my data, where I am comparing two conditions (control and treatment). I am having difficulty understanding some of the columns of the table obtained for the residual over-dispersion after running the differential variability test with the function "BASiCS_TestDE". More specifically, I don't understand how the values in the columns "ResDispOverall", "ResDispFC", "ProbDiffResDisp" and "MeanOverall" are calculated. Could you provide a little more information on these columns?

Thank you very much,
Sara

Hi!

  • ResDispOverall measures the average of the residual over-dispersion parameters between both conditions.
  • ResDispFC shouldn't be included in the output; thanks for flagging this. I would ignore it.
  • ProbDiffResDisp is the probability that ResDispDistance is greater than the threshold supplied to BASiCS_TestDE (the EpsilonR argument). It's calculated as a tail posterior probability. It measures the fraction of the posterior distribution of the difference in residual overdispersion between condition 1 and condition 2 that is greater than the given difference threshold.
  • MeanOverall is the average of the mean expression parameters between the two conditions.
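To make the tail posterior probability concrete, here is a minimal sketch for a single gene. The draws, the variable names and the threshold value are all made up for illustration, and comparing the absolute difference with the threshold is an assumption about how the difference is measured, not something taken from the BASiCS source.

```r
# Hypothetical posterior draws of residual over-dispersion for ONE gene
# in each condition (made-up numbers, not BASiCS output).
eps_cond1 <- c(0.9, 1.1, 0.4, 1.3, 0.8)
eps_cond2 <- c(0.2, 0.5, 0.3, 0.1, 0.6)

# Hypothetical threshold standing in for the EpsilonR argument.
epsilon_r <- 0.4

# Posterior draws of the difference between the two conditions.
diff_draws <- eps_cond1 - eps_cond2

# Tail posterior probability: the fraction of draws whose difference
# exceeds the threshold. Using abs() here is an assumption.
prob_diff_res_disp <- mean(abs(diff_draws) > epsilon_r)
print(prob_diff_res_disp)  # 3 of the 5 draws exceed 0.4, so 0.6
```

With real output there would be one such probability per gene, computed from that gene's posterior draws.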

Does that all make sense, or would you like more detail?

Hi Alan!

Thanks for the quick reply. Your explanation is very clear! Just one last small question, if I may: are the results subset by EFDR_R? If so, how is the EFDR calculated? Sorry for my limited background in statistics...

Thank you,
Sara

Sorry, I missed your reply. No worries, there's quite a lot involved here.

The EFDR is calculated as per equation 16 in the 2nd BASiCS paper here: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0930-3

Alternatively the code is here:

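# Prob: per-gene posterior probabilities; EviThreshold: probability cut-off.
.EFDR <- function(EviThreshold, Prob) {
  # Numerator: sum of (1 - Prob) over genes with Prob > EviThreshold;
  # denominator: number of such genes.
  return(sum((1 - Prob) * I(Prob > EviThreshold)) / sum(I(Prob > EviThreshold)))
}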

For a given posterior probability threshold, the EFDR is the sum of one minus the posterior probability of differential expression, taken over all genes that exceed the evidence threshold, divided by the number of genes whose posterior probability exceeds that threshold.

Or put differently, the denominator is the number of genes that would be DE at the given threshold. The numerator is the sum of (1 - Prob) for those same genes.
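As a small worked example of that calculation, here is a sketch on five made-up posterior probabilities. The function name efdr_sketch and the numbers are hypothetical; the body does the same arithmetic as the .EFDR snippet above, just written with a plain logical mask.

```r
# Same calculation as the .EFDR snippet above, using a logical mask.
efdr_sketch <- function(evi_threshold, prob) {
  exceeds <- prob > evi_threshold          # genes called at this threshold
  sum((1 - prob)[exceeds]) / sum(exceeds)  # mean of (1 - prob) over those genes
}

# Hypothetical posterior probabilities for five genes (made-up numbers).
prob <- c(0.99, 0.95, 0.80, 0.60, 0.30)

# Threshold 0.70: three genes are called.
efdr_070 <- efdr_sketch(0.70, prob)  # (0.01 + 0.05 + 0.20) / 3, about 0.087

# Threshold 0.90: two genes are called.
efdr_090 <- efdr_sketch(0.90, prob)  # (0.01 + 0.05) / 2 = 0.03

print(c(efdr_070, efdr_090))
```

In this toy example, raising the evidence threshold keeps only the genes with the strongest evidence, so the expected fraction of false discoveries among them drops. If no gene exceeds the threshold, the ratio is 0/0 and R returns NaN.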

Very non-intuitive, I grant you, but probably no more so than p-values!

Thanks a lot Alan for the explanation! I understand it better now.

Sara

Great! Do let me know if you've more questions :)