Use statistical tests to improve CNV calling
Gregor Sturm commented
The original inferCNV implementation essentially computes, for each cell, the log-fold change relative to a "reference" (usually the mean of normal samples) in windows of ~50 genes.
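For comparison, here is a minimal sketch of that baseline. It assumes a cells x genes matrix of log-normalized expression with genes already sorted by genomic position. `windowed_log_fold_change` and its parameters are hypothetical names, and the sketch omits the other processing steps of the real implementation.

```python
import numpy as np


def windowed_log_fold_change(expr, ref_mask, window=50):
    """Simplified inferCNV-style signal (hypothetical helper).

    expr:     (n_cells, n_genes) log-normalized expression, genes sorted by position.
    ref_mask: boolean array marking the reference (normal) cells.
    Returns an (n_cells, n_genes - window + 1) array with one column per window.
    """
    # Reference profile: per-gene mean over the normal cells.
    ref = expr[ref_mask].mean(axis=0)
    # In log space, the difference to the reference is the log-fold change.
    lfc = expr - ref
    # Running mean over every run of `window` consecutive genes via cumulative sums.
    csum = np.cumsum(np.pad(lfc, ((0, 0), (1, 0))), axis=1)
    return (csum[:, window:] - csum[:, :-window]) / window
```

Every window value is kept, however small, which is the behaviour the proposal below aims to change.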
I suspect that performance could be improved by testing each of these windows for differential expression between a cell and the reference, i.e.
- for each gene, estimate the dispersion across all cells.
- for each cell and each window, use the dispersion estimates to test whether the window is significantly enriched or depleted compared to the reference.
- store the log-fold change only if the difference is statistically significant, otherwise it will be a 0 entry in the sparse matrix.
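The three steps above could be sketched as follows. This is an illustration under simplifying assumptions, not a proposed implementation: instead of a negative-binomial dispersion (as a GLM package would estimate), it uses the per-gene variance of log expression as a stand-in. It also treats genes within a window as independent and applies a two-sided z-test (normal approximation). `significant_window_lfc`, `_window_sums`, and `alpha` are hypothetical names.

```python
import numpy as np
from scipy import sparse
from scipy.stats import norm


def _window_sums(mat, window):
    """Sum over every run of `window` consecutive columns."""
    csum = np.cumsum(np.pad(mat, ((0, 0), (1, 0))), axis=1)
    return csum[:, window:] - csum[:, :-window]


def significant_window_lfc(expr, ref_mask, window=50, alpha=0.01):
    """Hypothetical sketch: keep a window's log-fold change only if significant."""
    # Per-gene reference mean and log-fold change of every cell against it.
    ref = expr[ref_mask].mean(axis=0)
    lfc = expr - ref
    # Step 1 (stand-in for dispersion): per-gene variance across all cells.
    gene_var = expr.var(axis=0, ddof=1)
    # Step 2: window mean of the LFC and its standard error, assuming the
    # genes within a window are independent: Var(mean) = sum(var) / window**2.
    mean_lfc = _window_sums(lfc, window) / window
    se = np.sqrt(_window_sums(gene_var[None, :], window)) / window
    z = mean_lfc / np.maximum(se, 1e-12)
    pvals = 2 * norm.sf(np.abs(z))  # two-sided p-value
    # Step 3: non-significant windows become implicit zeros in the sparse matrix.
    return sparse.csr_matrix(np.where(pvals < alpha, mean_lfc, 0.0))
```

Because most windows in most cells should be non-significant, the result stays sparse. No multiple-testing correction is applied here; with thousands of windows per cell, one would likely be needed.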
This may be computationally challenging, but by using batchglm or by computing a running t statistic, I believe it could be feasible even on large datasets.
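One possible reading of "running t statistic" is sketched here as an assumption: the reference statistics are accumulated in a single streaming pass over chunks of cells, so the full matrix never needs to be in memory. The sketch uses Welford-style updates with the Chan et al. batch merge. Each new observation is then scored with the t statistic for a single value against a sample, which under normality has n_ref - 1 degrees of freedom. `RunningMeanVar` and `t_vs_reference` are hypothetical names. The sketch does not use batchglm.

```python
import numpy as np


class RunningMeanVar:
    """Streaming per-feature mean and variance (Welford / Chan et al. batch merge)."""

    def __init__(self, n_features):
        self.n = 0
        self.mean = np.zeros(n_features)
        self.m2 = np.zeros(n_features)  # sum of squared deviations from the mean

    def update(self, batch):
        """Merge a (n_rows, n_features) batch into the running statistics."""
        batch = np.asarray(batch, dtype=float)
        n_b = batch.shape[0]
        if n_b == 0:
            return
        mean_b = batch.mean(axis=0)
        m2_b = ((batch - mean_b) ** 2).sum(axis=0)
        n = self.n + n_b
        delta = mean_b - self.mean
        # Combine the two partial results without revisiting earlier batches.
        self.mean = self.mean + delta * n_b / n
        self.m2 = self.m2 + m2_b + delta**2 * self.n * n_b / n
        self.n = n

    @property
    def var(self):
        """Unbiased sample variance (ddof=1)."""
        return self.m2 / (self.n - 1)


def t_vs_reference(values, ref_stats):
    """t statistic of new observations against the reference distribution."""
    se = np.sqrt(ref_stats.var * (1 + 1 / ref_stats.n))
    return (values - ref_stats.mean) / se
```

Here the features could be windows of ~50 genes rather than single genes. The memory cost is then independent of the number of cells.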