Use statistical tests to improve CNV calling

Question

grst opened this issue 3 years ago

The original inferCNV implementation essentially computes the log-fold change in regions of ~50 genes between each cell and a "reference", usually the mean of normal samples.
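For orientation, here is a minimal sketch of that windowed log-fold-change baseline. It is not the inferCNV code itself: the function name `windowed_lfc`, the dense `(n_cells, n_genes)` input layout, and the assumption that genes are already sorted by position on a single chromosome are all illustrative.

```python
import numpy as np


def windowed_lfc(expr, is_reference, window=50):
    """Running-mean log-fold change of each cell versus the reference mean.

    expr: (n_cells, n_genes) log-normalized expression; genes are assumed
          to be sorted by genomic position on one chromosome.
    is_reference: boolean mask of length n_cells marking the normal cells.
    """
    # Per-gene difference to the mean of the reference cells (log space).
    lfc = expr - expr[is_reference].mean(axis=0)
    # Average the per-gene differences over `window` consecutive genes.
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="valid"), 1, lfc
    )
```

The result has one column per window position, i.e. `n_genes - window + 1` columns.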

I suspect that the performance could be improved by doing a statistical test for differential expression between these regions, i.e.

1. For each gene, estimate the dispersion across all cells.
2. For each cell and each window, use the dispersion estimates to test whether the window is significantly enriched or depleted compared to the reference.
3. Store the log-fold change only if the difference is statistically significant; otherwise, the entry in the sparse matrix is 0.
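The steps above could be sketched roughly as follows. This is only an illustration under strong simplifying assumptions: the per-gene variance of log-normalized expression stands in for a proper dispersion estimate, genes are treated as independent, the uncertainty of the reference mean is ignored, a normal approximation (z-test) is used, and no multiple-testing correction is applied. The function name `significant_window_lfc` and the parameters `window` and `alpha` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from scipy import sparse, stats


def significant_window_lfc(expr, is_reference, window=50, alpha=0.01):
    """Return a sparse (n_cells, n_windows) matrix of significant LFCs.

    expr: (n_cells, n_genes) log-normalized expression, genes sorted by
          genomic position. Genes with zero variance are assumed to have
          been filtered out beforehand.
    """
    # Step 1: per-gene variance across all cells (stand-in for dispersion).
    gene_var = expr.var(axis=0, ddof=1)

    # Per-gene difference to the reference mean, averaged per window.
    diff = expr - expr[is_reference].mean(axis=0)
    lfc = sliding_window_view(diff, window, axis=-1).mean(axis=-1)

    # Step 2: z-test per cell and window. Under the null, the window mean
    # has variance sum(gene_var) / window**2 (independent genes assumed).
    se = np.sqrt(sliding_window_view(gene_var, window).sum(axis=-1)) / window
    pvals = 2 * stats.norm.sf(np.abs(lfc / se))

    # Step 3: keep the LFC only where the test is significant.
    return sparse.csr_matrix(np.where(pvals < alpha, lfc, 0.0))
```

This version still materializes dense intermediates, so it shows the logic rather than a memory-efficient implementation.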

This may be computationally challenging, but I believe it could be feasible even on large datasets, either by using batchglm or by computing a running t statistic.
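One possible reading of the "running t statistic" idea is a one-sample t statistic over each sliding window, computed from cumulative sums so that the cost does not depend on the window size. The sketch below assumes the input is the per-gene difference between each cell and the reference mean; the function name `running_t` is hypothetical, and the variance here is estimated within each window rather than from per-gene dispersion estimates.

```python
import numpy as np


def running_t(diff, window=50):
    """One-sample t statistic (H0: mean 0) over sliding windows of genes.

    diff: (n_cells, n_genes) per-gene difference to the reference mean,
          genes sorted by genomic position.
    Returns an array of shape (n_cells, n_genes - window + 1).
    """
    # Prefix sums of x and x^2, with a leading zero column.
    pad = np.zeros(diff.shape[:-1] + (1,))
    s1 = np.concatenate([pad, np.cumsum(diff, axis=-1)], axis=-1)
    s2 = np.concatenate([pad, np.cumsum(diff**2, axis=-1)], axis=-1)

    # Window sums are differences of prefix sums.
    sum1 = s1[..., window:] - s1[..., :-window]
    sum2 = s2[..., window:] - s2[..., :-window]

    mean = sum1 / window
    # Unbiased sample variance; clip tiny negatives caused by rounding.
    var = np.maximum((sum2 - window * mean**2) / (window - 1), 0.0)

    # Windows with zero variance yield inf or nan instead of raising.
    with np.errstate(divide="ignore", invalid="ignore"):
        return mean / np.sqrt(var / window)
```

The resulting statistics could then be compared against a t distribution with `window - 1` degrees of freedom, although neighbouring windows overlap and are therefore strongly correlated.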