icbi-lab / infercnvpy

Infer copy number variation (CNV) from scRNA-seq data. Plays nicely with Scanpy.

Home Page: https://infercnvpy.readthedocs.io/en/latest/

Use statistical tests to improve CNV calling

grst opened this issue

The original inferCNV implementation essentially computes the log-fold change in regions of ~50 genes between each cell and a "reference", usually the mean of normal samples.
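For illustration, the windowed log-fold change described above can be sketched as follows. This is a simplified approximation, not the actual inferCNV or infercnvpy code: the function name `windowed_lfc` is hypothetical, the input is assumed to be a dense, log-normalized cells × genes matrix with genes sorted by genomic position, and chromosome boundaries are ignored.

```python
import numpy as np


def windowed_lfc(expr, reference_mask, window=50):
    """Running-mean log-fold change per cell (hypothetical helper).

    expr: cells x genes array of log-normalized expression,
          genes assumed to be ordered by genomic position.
    reference_mask: boolean array marking the "normal" reference cells.
    """
    # Reference profile: mean expression of the normal cells, per gene.
    ref_mean = expr[reference_mask].mean(axis=0)
    # Data is already in log space, so a difference is a log-fold change.
    lfc = expr - ref_mean
    # Average the per-gene log-fold change over sliding windows of genes.
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="valid"), 1, lfc
    )
```

The result has one column per sliding window, i.e. `n_genes - window + 1` columns.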

I suspect that the performance could be improved by doing a statistical test for differential expression between these regions, i.e.

  • for each gene, estimate the dispersion across all cells.
  • for each cell and each window, use the dispersion estimates to test whether the window is significantly enriched or depleted compared to the reference.
  • store the log-fold change only if the difference is statistically significant; otherwise, leave the entry as 0 in the sparse matrix.
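The three steps could look roughly like the sketch below. Everything in it is an assumption made for illustration: the function name `significant_window_lfc` is hypothetical, the per-gene sample variance stands in for a proper dispersion estimate, the test is a simple z-test that treats genes as independent and ignores the uncertainty of the reference mean, windows are non-overlapping, and no multiple-testing correction is applied.

```python
import numpy as np
from statistics import NormalDist


def significant_window_lfc(expr, reference_mask, window=50, alpha=0.05):
    """Keep the window log-fold change only where a z-test is significant.

    expr: cells x genes array of log-normalized expression (assumed input).
    Returns a cells x windows array that is 0 wherever the test fails.
    """
    n_cells, n_genes = expr.shape
    ref_mean = expr[reference_mask].mean(axis=0)
    # Step 1: per-gene variance across all cells as a dispersion stand-in.
    gene_var = expr.var(axis=0, ddof=1)
    # Two-sided critical value of the standard normal distribution.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    starts = range(0, n_genes - window + 1, window)
    out = np.zeros((n_cells, len(starts)))
    for j, s in enumerate(starts):
        # Step 2: mean difference to the reference within the window.
        mean_diff = (expr[:, s:s + window] - ref_mean[s:s + window]).mean(axis=1)
        # Standard error of a mean of `window` independent genes.
        se = np.sqrt(gene_var[s:s + window].sum()) / window
        if se == 0:
            continue  # no variation at all in this window, nothing to test
        z = mean_diff / se
        # Step 3: store the log-fold change only where it is significant.
        out[:, j] = np.where(np.abs(z) > z_crit, mean_diff, 0.0)
    return out
```

The mostly-zero result can then be converted with `scipy.sparse.csr_matrix(out)` if a sparse matrix is wanted.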

This may be computationally challenging, but I believe it could be feasible even on large datasets, either by

  • using batchglm, or
  • computing a running t statistic.
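One possible reading of a "running t statistic" is a one-sample t statistic over every sliding window, computed from cumulative sums so that the cost stays linear in the number of genes per cell. The sketch below follows that reading; the function name `running_t_statistic` and the choice of a one-sample test on per-gene differences to the reference are assumptions, not something specified in this issue.

```python
import numpy as np


def running_t_statistic(diff, window=50):
    """One-sample t statistic for every sliding window (hypothetical helper).

    diff: cells x genes array of per-gene differences to the reference,
          genes assumed to be ordered by genomic position.
    Returns a cells x (n_genes - window + 1) array.
    """
    n_cells, _ = diff.shape
    pad = np.zeros((n_cells, 1))
    # Cumulative sums of values and squared values, with a leading zero
    # so that any window sum is a single subtraction.
    c1 = np.hstack([pad, np.cumsum(diff, axis=1)])
    c2 = np.hstack([pad, np.cumsum(diff ** 2, axis=1)])
    s1 = c1[:, window:] - c1[:, :-window]
    s2 = c2[:, window:] - c2[:, :-window]

    mean = s1 / window
    # Unbiased sample variance; clip tiny negatives caused by round-off.
    var = np.maximum((s2 - window * mean ** 2) / (window - 1), 0.0)
    se = np.sqrt(var / window)
    # Windows with zero variance get t = 0 instead of a division by zero.
    return np.divide(mean, se, out=np.zeros_like(mean), where=se > 0)
```

Because only two cumulative sums per cell are needed, the matrix can be processed in chunks of cells to keep memory bounded on large datasets.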