icbi-lab / infercnvpy

Infer copy number variation (CNV) from scRNA-seq data. Plays nicely with Scanpy.

Home Page: https://infercnvpy.readthedocs.io/en/latest/

Pre-processing data?

mibo1996 opened this issue

Hi,

I was wondering how pre-processing affects the data input for InferCNVpy.

I know that the data must be filtered for low-quality cells, normalized, and log-transformed before running InferCNVpy. Does it also affect the results if I compute highly variable genes, regress out unwanted variables (e.g. total counts and percentage of mitochondrial genes), scale, run PCA, compute neighbors, UMAP, clustering, and differential expression analysis on the data before running InferCNVpy?

I am mainly wondering if it matters where in the workflow InferCNV comes in, and if it does, where it should be? E.g. always before/after {these steps}, or if it doesn't matter.

Thank you

Hi @mibo1996,

That's a good question, and I can only answer it partly.

highly variable genes

infercnvpy ignores the highly variable gene annotation. Subsetting to highly variable genes would lead to significantly worse results, as you lose a lot of information about genomic positions.
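
To make the point about genomic positions concrete, here is a minimal toy sketch in plain Python. It is not infercnvpy code: the chromosome, the gene positions, the 10% "highly variable" subset, and the helper `window_spans` are all made up for illustration. It assumes, following the computation steps linked in this thread, that expression is smoothed with a running window over genes ordered by genomic position, and it compares how much of the chromosome one window covers before and after subsetting.

```python
import random
import statistics

random.seed(0)

# Hypothetical toy chromosome: 2000 genes at sorted random positions (bp).
n_genes = 2000
positions = sorted(random.randrange(0, 100_000_000) for _ in range(n_genes))

# Pretend that 10% of the genes were flagged as highly variable (assumption).
hvg_idx = sorted(random.sample(range(n_genes), k=200))
hvg_positions = [positions[i] for i in hvg_idx]


def window_spans(pos, window_size):
    """Genomic span (bp) covered by each running window of `window_size` genes."""
    return [
        pos[i + window_size - 1] - pos[i]
        for i in range(len(pos) - window_size + 1)
    ]


window_size = 100  # illustrative value, chosen only for this toy
all_spans = window_spans(positions, window_size)
hvg_spans = window_spans(hvg_positions, window_size)

print("windows (all genes):", len(all_spans))
print("windows (HVG only): ", len(hvg_spans))
print("median span, all genes (Mb):", statistics.median(all_spans) / 1e6)
print("median span, HVG only (Mb): ", statistics.median(hvg_spans) / 1e6)
```

In this toy setup, the subset leaves far fewer windows, and each window stretches over a much larger part of the chromosome, so a local gain or loss would be averaged together with distant, unaffected genes.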

running PCA, computing neighbors, UMAP, clustering, and differential expression analysis

This should have no effect either, as infercnvpy only uses the information in adata.X, which is left untouched by these functions.

regressing out unwanted variables (e.g. total counts and percentage of mitochondrial genes), scaling

This will have an effect, but I don't know to what extent. I suspect that it mostly changes the scale of the results, which should remain qualitatively very similar. Let me know how it goes if you try it out!
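
As a rough intuition for the scaling case, here is a small toy sketch in plain Python. It is not infercnvpy code and does not cover regression: the expression values and the helpers `zscore_columns` and `lfc_vs_reference` are hypothetical. It only mimics the first computation step (subtracting the mean reference expression per gene, i.e. a log fold change) on log-normalized values and on per-gene z-scored values, and ignores clipping, smoothing, and noise filtering.

```python
import statistics

# Hypothetical log-normalized expression: rows = cells, columns = genes.
reference_cells = [
    [1.0, 2.0, 0.5, 3.0],
    [1.2, 1.8, 0.7, 3.2],
]
tumor_cell = [2.0, 3.0, 0.5, 3.0]  # first two genes higher than the reference


def zscore_columns(rows):
    """Scale every gene (column) to zero mean and unit variance across cells."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def lfc_vs_reference(cell, refs):
    """Subtract the per-gene mean of the reference cells from one cell."""
    ref_mean = [statistics.fmean(c) for c in zip(*refs)]
    return [v - m for v, m in zip(cell, ref_mean)]


# Difference to the reference on the log-normalized values.
raw_lfc = lfc_vs_reference(tumor_cell, reference_cells)

# Same computation after z-scoring each gene across all three cells.
scaled = zscore_columns(reference_cells + [tumor_cell])
scaled_lfc = lfc_vs_reference(scaled[-1], scaled[:-1])

for gene, (raw, sc) in enumerate(zip(raw_lfc, scaled_lfc)):
    print(f"gene {gene}: raw {raw:+.2f}  scaled {sc:+.2f}  ratio {sc / raw:.2f}")
```

In this toy, z-scoring divides each gene's difference by that gene's standard deviation, so the direction of every difference is kept while the relative weight of the genes changes. Whether the smoothed result stays qualitatively similar on real data is something to check empirically.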

This description of the computation steps should also help you understand what data is used and how:
https://icbi-lab.github.io/infercnvpy/infercnv.html#computation-steps

Cheers,
Gregor

Feel free to reopen if there are follow-up questions.