Identifying spike-in cells using InferCNVpy

Question

mibo1996 opened this issue 3 years ago

mb1996 commented 3 years ago

Hi,

I have been trying to develop a workflow using InferCNVpy to identify spike-in cells in my scRNA-seq data. The spike-in cells were GFP+ 293T cells, which should carry more copy number variations (CNVs) than the cells of the tumor sample they were added to (a low-grade glioma, which has very few mutations).

When I run InferCNVpy after pre-processing the data, the high-CNV cells seem to separate from the other cells in the sample. However, when I create a feature plot to visualize the expression of GFP in the cells, the GFP+ cells do not entirely overlap with the high-CNV cluster.

Is there a reason for this observed effect? The high-CNV cells should be the only cells expressing GFP.

Thank you

Gregor Sturm · Answer 1 · Wed Aug 18 2021 15:24:03 GMT+0800 (China Standard Time)

Hi,

Is that UMAP plot based on transcriptomics or on CNV data?

Could you try plotting the number of detected genes per cell? It could be that infercnv doesn't work properly on cells with only a few detected genes or a low number of total counts.
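For reference, the two per-cell quantities mentioned here can be computed directly from a raw count matrix. The following is a minimal NumPy sketch on made-up toy data; the helper name `per_cell_qc` is hypothetical and not part of scanpy or infercnvpy.

```python
import numpy as np

def per_cell_qc(counts):
    """Return (detected genes, total counts) per cell.

    counts: cells x genes array of raw counts (hypothetical input).
    """
    counts = np.asarray(counts)
    n_genes = (counts > 0).sum(axis=1)   # genes with at least one count
    total_counts = counts.sum(axis=1)    # sequencing depth per cell
    return n_genes, total_counts

# Toy matrix: 3 cells x 4 genes (made-up numbers)
counts = np.array([
    [5, 0, 3, 2],
    [0, 0, 1, 0],
    [2, 2, 2, 2],
])
n_genes, total_counts = per_cell_qc(counts)
print(n_genes)       # [3 1 4]
print(total_counts)  # [10  1  8]
```

In scanpy, `sc.pp.calculate_qc_metrics(adata, inplace=True)` stores the same quantities in `adata.obs["n_genes_by_counts"]` and `adata.obs["total_counts"]`, and those column names can then be passed to the `color` argument of the UMAP plotting functions.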

mb1996 · Answer 2 · Thu Aug 19 2021 00:08:31 GMT+0800 (China Standard Time)

Hi @grst ,

The top UMAP plot is based on CNV data and the bottom UMAP is based on transcriptomic data.

Here is the plot of the number of genes detected per cell, along with the total counts.

Thank you

Gregor Sturm · Answer 3 · Thu Aug 19 2021 14:42:26 GMT+0800 (China Standard Time)

It really seems that the CNV profile of cells with low counts is more similar to other cells than to your GFP+ cells with high counts...

I never tried this myself, but maybe you could give sc.pp.regress_out a try before running tl.infercnv?
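To illustrate what regressing out a covariate means conceptually, here is a minimal NumPy sketch: fit a linear model per gene against a per-cell covariate (here log total counts) and keep the residuals. The function below is a hypothetical stand-in for illustration, not scanpy's implementation, and the data are simulated.

```python
import numpy as np

def regress_out(X, covariate):
    """Remove the linear effect of one per-cell covariate from every gene.

    X: cells x genes expression matrix (e.g. log-normalized).
    covariate: one value per cell (e.g. log total counts).
    Returns the residuals of an ordinary least squares fit per gene.
    """
    X = np.asarray(X, dtype=float)
    covariate = np.asarray(covariate, dtype=float)
    # Design matrix with an intercept column and the covariate
    design = np.column_stack([np.ones(len(covariate)), covariate])
    coef, *_ = np.linalg.lstsq(design, X, rcond=None)
    return X - design @ coef

# Simulated data: expression of 3 genes scales with sequencing depth plus noise
rng = np.random.default_rng(0)
log_depth = np.log1p(rng.uniform(500, 5000, size=200))
X = np.outer(log_depth, [1.0, 0.5, 0.2]) + rng.normal(0, 0.1, (200, 3))

residuals = regress_out(X, log_depth)
# After regression, each gene is uncorrelated with the covariate
corr = [np.corrcoef(residuals[:, j], log_depth)[0, 1] for j in range(3)]
```

With scanpy itself, the corresponding call would be `sc.pp.regress_out(adata, ["total_counts"])` on normalized, log-transformed data before running `tl.infercnv`. Whether this actually improves the CNV profiles of low-count cells is untested here.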

P.S. I don't know if you have read this already, but I'd recommend having a look at the description of the computation steps of infercnv. The method itself is rather simple, and the description can help you understand what can go wrong. UMAP/leiden then just build the nearest neighbor graph on the result of the algorithm.
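As a rough illustration of why the method is simple, the sketch below loosely follows the general idea: log fold change against reference cells, clipping, a running mean along genomic position, and per-cell centering. It is a simplified approximation on toy data, not the infercnvpy implementation: it assumes a single chromosome with genes already sorted by position, a single reference group, and no noise filtering. The function name and parameters are hypothetical.

```python
import numpy as np

def simple_cnv_profile(log_expr, is_reference, window=3, lfc_clip=3.0):
    """Simplified CNV-like profile (illustration only).

    log_expr: cells x genes, log-normalized, genes sorted by genomic position.
    is_reference: boolean mask marking the reference (normal) cells.
    """
    log_expr = np.asarray(log_expr, dtype=float)
    # 1. Log fold change relative to the mean of the reference cells
    lfc = log_expr - log_expr[is_reference].mean(axis=0)
    # 2. Clip extreme values
    lfc = np.clip(lfc, -lfc_clip, lfc_clip)
    # 3. Running mean over neighboring genes
    kernel = np.ones(window) / window
    smoothed = np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="valid"), 1, lfc
    )
    # 4. Center each cell by its median
    return smoothed - np.median(smoothed, axis=1, keepdims=True)

# Toy data: 2 reference cells and 1 cell with a gain over the last 3 of 12 genes
log_expr = np.ones((3, 12))
log_expr[2, 9:] = 2.0
is_reference = np.array([True, True, False])

profile = simple_cnv_profile(log_expr, is_reference)
print(profile.shape)  # (3, 10)
```

Because every step operates directly on expression values relative to the reference, systematic differences in sequencing depth between cell populations can plausibly carry over into the resulting profile.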

Gregor Sturm · Answer 4 · Mon Sep 13 2021 20:55:29 GMT+0800 (China Standard Time)

Hi @mibo1996,

We just added a wrapper for the copykat R package which serves as a drop-in replacement for tl.infercnv. Some results we obtained on internal data suggest that the copykat algorithm might not suffer so much from differences in gene counts, so you could give that a try.

tl.copykat is available in v0.2.0 and requires an R installation with the copykat package.

Cheers,
Gregor