icbi-lab / infercnvpy

Infer copy number variation (CNV) from scRNA-seq data. Plays nicely with Scanpy.

Home Page: https://infercnvpy.readthedocs.io/en/latest/

Identifying spike-in cells using InferCNVpy

mibo1996 opened this issue · comments

Hi,

I have been trying to develop a workflow using InferCNVpy to identify spike-in cells in my scRNA-seq data. The spike-in cells were GFP+ 293T cells, which should carry more copy number alterations than the cells of the tumor sample they were added to (a low-grade glioma, which has very few mutations).

When I run InferCNVpy after pre-processing the data, the high-CNV cells seem to separate from the other cells in the sample. However, when I create a feature plot to visualize the expression of GFP in the cells, the GFP+ cells do not entirely overlap with the high-CNV cluster.

Screen Shot 2021-08-17 at 4 17 36 PM

Screen Shot 2021-08-17 at 4 17 42 PM

Is there a reason for this observed effect? The high-CNV cells should be the only cells expressing GFP.

Thank you

Hi,

Is that UMAP plot based on transcriptomics or on CNV data?

Could you try plotting the number of detected genes per cell? It could be that infercnv doesn't work properly on cells with only a few detected genes or a low number of total counts.
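A minimal sketch of that check, assuming `sc.tl.umap` and `cnv.tl.umap` have already been run on `adata`. The function name and the `layer` argument are illustrative. Note that `total_counts` is only meaningful if the matrix being summarized holds raw counts, so pass a raw-count layer if `adata.X` has already been normalized.

```python
def plot_depth_on_embeddings(adata, layer=None):
    """Color the transcriptomic and the CNV UMAP by per-cell depth."""
    # Imported lazily so the function can be defined without the packages installed.
    import scanpy as sc
    import infercnvpy as cnv

    # Adds "n_genes_by_counts" and "total_counts" to adata.obs.
    # `layer` should point at raw counts if adata.X is already normalized.
    sc.pp.calculate_qc_metrics(
        adata, layer=layer, percent_top=None, log1p=False, inplace=True
    )

    depth_keys = ["n_genes_by_counts", "total_counts"]
    sc.pl.umap(adata, color=depth_keys)   # transcriptome-based embedding
    cnv.pl.umap(adata, color=depth_keys)  # CNV-based embedding
```

If the cells that fall outside the high-CNV cluster are also the ones with few detected genes, sequencing depth is a likely explanation.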

Hi @grst,

The top UMAP plot is based on CNV data and the bottom UMAP is based on transcriptomic data.

Screen Shot 2021-08-18 at 12 07 29 PM

Screen Shot 2021-08-18 at 12 14 49 PM

Here are the plots of the number of genes detected per cell and of the total counts.

Thank you

It really seems that the CNV profile of cells with low counts is more similar to other cells than to your GFP+ cells with high counts...

I never tried this myself, but maybe you could give sc.pp.regress_out a try before running tl.infercnv?
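An untested sketch of that idea, assuming `adata.X` is log-normalized, `adata.obs` already contains a `total_counts` column, and `adata.var` carries the genomic positions that `tl.infercnv` needs. The function name is illustrative, and `reference_key` / `reference_cat` must be replaced with your own annotation of normal cells.

```python
def infercnv_after_regress_out(adata, reference_key, reference_cat,
                               depth_key="total_counts"):
    """Regress out a depth covariate, then run infercnv on the residuals."""
    # Imported lazily so the function can be defined without the packages installed.
    import scanpy as sc
    import infercnvpy as cnv

    # Work on a copy, because regress_out overwrites .X with residuals.
    corrected = adata.copy()
    sc.pp.regress_out(corrected, keys=[depth_key])

    # reference_key is an obs column, reference_cat the value(s) in that
    # column marking the normal (reference) cells.
    cnv.tl.infercnv(
        corrected,
        reference_key=reference_key,
        reference_cat=reference_cat,
    )
    return corrected
```

Whether the residuals are a sensible input for infercnv is an open question, so compare the result against the uncorrected run before trusting it.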

P.S. I don't know if you have read this already, but I'd recommend having a look at the description of the computation steps of infercnv. The method itself is rather simple, and the description can help you understand what can go wrong. UMAP/leiden then just build the nearest-neighbor graph on the result of the algorithm.
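To illustrate how simple the core of the method is, here is a rough, pure-Python sketch of the main steps for one chromosome. It is a simplification and not the infercnvpy implementation: clipping of fold changes, noise filtering, and the final smoothing are left out, and the function name and toy data are made up.

```python
from statistics import mean, median


def simple_cnv_profile(expr, is_reference, window=3):
    """expr: one list per cell of log-normalized expression values,
    with genes already sorted by genomic position."""
    n_genes = len(expr[0])
    ref_cells = [row for row, ref in zip(expr, is_reference) if ref]

    # Step 1: per-gene mean expression in the reference (normal) cells.
    ref_mean = [mean(row[g] for row in ref_cells) for g in range(n_genes)]

    profiles = []
    for row in expr:
        # Step 2: log fold change of each gene relative to the reference.
        lfc = [row[g] - ref_mean[g] for g in range(n_genes)]
        # Step 3: running mean over neighboring genes along the genome.
        smoothed = [mean(lfc[i:i + window])
                    for i in range(n_genes - window + 1)]
        # Step 4: center each cell on its own median.
        center = median(smoothed)
        profiles.append([value - center for value in smoothed])
    return profiles


# Toy data: two reference cells and one cell with higher expression
# in the last two genes.
expr = [
    [1.0] * 8,
    [1.0] * 8,
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0],
]
profiles = simple_cnv_profile(expr, [True, True, False], window=3)
```

Because everything is derived from expression relative to the reference, a cell with few detected genes has many zero values and can end up with a distorted profile for reasons unrelated to copy number.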

Hi @mibo1996,

We just added a wrapper for the copykat R package, which serves as a drop-in replacement for tl.infercnv. Some results we obtained on internal data suggest that the copykat algorithm might not suffer as much from differences in gene counts, so you could give that a try.

tl.copykat is available in v0.2.0 and requires an R installation with the copykat package.
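A minimal sketch of swapping it in, calling `tl.copykat` with its default arguments. The function name is illustrative, and it assumes the copykat result is stored where the downstream infercnvpy functions expect it, as "drop-in replacement" suggests. Check the `tl.copykat` documentation for the expected input matrix.

```python
def run_copykat_and_cluster(adata):
    """Run copykat instead of tl.infercnv, then cluster on the CNV profile."""
    # Imported lazily; tl.copykat additionally needs R and the copykat package.
    import infercnvpy as cnv

    cnv.tl.copykat(adata)

    # Same downstream steps as after tl.infercnv.
    cnv.tl.pca(adata)
    cnv.pp.neighbors(adata)
    cnv.tl.leiden(adata)
    cnv.tl.umap(adata)
    cnv.pl.umap(adata, color="cnv_leiden")
```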

Cheers,
Gregor