icbi-lab / infercnvpy

Infer copy number variation (CNV) from scRNA-seq data. Plays nicely with Scanpy.

Home Page: https://infercnvpy.readthedocs.io/en/latest/

Using reference datasets

dariarom94 opened this issue

Dear infercnvpy developer team!

I wanted to say a warm thank you for this tool! In my project I am characterising a cohort of solid tumors, and I use infercnvpy to quantify the CNVs in my data.
I am happy to provide any concrete feedback you are interested in that would help you develop the package further.

I also have one technical question that I would be happy to get feedback on, based on your experience. In my case I have a low percentage of non-tumor cells, and from the recommendations and the method description I realized that the optimal thing for me to do is to use a reference dataset (e.g. from GTEx). However, I am unsure how to combine the reference data with the experimental data. Is this something you have also thought about or considered adding to the pipeline? For now I have just merged the datasets and normalized them together, but I am not sure this is the optimal way to do it.

Hi @dariarom94,

thanks for your interest in infercnvpy!

> I am happy to provide any concrete feedback you are interested in that would help you develop the package further.

We are always happy about feedback, feel free to open an issue at any time!

> I realized that the optimal thing for me to do is to use a reference dataset (e.g. from GTEx). However, I am unsure how to combine the reference data with the experimental data. Is this something you have also thought about or considered adding to the pipeline?

You can specify a 1-dimensional NumPy array with reference gene expression (same order as adata.var_names) via the reference parameter of tl.infercnv: https://icbi-lab.github.io/infercnvpy/generated/infercnvpy.tl.infercnv.html#infercnvpy.tl.infercnv
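
As a rough illustration of what "same order as adata.var_names" means in practice, here is a minimal sketch that reorders a per-gene reference profile to match the gene order of the single-cell data. The helper name align_reference, the toy gene names, and the toy values are all hypothetical. The sketch also assumes the reference values have already been put on the same scale as adata.X (for example log-normalized), which it does not check.

```python
import numpy as np
import pandas as pd


def align_reference(bulk_mean: pd.Series, var_names) -> np.ndarray:
    """Return a 1-D array of reference expression ordered like var_names.

    bulk_mean: per-gene average expression of the normal reference, indexed
        by gene identifier (hypothetical input, e.g. derived from GTEx).
    var_names: gene identifiers in the order of adata.var_names.
    """
    var_names = pd.Index(var_names)
    # Refuse to guess values for genes the reference does not cover.
    missing = var_names.difference(bulk_mean.index)
    if len(missing) > 0:
        raise ValueError(
            f"{len(missing)} genes lack a reference value, e.g. {list(missing[:5])}"
        )
    # Reorder the reference so that position i corresponds to var_names[i].
    return bulk_mean.reindex(var_names).to_numpy(dtype=float)


# Toy data standing in for a real reference profile and a real AnnData object.
bulk_mean = pd.Series({"GENE_C": 2.0, "GENE_A": 0.5, "GENE_B": 1.5})
var_names = ["GENE_A", "GENE_B", "GENE_C"]  # stands in for adata.var_names
reference = align_reference(bulk_mean, var_names)

# With a real AnnData object (requires infercnvpy, not run here):
# import infercnvpy as cnv
# cnv.tl.infercnv(adata, reference=reference)
```

Raising an error on missing genes is a deliberate choice in this sketch. An alternative would be to subset adata to the genes shared with the reference before calling tl.infercnv.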

The documentation could definitely be better about this feature. However, I have never tried it in practice, and I'm unsure how well it actually works, given the different characteristics of bulk RNA-seq and single-cell RNA-seq data. If you try it out, please let us know how it goes!

Cheers,
Gregor