icbi-lab / infercnvpy

Infer copy number variation (CNV) from scRNA-seq data. Plays nicely with Scanpy.

Home Page: https://infercnvpy.readthedocs.io/en/latest/

infercnvpy.tl.infercnv result matrix is of incorrect dimensionality

sean-at-tessera opened this issue

Report

The matrix added to my AnnData object by infercnvpy in anndata.obsm['X_cnv'] has an unexpected number of columns compared to the original input matrix (anndata.X); these values were 6671 and 6965 respectively.

I assumed that the matrix would be reduced to the number of windows over which CNV values were calculated. I used windowsize=100 and step=1, so I expected to see 6965 - 100 + 1 = 6866 columns.

Is the anndata.obsm['X_cnv'] representing something other than the CNV scores for the windows? Also, is there a way to annotate the columns in this matrix with the corresponding chromosome?

Version information

No response

Hi @sean-at-tessera,

thanks for your questions.

> The matrix added to my AnnData object by infercnvpy in anndata.obsm['X_cnv'] has an unexpected number of columns compared to the original input matrix (anndata.X); these values were 6671 and 6965 respectively.
>
> I assumed that the matrix would be reduced to the number of windows over which CNV values were calculated. I used windowsize=100 and step=1, so I expected to see 6965 - 100 + 1 = 6866 columns.

The running mean is computed using np.convolve with mode="same". This means that, for each chromosome, the number of columns in the CNV matrix equals the number of genes on that chromosome, independent of the window size. (At the beginning and end of each chromosome, the mean is computed over fewer genes than the window size.)
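
To illustrate the point about mode="same", here is a minimal sketch using a plain uniform kernel on toy data. It is not infercnvpy's actual implementation; the `running_mean` helper, the toy `genes` array, and the window of 4 are made up for illustration.

```python
import numpy as np


def running_mean(x, window, mode):
    """Convolve x with a uniform kernel of length `window` (hypothetical helper)."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode=mode)


# Toy "chromosome" with 10 genes and a window of 4.
genes = np.arange(10, dtype=float)

same = running_mean(genes, 4, "same")    # one output value per gene
valid = running_mean(genes, 4, "valid")  # only fully overlapping windows

print(len(same), len(valid))  # prints: 10 7
```

With mode="valid" the output shrinks to n - window + 1 values, which is the 6866-style count expected in the question. With mode="same" the output keeps one value per gene, as long as the chromosome has at least as many genes as the window size.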

The discrepancy you see (6671 vs. 6965 columns) arises because the X and Y chromosomes are excluded by default.
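
One way to check this on your own data is to count how many genes sit on the excluded chromosomes. The sketch below uses a toy table in place of adata.var and assumes that the chromosome annotation lives in a column named "chromosome" and that the excluded names are "chrX" and "chrY"; adjust both to match your data.

```python
import pandas as pd

# Toy stand-in for adata.var; the "chromosome" column name is an assumption.
var = pd.DataFrame(
    {"chromosome": ["chr1", "chr1", "chr2", "chrX", "chrY", "chr2"]},
    index=[f"gene{i}" for i in range(6)],
)

excluded = ("chrX", "chrY")  # assumed names of the excluded chromosomes
kept = ~var["chromosome"].isin(excluded)

n_total = len(var)
n_kept = int(kept.sum())
print(n_total, n_kept, n_total - n_kept)  # prints: 6 4 2
```

If the number of kept genes still does not match the number of columns in X_cnv, it may be worth checking for genes without any chromosome annotation.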

> Also, is there a way to annotate the columns in this matrix with the corresponding chromosome?

There's a chr_pos field in adata.uns['cnv'] that gives you the starting column index of each chromosome:

>>> adata.uns['cnv']['chr_pos']
{'chr1': 0,
 'chr2': 508,
 'chr3': 894,
 'chr4': 1192,
 'chr5': 1434,
 'chr6': 1711,
 'chr7': 1999,
 'chr8': 2275,
 'chr9': 2501,
 'chr10': 2715,
 'chr11': 2932,
 'chr12': 3248,
 'chr13': 3534,
 'chr14': 3661,
 'chr15': 3857,
 'chr16': 4054,
 'chr17': 4292,
 'chr18': 4575,
 'chr19': 4690,
 'chr20': 4970,
 'chr21': 5106,
 'chr22': 5185}
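
The start indices can be expanded into one chromosome label per column. The following is a rough sketch on a toy dictionary; the `chromosome_labels` helper is hypothetical and not part of infercnvpy, and it assumes each chromosome occupies a contiguous block of columns that runs up to the next start index.

```python
import numpy as np


def chromosome_labels(chr_pos, n_columns):
    """Return one chromosome name per column (hypothetical helper)."""
    # Order chromosomes by their start index rather than by dict order.
    items = sorted(chr_pos.items(), key=lambda kv: kv[1])
    names = [name for name, _ in items]
    starts = np.array([start for _, start in items])
    # Each block ends where the next one starts; the last ends at n_columns.
    ends = np.append(starts[1:], n_columns)
    return np.repeat(names, ends - starts)


# Toy example: 3 chromosomes spread over 7 columns.
chr_pos = {"chr1": 0, "chr2": 3, "chr3": 5}
labels = chromosome_labels(chr_pos, n_columns=7)
print(list(labels))
```

On real data, the call would presumably look like chromosome_labels(adata.uns['cnv']['chr_pos'], adata.obsm['X_cnv'].shape[1]).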

Hope that helps!

Thank you @grst for your clear and helpful answer!