Inconsistent use_rep usage through src files

Question

zvittorio opened this issue a year ago

Dear infercnvpy team,

First of all, thank you for the tool in general and for making inferCNV scale to big datasets.
I have noticed that when I call the chromosome_heatmap method with the default use_rep argument, infercnvpy runs the PCA again, which makes plotting quite time-consuming.
I tried to avoid this by passing 'X_cnv_pca' as use_rep, but I get this error:

cnv.pl.chromosome_heatmap(adata[adata.obs["Annot"] == "NE",:], 
                          groupby="Study", 
                          use_rep= 'X_cnv_pca', 
                          dendrogram=True)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_3634936/1577166331.py in <module>
----> 1 cnv.pl.chromosome_heatmap(adata[adata.obs["Author_annot_unified"] == "NE",:], 
      2                           groupby="Study",
      3                           use_rep= 'X_cnv_pca',
      4                           dendrogram=True)

/lib/python3.9/site-packages/infercnvpy/pl/_chromosome_heatmap.py in chromosome_heatmap(adata, groupby, use_rep, cmap, figsize, show, save, **kwargs)
     56     if groupby == "cnv_leiden" and "cnv_leiden" not in adata.obs.columns:
     57         raise ValueError("'cnv_leiden' is not in `adata.obs`. Did you run `tl.leiden()`?")
---> 58     tmp_adata = AnnData(X=adata.obsm[f"X_{use_rep}"], obs=adata.obs)
     59 
     60     # transfer colors from adata if present

lib/python3.9/site-packages/anndata/_core/aligned_mapping.py in __getitem__(self, key)
    111     def __getitem__(self, key: str) -> V:
    112         return as_view(
--> 113             _subset(self.parent_mapping[key], self.subset_idx),
    114             ElementRef(self.parent, self.attrname, (key,)),
    115         )

/lib/python3.9/site-packages/anndata/_core/aligned_mapping.py in __getitem__(self, key)
    146 
    147     def __getitem__(self, key: str) -> V:
--> 148         return self._data[key]
    149 
    150     def __setitem__(self, key: str, value: V):

KeyError: 'X_X_cnv_pca'

This suggests that I should have passed just cnv_pca, because the function looks up X_{use_rep}. So I also tried:

cnv.pl.chromosome_heatmap(adata[adata.obs["Author_annot_unified"] == "NE",:], 
                          groupby="Study", 
                          use_rep= 'cnv_pca', 
                          dendrogram=True)

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_3634936/3064392404.py in <module>
----> 1 cnv.pl.chromosome_heatmap(adata[adata.obs["Author_annot_unified"] == "NE",:], 
      2                           groupby="Study",
      3                           use_rep= 'cnv_pca',
      4                           dendrogram=True)

lib/python3.9/site-packages/infercnvpy/pl/_chromosome_heatmap.py in chromosome_heatmap(adata, groupby, use_rep, cmap, figsize, show, save, **kwargs)
     63 
     64     # re-sort, as saving & loading anndata destroys the order
---> 65     chr_pos_dict = dict(sorted(adata.uns[use_rep]["chr_pos"].items(), key=lambda x: x[1]))
     66     chr_pos = list(chr_pos_dict.values())
     67 

lib/python3.9/site-packages/anndata/compat/_overloaded_dict.py in __getitem__(self, key)
     98             return self.overloaded[key].get()
     99         else:
--> 100             return self.data[key]
    101 
    102     def __setitem__(self, key, value):

KeyError: 'cnv_pca'

This raises a slightly different error. In _chromosome_heatmap.py, use_rep is used as-is to index adata.uns (line 65), whereas adata.obsm is indexed with X_{use_rep} (line 58).
I am not working on my local machine, so I cannot change that line in the source file myself.
I would be grateful if you could apply this change or provide any clarification!
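
To make the two lookups concrete, here is a small sketch that rebuilds the key names chromosome_heatmap derives from use_rep, as read from the tracebacks: the matrix comes from obsm["X_" + use_rep] and the chromosome positions from uns[use_rep]. Plain dicts stand in for adata.obsm and adata.uns, the helper missing_keys is hypothetical, and the keys X_cnv, cnv and X_cnv_pca are assumed to be what cnv.tl.infercnv and cnv.tl.pca store with default settings.

```python
def missing_keys(obsm: dict, uns: dict, use_rep: str) -> list:
    """List the entries chromosome_heatmap would need for use_rep that are absent.

    Mirrors the two lookups in the tracebacks: the matrix is read from
    obsm["X_" + use_rep] and the chromosome positions from uns[use_rep].
    """
    missing = []
    obsm_key = f"X_{use_rep}"
    if obsm_key not in obsm:
        missing.append(f"obsm[{obsm_key!r}]")
    if use_rep not in uns:
        missing.append(f"uns[{use_rep!r}]")
    return missing


# Hypothetical stand-ins for the keys assumed to exist after
# cnv.tl.infercnv(adata) and cnv.tl.pca(adata)
obsm = {"X_cnv": None, "X_cnv_pca": None}
uns = {"cnv": {"chr_pos": {}}}

print(missing_keys(obsm, uns, "cnv"))        # → []
print(missing_keys(obsm, uns, "X_cnv_pca"))  # → ["obsm['X_X_cnv_pca']", "uns['X_cnv_pca']"]
print(missing_keys(obsm, uns, "cnv_pca"))    # → ["uns['cnv_pca']"]
```

Under these assumptions neither spelling works for the PCA: only the full CNV matrix has chromosome positions in uns, so use_rep can only point at a representation that has both entries.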

Vittorio

Version information

-----
anndata             0.7.8
infercnvpy          0.4.0
matplotlib          3.4.3
numpy               1.21.3
pandas              1.3.4
scanpy              1.8.2
scipy               1.7.1
session_info        1.0.0
-----
PIL                 8.3.2
backcall            0.2.0
beta_ufunc          NA
binom_ufunc         NA
bottleneck          1.3.2
cairo               1.20.1
cffi                1.14.6
colorama            0.4.4
cycler              0.10.0
cython_runtime      NA
dateutil            2.8.2
decorator           5.0.9
defusedxml          0.7.1
gtfparse            1.2.1
h5py                3.6.0
igraph              0.9.8
ipykernel           6.0.3
ipython_genutils    0.2.0
ipywidgets          7.6.3
jedi                0.18.0
joblib              1.0.1
kiwisolver          1.3.2
leidenalg           0.8.8
llvmlite            0.37.0
matplotlib_inline   NA
mpl_toolkits        NA
natsort             8.0.2
nbinom_ufunc        NA
netifaces           0.11.0
numba               0.54.1
numexpr             2.7.3
packaging           20.9
parso               0.8.2
pexpect             4.8.0
pickleshare         0.7.5
pkg_resources       NA
prompt_toolkit      3.0.19
psutil              5.8.0
ptyprocess          0.7.0
pycparser           2.20
pyexpat             NA
pygments            2.9.0
pyparsing           2.4.7
pyreadr             0.4.7
pytoml              NA
pytz                2021.1
setuptools_scm      NA
sinfo               0.3.4
sitecustomize       NA
six                 1.16.0
sklearn             1.0.1
sphinxcontrib       NA
storemagic          NA
tables              3.6.1
texttable           1.6.4
threadpoolctl       2.2.0
tornado             6.1
tqdm                4.64.1
traitlets           5.0.5
typing_extensions   NA
wcwidth             0.2.5
zmq                 22.2.1

Gregor Sturm · Answer 1 · Wed Mar 29 2023 18:27:43 GMT+0800 (China Standard Time)

Hi @zvittorio,

in this case, use_rep refers to the entire matrix used for plotting (by default cnv, i.e. adata.obsm["X_cnv"]), not just its PCA.
The PCA (of the matrix specified with use_rep) is only used for the clustering when dendrogram=True.
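
For intuition about why only a low-dimensional representation is needed for the clustering step, here is a rough sketch of group-level hierarchical clustering in the spirit of sc.tl.dendrogram with its default settings: average the representation per group, correlate the group means (Pearson), and run complete linkage on 1 - r. This is an approximation written with numpy, pandas and scipy, not scanpy's or infercnvpy's actual code; the function name group_dendrogram and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.cluster import hierarchy
from scipy.spatial import distance


def group_dendrogram(rep: np.ndarray, groups) -> list:
    """Order groups (not cells) by clustering their mean representation.

    rep: cells x components (e.g. a PCA embedding); groups: one label per cell.
    """
    # Average the representation within each group
    mean_df = pd.DataFrame(rep, index=list(groups)).groupby(level=0).mean()
    # Pearson correlation between group means (rows are groups)
    corr = np.corrcoef(mean_df.values)
    # Turn correlation into a distance; clip tiny negative rounding errors
    dist = np.clip(1 - corr, 0, None)
    condensed = distance.squareform(dist, checks=False)
    linkage = hierarchy.linkage(condensed, method="complete")
    dendro = hierarchy.dendrogram(linkage, labels=list(mean_df.index), no_plot=True)
    return dendro["ivl"]  # leaf order, i.e. the order of groups in the heatmap


# Synthetic example: groups A and B have similar profiles, C is inverted
rng = np.random.default_rng(0)
centers = {"A": [5, 4, 3, 2, 1], "B": [5, 4, 3, 2, 1.5], "C": [1, 2, 3, 4, 5]}
rows, labels = [], []
for name, center in centers.items():
    for _ in range(10):
        rows.append(np.asarray(center, dtype=float) + rng.normal(0, 0.01, 5))
        labels.append(name)
order = group_dendrogram(np.vstack(rows), labels)
print(order)
```

Only the per-group means enter the linkage, so the cost of the clustering is dominated by obtaining the representation, which is why reusing a precomputed PCA or dendrogram saves time.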

Since the fix in #72 (now released as v0.4.2), it is possible to re-use a dendrogram that was previously calculated with scanpy:

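import scanpy as sc
import infercnvpy as cnv

cnv.tl.pca(adata)
# sc.tl.dendrogram requires groupby; use the same one as in the plot
sc.tl.dendrogram(adata, groupby="Study", use_rep="X_cnv_pca")
# now, the dendrogram should not be recomputed.
cnv.pl.chromosome_heatmap(adata, groupby="Study", dendrogram=True)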
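
Why precomputing helps can be sketched as a look-up-before-compute cache. This assumes, as scanpy does by default, that the dendrogram is stored in adata.uns under dendrogram_<groupby> and that plotting only recomputes it (with default parameters, and with a warning) when that key is missing. The helper below is hypothetical and only models that pattern, with a plain dict standing in for adata.uns; it also shows why the groupby passed to sc.tl.dendrogram has to match the one used for plotting.

```python
from typing import Any, Callable, Dict


def get_or_compute_dendrogram(
    uns: Dict[str, Any], groupby: str, compute: Callable[[], Any]
) -> Any:
    """Return the cached dendrogram for groupby, computing it only if absent."""
    key = f"dendrogram_{groupby}"  # scanpy's default key for sc.tl.dendrogram
    if key not in uns:
        uns[key] = compute()  # expensive path: representation + clustering
    return uns[key]


calls = []


def expensive_dendrogram() -> dict:
    # Hypothetical stand-in for the PCA + hierarchical clustering work
    calls.append(1)
    return {"categories_ordered": ["A", "B", "C"]}


uns: Dict[str, Any] = {}
get_or_compute_dendrogram(uns, "Study", expensive_dendrogram)  # computes
get_or_compute_dendrogram(uns, "Study", expensive_dendrogram)  # cache hit
get_or_compute_dendrogram(uns, "cnv_leiden", expensive_dendrogram)  # other key, computes again
print(len(calls))  # → 2
```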

zvittorio · Answer 2 · Wed Mar 29 2023 19:40:54 GMT+0800 (China Standard Time)

Dear @grst,

Thank you for the clarification! I had misunderstood the warning messages that tl.infercnv returns when there is no dendrogram in the adata object.
It makes sense now.

Vittorio