Inconsistent use_rep usage through src files

zvittorio opened this issue


Dear infercnvpy team,

First of all, thanks for making inferCNV scalable to big datasets and for the tool in general.
I have noticed that when chromosome_heatmap is called with the default use_rep argument, infercnvpy runs the PCA again, which makes plotting quite time-consuming.
I tried to avoid this by passing use_rep='X_cnv_pca', but I get this error:

cnv.pl.chromosome_heatmap(adata[adata.obs["Author_annot_unified"] == "NE",:], 
                          groupby="Study",
                          use_rep= 'X_cnv_pca',
                          dendrogram=True)

KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_3634936/1577166331.py in <module>
----> 1 cnv.pl.chromosome_heatmap(adata[adata.obs["Author_annot_unified"] == "NE",:], 
      2                           groupby="Study",
      3                           use_rep= 'X_cnv_pca',
      4                           dendrogram=True)

/lib/python3.9/site-packages/infercnvpy/pl/_chromosome_heatmap.py in chromosome_heatmap(adata, groupby, use_rep, cmap, figsize, show, save, **kwargs)
     56     if groupby == "cnv_leiden" and "cnv_leiden" not in adata.obs.columns:
     57         raise ValueError("'cnv_leiden' is not in `adata.obs`. Did you run `tl.leiden()`?")
---> 58     tmp_adata = AnnData(X=adata.obsm[f"X_{use_rep}"], obs=adata.obs)
     60     # transfer colors from adata if present

lib/python3.9/site-packages/anndata/_core/aligned_mapping.py in __getitem__(self, key)
    111     def __getitem__(self, key: str) -> V:
    112         return as_view(
--> 113             _subset(self.parent_mapping[key], self.subset_idx),
    114             ElementRef(self.parent, self.attrname, (key,)),
    115         )

/lib/python3.9/site-packages/anndata/_core/aligned_mapping.py in __getitem__(self, key)
    147     def __getitem__(self, key: str) -> V:
--> 148         return self._data[key]
    150     def __setitem__(self, key: str, value: V):

KeyError: 'X_X_cnv_pca'

This suggests that I should have passed just cnv_pca, since the function looks up X_{use_rep}. So I also tried:

cnv.pl.chromosome_heatmap(adata[adata.obs["Author_annot_unified"] == "NE",:], 
                          groupby="Study",
                          use_rep= 'cnv_pca',
                          dendrogram=True)

KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_3634936/3064392404.py in <module>
----> 1 cnv.pl.chromosome_heatmap(adata[adata.obs["Author_annot_unified"] == "NE",:], 
      2                           groupby="Study",
      3                           use_rep= 'cnv_pca',
      4                           dendrogram=True)

lib/python3.9/site-packages/infercnvpy/pl/_chromosome_heatmap.py in chromosome_heatmap(adata, groupby, use_rep, cmap, figsize, show, save, **kwargs)
     64     # re-sort, as saving & loading anndata destroys the order
---> 65     chr_pos_dict = dict(sorted(adata.uns[use_rep]["chr_pos"].items(), key=lambda x: x[1]))
     66     chr_pos = list(chr_pos_dict.values())

lib/python3.9/site-packages/anndata/compat/_overloaded_dict.py in __getitem__(self, key)
     98             return self.overloaded[key].get()
     99         else:
--> 100             return self.data[key]
    102     def __setitem__(self, key, value):

KeyError: 'cnv_pca' 

This fails with a slightly different error. Apparently, further down in _chromosome_heatmap.py, the use_rep argument is used as is (adata.uns[use_rep]) instead of as X_{use_rep}.
I am not working on my local machine, so I cannot change the line in the source file myself.
I would be grateful if you could apply this change or provide any clarification!


Version information

anndata             0.7.8
infercnvpy          0.4.0
matplotlib          3.4.3
numpy               1.21.3
pandas              1.3.4
scanpy              1.8.2
scipy               1.7.1
session_info        1.0.0
PIL                 8.3.2
backcall            0.2.0
beta_ufunc          NA
binom_ufunc         NA
bottleneck          1.3.2
cairo               1.20.1
cffi                1.14.6
colorama            0.4.4
cycler              0.10.0
cython_runtime      NA
dateutil            2.8.2
decorator           5.0.9
defusedxml          0.7.1
gtfparse            1.2.1
h5py                3.6.0
igraph              0.9.8
ipykernel           6.0.3
ipython_genutils    0.2.0
ipywidgets          7.6.3
jedi                0.18.0
joblib              1.0.1
kiwisolver          1.3.2
leidenalg           0.8.8
llvmlite            0.37.0
matplotlib_inline   NA
mpl_toolkits        NA
natsort             8.0.2
nbinom_ufunc        NA
netifaces           0.11.0
numba               0.54.1
numexpr             2.7.3
packaging           20.9
parso               0.8.2
pexpect             4.8.0
pickleshare         0.7.5
pkg_resources       NA
prompt_toolkit      3.0.19
psutil              5.8.0
ptyprocess          0.7.0
pycparser           2.20
pyexpat             NA
pygments            2.9.0
pyparsing           2.4.7
pyreadr             0.4.7
pytoml              NA
pytz                2021.1
setuptools_scm      NA
sinfo               0.3.4
sitecustomize       NA
six                 1.16.0
sklearn             1.0.1
sphinxcontrib       NA
storemagic          NA
tables              3.6.1
texttable           1.6.4
threadpoolctl       2.2.0
tornado             6.1
tqdm                4.64.1
traitlets           5.0.5
typing_extensions   NA
wcwidth             0.2.5
zmq                 22.2.1

Hi @zvittorio,

in this case use_rep refers to the entire matrix used for plotting, not just the PCA of it.
The PCA (of the matrix specified with use_rep) is only used for the clustering when dendrogram=True.
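
To illustrate why both attempts above fail, here is a minimal sketch of the two lookups visible in the tracebacks. Plain dictionaries stand in for adata.obsm and adata.uns, and the helper names lookup and try_lookup are hypothetical and not part of infercnvpy. It assumes that tl.infercnv stored its result under the default key "cnv" and that the PCA is stored as "X_cnv_pca".

```python
# Illustrative only: plain dicts stand in for adata.obsm and adata.uns.
obsm = {"X_cnv": "cnv matrix", "X_cnv_pca": "PCA of the cnv matrix"}
uns = {"cnv": {"chr_pos": {"chr1": 0, "chr2": 120}}}


def lookup(use_rep):
    """Hypothetical helper mimicking the two lookups shown in the tracebacks."""
    matrix = obsm[f"X_{use_rep}"]      # corresponds to line 58 in the traceback
    chr_pos = uns[use_rep]["chr_pos"]  # corresponds to line 65 in the traceback
    return matrix, chr_pos


def try_lookup(use_rep):
    """Return "ok" or the KeyError message for a given use_rep value."""
    try:
        lookup(use_rep)
        return "ok"
    except KeyError as err:
        return f"KeyError: {err}"


results = {key: try_lookup(key) for key in ("cnv", "cnv_pca", "X_cnv_pca")}
for key, outcome in results.items():
    print(f"use_rep={key!r}: {outcome}")
```

Only use_rep="cnv" satisfies both lookups in this sketch: "X_cnv_pca" fails on the obsm key, and "cnv_pca" passes the obsm lookup but has no matching entry with chromosome positions in uns.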

Since the fix in #72 (now released as v0.4.2), it is possible to re-use a dendrogram that was previously calculated with scanpy:

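import scanpy as sc
import infercnvpy as cnv
# groupby is required and must match the heatmap's groupby (default: "cnv_leiden")
sc.tl.dendrogram(adata, groupby="cnv_leiden", use_rep="X_cnv_pca")
# now, the dendrogram should not be recomputed.
cnv.pl.chromosome_heatmap(adata, dendrogram=True)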

Dear @grst ,

Thank you for the clarification! I misunderstood the warning messages that tl.infercnv returns when there is no dendrogram in the adata object.
It makes sense now.
