Inconsistent use_rep usage across source files
zvittorio opened this issue
Dear infercnvpy team,
First of all, thanks for making inferCNV scalable to big datasets, and for the tool in general.
I have noticed that when calling chromosome_heatmap with the default use_rep argument, infercnvpy runs the PCA again, which makes plotting quite time-consuming.
I tried to avoid this by passing 'X_cnv_pca' as use_rep, but I get this error:
cnv.pl.chromosome_heatmap(adata[adata.obs["Annot"] == "NE",:],
groupby="Study",
use_rep= 'X_cnv_pca',
dendrogram=True)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/tmp/ipykernel_3634936/1577166331.py in <module>
----> 1 cnv.pl.chromosome_heatmap(adata[adata.obs["Author_annot_unified"] == "NE",:],
2 groupby="Study",
3 use_rep= 'X_cnv_pca',
4 dendrogram=True)
/lib/python3.9/site-packages/infercnvpy/pl/_chromosome_heatmap.py in chromosome_heatmap(adata, groupby, use_rep, cmap, figsize, show, save, **kwargs)
56 if groupby == "cnv_leiden" and "cnv_leiden" not in adata.obs.columns:
57 raise ValueError("'cnv_leiden' is not in `adata.obs`. Did you run `tl.leiden()`?")
---> 58 tmp_adata = AnnData(X=adata.obsm[f"X_{use_rep}"], obs=adata.obs)
59
60 # transfer colors from adata if present
lib/python3.9/site-packages/anndata/_core/aligned_mapping.py in __getitem__(self, key)
111 def __getitem__(self, key: str) -> V:
112 return as_view(
--> 113 _subset(self.parent_mapping[key], self.subset_idx),
114 ElementRef(self.parent, self.attrname, (key,)),
115 )
/lib/python3.9/site-packages/anndata/_core/aligned_mapping.py in __getitem__(self, key)
146
147 def __getitem__(self, key: str) -> V:
--> 148 return self._data[key]
149
150 def __setitem__(self, key: str, value: V):
KeyError: 'X_X_cnv_pca'
This suggests that I should pass just cnv_pca, since the function prepends X_ to the key (X_{use_rep}). So I also tried:
cnv.pl.chromosome_heatmap(adata[adata.obs["Author_annot_unified"] == "NE",:],
groupby="Study",
use_rep= 'cnv_pca',
dendrogram=True)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/tmp/ipykernel_3634936/3064392404.py in <module>
----> 1 cnv.pl.chromosome_heatmap(adata[adata.obs["Author_annot_unified"] == "NE",:],
2 groupby="Study",
3 use_rep= 'cnv_pca',
4 dendrogram=True)
lib/python3.9/site-packages/infercnvpy/pl/_chromosome_heatmap.py in chromosome_heatmap(adata, groupby, use_rep, cmap, figsize, show, save, **kwargs)
63
64 # re-sort, as saving & loading anndata destroys the order
---> 65 chr_pos_dict = dict(sorted(adata.uns[use_rep]["chr_pos"].items(), key=lambda x: x[1]))
66 chr_pos = list(chr_pos_dict.values())
67
lib/python3.9/site-packages/anndata/compat/_overloaded_dict.py in __getitem__(self, key)
98 return self.overloaded[key].get()
99 else:
--> 100 return self.data[key]
101
102 def __setitem__(self, key, value):
KeyError: 'cnv_pca'
This raises a slightly different error: further down in _chromosome_heatmap.py, use_rep is used as-is (adata.uns[use_rep]) instead of as X_{use_rep}.
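Going by the two tracebacks (line 58 reads adata.obsm[f"X_{use_rep}"], line 65 reads adata.uns[use_rep]["chr_pos"]), a value of use_rep only works if it satisfies both lookups at once. A minimal sketch of that constraint, using plain dicts as stand-ins for adata.obsm and adata.uns; the helper valid_use_reps and the example keys are hypothetical, assuming a typical object after inferCNV plus a CNV PCA:

```python
def valid_use_reps(obsm, uns):
    """Return the use_rep values that pass both lookups seen in the tracebacks."""
    valid = []
    for key in obsm:
        if not key.startswith("X_"):
            continue
        rep = key[len("X_"):]  # undo the f"X_{use_rep}" prefix used for obsm
        # the second lookup reads uns[use_rep]["chr_pos"]
        if isinstance(uns.get(rep), dict) and "chr_pos" in uns[rep]:
            valid.append(rep)
    return valid


# Hypothetical stand-ins for adata.obsm and adata.uns (values elided as None)
obsm = {"X_cnv": None, "X_cnv_pca": None, "X_umap": None}
uns = {"cnv": {"chr_pos": {"chr1": 0, "chr2": 120}}}
print(valid_use_reps(obsm, uns))  # ['cnv']
```

Under these assumptions neither 'X_cnv_pca' nor 'cnv_pca' can pass: the first fails the obsm lookup, the second fails the uns lookup.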
I am not working on my local machine, so I cannot change the line in the source file myself.
I would be grateful if you could apply this change or provide any clarification!
Vittorio
Version information
-----
anndata 0.7.8
infercnvpy 0.4.0
matplotlib 3.4.3
numpy 1.21.3
pandas 1.3.4
scanpy 1.8.2
scipy 1.7.1
session_info 1.0.0
-----
PIL 8.3.2
backcall 0.2.0
beta_ufunc NA
binom_ufunc NA
bottleneck 1.3.2
cairo 1.20.1
cffi 1.14.6
colorama 0.4.4
cycler 0.10.0
cython_runtime NA
dateutil 2.8.2
decorator 5.0.9
defusedxml 0.7.1
gtfparse 1.2.1
h5py 3.6.0
igraph 0.9.8
ipykernel 6.0.3
ipython_genutils 0.2.0
ipywidgets 7.6.3
jedi 0.18.0
joblib 1.0.1
kiwisolver 1.3.2
leidenalg 0.8.8
llvmlite 0.37.0
matplotlib_inline NA
mpl_toolkits NA
natsort 8.0.2
nbinom_ufunc NA
netifaces 0.11.0
numba 0.54.1
numexpr 2.7.3
packaging 20.9
parso 0.8.2
pexpect 4.8.0
pickleshare 0.7.5
pkg_resources NA
prompt_toolkit 3.0.19
psutil 5.8.0
ptyprocess 0.7.0
pycparser 2.20
pyexpat NA
pygments 2.9.0
pyparsing 2.4.7
pyreadr 0.4.7
pytoml NA
pytz 2021.1
setuptools_scm NA
sinfo 0.3.4
sitecustomize NA
six 1.16.0
sklearn 1.0.1
sphinxcontrib NA
storemagic NA
tables 3.6.1
texttable 1.6.4
threadpoolctl 2.2.0
tornado 6.1
tqdm 4.64.1
traitlets 5.0.5
typing_extensions NA
wcwidth 0.2.5
zmq 22.2.1
Hi @zvittorio,
in this case, use_rep refers to the entire matrix used for plotting, not just its PCA.
The PCA (of the matrix specified with use_rep) is only used for clustering when dendrogram=True.
Since the fix in #72 (now released as v0.4.2), it is possible to re-use a dendrogram that was previously calculated with scanpy:
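import infercnvpy as cnv
import scanpy as sc

cnv.tl.pca(adata)
# groupby is required and must match the heatmap's groupby ("cnv_leiden" needs cnv.tl.leiden())
sc.tl.dendrogram(adata, groupby="cnv_leiden", use_rep="X_cnv_pca")
# now, the dendrogram should not be recomputed.
cnv.pl.chromosome_heatmap(adata, groupby="cnv_leiden", dendrogram=True)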
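The reuse only helps if the precomputed dendrogram is stored under the key the plot looks for, which is why groupby has to match. A minimal sketch of that cache-or-compute pattern, assuming the key follows scanpy's default dendrogram_{groupby} naming; get_dendrogram and expensive_dendrogram are hypothetical and not infercnvpy's actual code:

```python
def get_dendrogram(uns, groupby, compute):
    """Return a cached dendrogram if present, otherwise compute and store it."""
    key = f"dendrogram_{groupby}"  # assumed: scanpy's default key naming
    if key not in uns:
        uns[key] = compute()
    return uns[key]


calls = []


def expensive_dendrogram():
    calls.append(1)  # record each (re)computation
    return {"result": "hypothetical dendrogram"}


uns = {}
get_dendrogram(uns, "cnv_leiden", expensive_dendrogram)
get_dendrogram(uns, "cnv_leiden", expensive_dendrogram)
print(len(calls))  # 1
```

A dendrogram computed with a different groupby (for example "Study") lands under another key and would not be found when plotting by "cnv_leiden".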