gao-lab / Cell_BLAST

A BLAST-like toolkit for large-scale scRNA-seq data querying and annotation.

Home Page: http://cblast.gao-lab.org

Error in "wasserstein_distance" function

MartaBenegas opened this issue

Hi CellBLAST team!
I'm new to using your Python package, and I've run into some problems. I downloaded the "Chen" reference panel from your website to run a first test, and I performed the following steps in the Python interpreter:

>>> import numpy as np
>>> import pandas as pd
>>> import tensorflow as tf
>>> import Cell_BLAST as cb
>>> reference = cb.data.ExprDataSet.read_dataset("/home/biobam/Downloads/Chen.h5")
>>> models = []
>>> for i in range(4):
...     models.append(cb.directi.fit_DIRECTi(reference, random_seed=i))
...
>>> blastdb = cb.blast.BLAST(models, reference)

And the last step raises this error, which I don't know how to solve:

>>> blastdb = cb.blast.BLAST(models, reference)
[INFO] Cell BLAST: Projecting to latent space...
[INFO] Cell BLAST: Fitting nearest neighbor trees...
[INFO] Cell BLAST: Sampling from posteriors...
[INFO] Cell BLAST: Generating empirical null distributions...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/Cell_BLAST/blast.py", line 473, in __init__
  File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/Cell_BLAST/blast.py", line 615, in _force_components
  File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/Cell_BLAST/blast.py", line 602, in _get_empirical
  File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/joblib/parallel.py", line 1041, in __call__
  File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
  File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/joblib/parallel.py", line 777, in _dispatch
  File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
  File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 572, in __init__
  File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/joblib/parallel.py", line 263, in __call__
  File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/joblib/parallel.py", line 263, in <listcomp>
  File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/numba/core/dispatcher.py", line 414, in _compile_for_args
  File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/numba/core/dispatcher.py", line 357, in error_rewrite
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<function wasserstein_distance at 0x7f91740bbd90>) found for signature:
 
 >>> wasserstein_distance(array(float32, 1d, C), array(float32, 1d, C))
 
There are 2 candidate implementations:
   - Of which 2 did not match due to:
   Overload in function '_wasserstein_distance': File: Cell_BLAST/blast.py: Line 0.
     With argument(s): '(array(float32, 1d, C), array(float32, 1d, C))':
    Rejected as the implementation raised a specific error:
      RuntimeError: cannot cache function '_wasserstein_distance_impl': no locator available for file '/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/Cell_BLAST/blast.py'
  raised from /home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/numba/core/caching.py:352

During: resolving callee type: Function(<function wasserstein_distance at 0x7f91740bbd90>)
During: typing of call at /home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/Cell_BLAST/blast.py (209)

File "anaconda3/envs/cb/lib/python3.6/site-packages/Cell_BLAST/blast.py", line 209:
<source missing, REPL/exec in use?>
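The key line in this traceback is "cannot cache function '_wasserstein_distance_impl': no locator available". According to the numba documentation, a function compiled with cache=True stores its cache in the __pycache__ directory next to its source file. If that location is not writable, numba falls back to a user-wide cache directory. If NUMBA_CACHE_DIR is set, numba uses that directory instead.

The helper below is a minimal, hypothetical diagnostic sketch. It assumes the failure is a filesystem problem: either the source file cannot be found (compare "<source missing>" above) or the in-tree cache location is not writable. It only inspects the filesystem and does not call numba.

```python
import os


def check_numba_cache_location(source_file):
    """Hypothetical helper: report filesystem facts relevant to numba caching.

    It does not import numba; it only checks whether the source file exists,
    whether numba could write next to it, and whether NUMBA_CACHE_DIR is set.
    """
    source_dir = os.path.dirname(os.path.abspath(source_file))
    pycache_dir = os.path.join(source_dir, "__pycache__")
    # If __pycache__ already exists it must be writable; otherwise the
    # source directory must be writable so that __pycache__ can be created.
    target = pycache_dir if os.path.isdir(pycache_dir) else source_dir
    return {
        "source_exists": os.path.isfile(source_file),
        "in_tree_writable": os.access(target, os.W_OK),
        "numba_cache_dir": os.environ.get("NUMBA_CACHE_DIR"),
    }
```

Usage, for example: `import Cell_BLAST.blast` and then `check_numba_cache_location(Cell_BLAST.blast.__file__)`.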

I'm running the Python interpreter in the conda environment created for Cell BLAST, following the instructions in the installation guide:

(cb) biobam@biobam-500-526ns:~$ python3
Python 3.6.12 |Anaconda, Inc.| (default, Sep  8 2020, 23:10:56) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 

Thank you in advance!

Thanks for the report! Could you please provide the specific versions of numba, scipy and numba in your conda environment? They can be found via conda list | egrep '(numba|scipy|numba)'.
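A Python-side equivalent of that command is sketched below. `package_versions` is a hypothetical helper, not part of Cell_BLAST. It reports the `__version__` of the module that actually gets imported. That can differ from what `conda list` shows if another installation shadows the environment's copy.

```python
import importlib


def package_versions(names):
    """Hypothetical helper: map each package name to its imported version.

    Returns None for packages that cannot be imported and "unknown" for
    modules that do not define __version__.
    """
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # not installed in this environment
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in package_versions(["numpy", "scipy", "numba"]).items():
        print(name, version)
```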

Hi, you wrote numba twice; maybe you meant pandas or tensorflow? Anyway, here are the versions of those two packages as well, just in case:

(base) biobam@biobam-500-526ns:~$ conda activate cb
(cb) biobam@biobam-500-526ns:~$ conda list | egrep '(numba|scipy|tensorflow|pandas)'
numba                     0.52.0                   pypi_0    pypi
pandas                    1.1.5                    pypi_0    pypi
scipy                     1.5.4                    pypi_0    pypi
tensorflow                1.8.0                    pypi_0    pypi

Thanks for the clarification, and sorry for the typo... I meant numpy, but the most probable culprits are numba and scipy.

I have tried creating a new environment with the same versions of numba, scipy, pandas and tensorflow, and ran the same lines of code on Chen.h5 data, but I cannot reproduce the error.

Could you please provide the full conda list output so I may check for other differences?

Of course! Here it is:

biobam@biobam-500-526ns:~$ conda activate cb
(cb) biobam@biobam-500-526ns:~$ conda list
# packages in environment at /home/biobam/anaconda3/envs/cb:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
absl-py                   0.11.0                   pypi_0    pypi
anndata                   0.7.5                    pypi_0    pypi
astor                     0.8.1                    pypi_0    pypi
bleach                    1.5.0                    pypi_0    pypi
ca-certificates           2020.12.8            h06a4308_0  
cached-property           1.5.2                    pypi_0    pypi
cell-blast                0.3.8                    pypi_0    pypi
certifi                   2020.12.5        py36h06a4308_0  
chardet                   3.0.4                    pypi_0    pypi
click                     7.1.2                    pypi_0    pypi
cycler                    0.10.0                   pypi_0    pypi
decorator                 4.4.2                    pypi_0    pypi
fastobo                   0.9.3                    pypi_0    pypi
gast                      0.4.0                    pypi_0    pypi
grpcio                    1.34.1                   pypi_0    pypi
h5py                      3.1.0                    pypi_0    pypi
html5lib                  0.9999999                pypi_0    pypi
importlib-metadata        3.4.0                    pypi_0    pypi
joblib                    1.0.0                    pypi_0    pypi
kiwisolver                1.3.1                    pypi_0    pypi
ld_impl_linux-64          2.33.1               h53a641e_7  
libedit                   3.1.20191231         h14c3975_1  
libffi                    3.3                  he6710b0_2  
libgcc-ng                 9.1.0                hdf63c60_0  
libstdcxx-ng              9.1.0                hdf63c60_0  
llvmlite                  0.35.0                   pypi_0    pypi
loompy                    3.0.6                    pypi_0    pypi
markdown                  3.3.3                    pypi_0    pypi
matplotlib                3.3.3                    pypi_0    pypi
natsort                   7.1.0                    pypi_0    pypi
ncurses                   6.2                  he6710b0_1  
networkx                  2.5                      pypi_0    pypi
numba                     0.52.0                   pypi_0    pypi
numpy                     1.19.5                   pypi_0    pypi
numpy-groupies            0.9.13                   pypi_0    pypi
openssl                   1.1.1i               h27cfd23_0  
packaging                 20.8                     pypi_0    pypi
pandas                    1.1.5                    pypi_0    pypi
patsy                     0.5.1                    pypi_0    pypi
pillow                    8.1.0                    pypi_0    pypi
pip                       20.3.3           py36h06a4308_0  
plotly                    4.14.3                   pypi_0    pypi
pronto                    2.3.2                    pypi_0    pypi
protobuf                  3.14.0                   pypi_0    pypi
pynndescent               0.5.1                    pypi_0    pypi
pyparsing                 2.4.7                    pypi_0    pypi
python                    3.6.12               hcff3b4d_2  
python-dateutil           2.8.1                    pypi_0    pypi
python-igraph             0.8.3                    pypi_0    pypi
pytz                      2020.5                   pypi_0    pypi
readline                  8.0                  h7b6447c_0  
retrying                  1.3.3                    pypi_0    pypi
scikit-learn              0.24.0                   pypi_0    pypi
scipy                     1.5.4                    pypi_0    pypi
seaborn                   0.11.1                   pypi_0    pypi
setuptools                51.1.2           py36h06a4308_4  
six                       1.15.0                   pypi_0    pypi
sqlite                    3.33.0               h62c20be_0  
statsmodels               0.12.1                   pypi_0    pypi
tensorboard               1.8.0                    pypi_0    pypi
tensorflow                1.8.0                    pypi_0    pypi
termcolor                 1.1.0                    pypi_0    pypi
texttable                 1.6.3                    pypi_0    pypi
threadpoolctl             2.1.0                    pypi_0    pypi
tk                        8.6.10               hbc83047_0  
tqdm                      4.56.0                   pypi_0    pypi
typing-extensions         3.7.4.3                  pypi_0    pypi
umap-learn                0.5.0                    pypi_0    pypi
werkzeug                  1.0.1                    pypi_0    pypi
wheel                     0.36.2             pyhd3eb1b0_0  
xz                        5.2.5                h7b6447c_0  
zipp                      3.4.0                    pypi_0    pypi
zlib                      1.2.11               h7b6447c_3 

Thanks! I built an identical environment but still could not reproduce the error... Maybe it's because of some external system libraries that the pip-installed packages depend on? In that case, you could try installing all dependencies via conda and using pip only to install Cell BLAST.

Here's a yaml file for environment configuration: debug.yml.gz.

You can build the environment (assuming environment name is "debug") via:

gunzip debug.yml.gz
conda env create -n debug -f debug.yml
conda activate debug
pip install cell-blast

I have verified that this configuration works at least on the machine I'm using. Hope that helps.

Btw, for quicker testing, you can use cb.directi.fit_DIRECTi(reference, random_seed=i, epoch=1).
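The model-fitting loop with that tip applied can be wrapped as below. `fit_quick_models` is a hypothetical helper. The only assumption about `fit_fn` is that it accepts the `random_seed` and `epoch` keyword arguments used in the call above, as `cb.directi.fit_DIRECTi` does. A single epoch is only meant for smoke-testing the BLAST step, not for real analyses.

```python
def fit_quick_models(fit_fn, reference, n_models=4, epoch=1):
    """Hypothetical helper: fit n_models models with seeds 0..n_models-1.

    fit_fn is expected to behave like cb.directi.fit_DIRECTi; epoch=1 keeps
    training short so that errors in later steps surface quickly.
    """
    return [fit_fn(reference, random_seed=i, epoch=epoch) for i in range(n_models)]


# Usage (requires Cell_BLAST and a loaded reference dataset):
#   models = fit_quick_models(cb.directi.fit_DIRECTi, reference)
#   blastdb = cb.blast.BLAST(models, reference)
```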

Just found this: librosa/librosa#1156

They suggest that setting NUMBA_CACHE_DIR to a writable directory fixes this problem:

import os
# Set this before importing Cell_BLAST (and therefore numba)
os.environ['NUMBA_CACHE_DIR'] = '/tmp/'  # Or some other writable directory
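A slightly more defensive variant of the same workaround is sketched below. It creates a fresh private directory instead of using /tmp/ directly and checks that the directory is writable. `use_writable_numba_cache` is a hypothetical helper. It only sets the environment variable, so call it before `import Cell_BLAST` to make sure numba sees the value.

```python
import os
import tempfile


def use_writable_numba_cache(base_dir=None):
    """Hypothetical helper: point NUMBA_CACHE_DIR at a new writable directory.

    Call it before importing Cell_BLAST (and therefore numba).
    base_dir=None lets tempfile pick the system temporary directory.
    """
    cache_dir = tempfile.mkdtemp(prefix="numba_cache_", dir=base_dir)
    if not os.access(cache_dir, os.W_OK):
        raise PermissionError("cache directory is not writable: " + cache_dir)
    os.environ["NUMBA_CACHE_DIR"] = cache_dir
    return cache_dir


# Usage:
#   use_writable_numba_cache()
#   import Cell_BLAST as cb
```

A new directory per session means numba recompiles the cached functions each time. Reusing a fixed directory such as '/tmp/', as above, keeps the cache across sessions.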

Both solutions worked for me, thank you!

I think I may have found where the problem is. I tried to follow the installation guide:

(debug) biobam@biobam-500-526ns:~/Downloads$ conda create -n cbtest python=3.6 && source activate cbtest
[...]
(debug) biobam@biobam-500-526ns:~/Downloads$ conda activate cbtest

But when I tried to install tensorflow as you specified, I encountered this problem:

(cbtest) biobam@biobam-500-526ns:~/Downloads$ conda install tensorflow=1.8
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: | 
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed                                                                                                                

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

So I installed it without specifying a version, which by default installs the latest version (2.2.0):

(cbtest) biobam@biobam-500-526ns:~/Downloads$ conda install tensorflow

And finally:

(cbtest) biobam@biobam-500-526ns:~/Downloads$ pip install Cell-BLAST

However, when I tried to start the analysis, it raised the following error:

(cbtest) biobam@biobam-500-526ns:~/Downloads$ python3
Python 3.6.12 |Anaconda, Inc.| (default, Sep  8 2020, 23:10:56) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import pandas as pd
>>> import tensorflow as tf
>>> import Cell_BLAST as cb
OMP: Info #274: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/biobam/anaconda3/envs/cbtest/lib/python3.6/site-packages/Cell_BLAST/__init__.py", line 11, in <module>
    from . import (blast, config, data, directi, latent, metrics, prob, rmbatch,
  File "/home/biobam/anaconda3/envs/cbtest/lib/python3.6/site-packages/Cell_BLAST/blast.py", line 19, in <module>
    from . import config, data, directi, metrics, utils
  File "/home/biobam/anaconda3/envs/cbtest/lib/python3.6/site-packages/Cell_BLAST/directi.py", line 16, in <module>
    from . import config, data, latent, model, prob, rmbatch, utils
  File "/home/biobam/anaconda3/envs/cbtest/lib/python3.6/site-packages/Cell_BLAST/latent.py", line 13, in <module>
    from . import module, nn, utils
  File "/home/biobam/anaconda3/envs/cbtest/lib/python3.6/site-packages/Cell_BLAST/module.py", line 15, in <module>
    class Module(object):
  File "/home/biobam/anaconda3/envs/cbtest/lib/python3.6/site-packages/Cell_BLAST/module.py", line 32, in Module
    def _save_weights(self, sess: tf.Session, path: str) -> None:
AttributeError: module 'tensorflow' has no attribute 'Session'

That's because TensorFlow 2.x no longer exposes tf.Session at the top level (it is only available as tf.compat.v1.Session).
So, to solve this, I installed tensorflow with pip install tensorflow==1.8 instead.
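A version guard that fails early, before `import Cell_BLAST` hits the missing tf.Session, could look like the sketch below. `is_tensorflow_1x` is a hypothetical helper. The thread only confirms that 1.8 and 1.12 work and that 2.x does not. Treating every 1.x release as acceptable is an assumption.

```python
def is_tensorflow_1x(version):
    """Hypothetical helper: True if a TensorFlow version string is 1.x.

    Assumption: any 1.x release provides tf.Session as Cell_BLAST expects;
    2.x and later are rejected because tf.Session was removed from the
    top-level namespace.
    """
    major = int(version.split(".")[0])
    return major == 1


# Usage:
#   import tensorflow as tf
#   if not is_tensorflow_1x(tf.__version__):
#       raise RuntimeError("Cell_BLAST needs TensorFlow 1.x, found " + tf.__version__)
```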

About your question ("Maybe it's because of some external system libraries that the pip-installed packages depend on?"): I don't know whether this difference causes the problem you mention, but I thought you might want to know about this issue anyway. Here is the whole log if you want to take a look:
log.txt

Hope it helps!
Marta.

I just tried installing tensorflow 1.8 from conda as well, and indeed it no longer works. Tensorflow 1.8 was the version I used during development, which is a bit too old now; maybe conda removed some of its dependencies from the default channel... And yes, tensorflow 2.x won't work because of many backward-incompatible changes, but later versions of tensorflow 1.x (e.g., 1.12) work fine.

Nevertheless, I don't see why the tensorflow installation breaks numba caching...

Anyway, thanks a lot for the details! I will update the installation guide with a newer version of tensorflow 1.x that still installs from conda.

Hi! I encountered similar errors when running cb.blast.BLAST(models, adata). I created the conda environment exactly as described in https://cblast.readthedocs.io/en/latest/BLAST.html, and it does not seem to be a problem with Python package versions.

Here is the error information. How can I solve this?