BenevolentAI / DeeplyTough

DeeplyTough: Learning Structural Comparison of Protein Binding Sites

Assertion Error : No htmd could be found

gayatripanda5 opened this issue · comments

I was using DeeplyTough on a user-defined dataset. I followed the steps mentioned in the "Custom Dataset" section of your article and:
1. Added the path for the STRUCTURE_DATA_DIR environment variable in my bashrc file. For testing purposes, I took one pair of PDB structures, their pockets in .pdb format, and a csv file for their pairing. I kept all of this in the datasets/custom directory (see the sanity-check sketch after this list).
2. Executed "python $DEEPLYTOUGH/deeplytough/scripts/custom_evaluation.py --dataset_subdir 'custom' --db_preprocessing 1 --output_dir $DEEPLYTOUGH/results --device 'cuda:0' --nworkers 4 --net $DEEPLYTOUGH/networks/deeplytough_toughm1_test.pth.tar"
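
A minimal sanity-check sketch for this kind of setup, assuming STRUCTURE_DATA_DIR points at the datasets directory and the subdirectory name matches the --dataset_subdir argument (adjust the paths if your layout differs):

import os

# Assumption: STRUCTURE_DATA_DIR points at the DeeplyTough datasets/ directory
# and 'custom' is the subdirectory passed via --dataset_subdir.
root = os.environ.get('STRUCTURE_DATA_DIR')
assert root is not None, 'STRUCTURE_DATA_DIR is not set in this shell'

custom_dir = os.path.join(root, 'custom')
assert os.path.isdir(custom_dir), custom_dir + ' does not exist'
for dirpath, dirnames, filenames in os.walk(custom_dir):
    for name in filenames:
        # should list the input .pdb structures, pocket .pdb files and the pairing .csv
        print(os.path.join(dirpath, name))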

I am getting the following warning and error:

2020-08-11 11:42:54,118 - root - WARNING - HTMD featurization file not found: /home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough master/datasets/processed/htmd/custom/ind_pdbs/6I83_clean.npz,corresponding pdb likely could not be parsed

AssertionError: No HTMD could be found but 11PDB files were given, please call preprocess_once() on the dataset.

Can you suggest where I am going wrong and what I can do to rectify this error?
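
For context, judging from the warning and the assertion text, the preprocessing step is expected to write one .npz HTMD featurization file per input structure under datasets/processed/htmd/<subdir>/, and the assertion fires when none can be found. A quick sketch to see what was actually written (the path is copied from the warning above; run it from the repository root or substitute the absolute path from your log):

import os

# Walk the processed-featurization directory mentioned in the warning and
# report which .npz files (if any) exist for each structure.
processed = 'datasets/processed/htmd/custom'
if not os.path.isdir(processed):
    print('processed directory does not exist yet:', processed)
for dirpath, dirnames, filenames in os.walk(processed):
    npz = [f for f in filenames if f.endswith('.npz')]
    print(dirpath, npz if npz else 'no .npz files here')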

Thanks for your interest in DeeplyTough. Have you checked out #1 ?

Thanks for your reply. I checked #1. It is similar to my case: no .npz files were found in this directory.

I executed the command
python $DEEPLYTOUGH/deeplytough/scripts/custom_evaluation.py --dataset_subdir 'custom' --db_preprocessing 1 --output_dir $DEEPLYTOUGH/results --device 'cuda:0' --nworkers 4 --net $DEEPLYTOUGH/networks/deeplytough_toughm1_test.pth.tar

It gave these warnings
11it [00:00, 1106.86it/s]
2020-08-11 13:57:56,381 - root - WARNING - HTMD featurization file not found: /home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/datasets/processed/htmd/custom/ind_pdbs/6I83_clean.npz,corresponding pdb likely could not be parsed
2020-08-11 13:57:56,381 - root - WARNING - HTMD featurization file not found: /home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/datasets/processed/htmd/custom/ind_pdbs/3GC9_clean.npz,corresponding pdb likely could not be parsed
...............
Along with this error
AssertionError: No HTMD could be found but 11PDB files were given, please call preprocess_once() on the dataset

No .npz files were formed.
[screenshot]

Thanks for the details. However, in your screenshot I see no files or directories, i.e. there seem to be no .pdb files in the ind_pdbs directory. Could you perhaps verify that the toy custom dataset distributed in this repository (https://github.com/BenevolentAI/DeeplyTough/tree/master/datasets/custom) works on your machine? And then, could you follow the structure of this toy dataset for your own dataset?
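
A small sketch (assuming $DEEPLYTOUGH points at the repository checkout) that prints the toy dataset's layout, so a custom dataset can be arranged the same way:

import os

# Print the directory tree of the bundled toy dataset as a template to mirror.
toy_dir = os.path.join(os.environ['DEEPLYTOUGH'], 'datasets', 'custom')
for dirpath, dirnames, filenames in os.walk(toy_dir):
    depth = dirpath[len(toy_dir):].count(os.sep)
    print('  ' * depth + os.path.basename(dirpath) + '/')
    for name in sorted(filenames):
        print('  ' * (depth + 1) + name)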

Thanks for your reply. After running this command on your toy dataset:

python $DEEPLYTOUGH/deeplytough/scripts/custom_evaluation.py --dataset_subdir 'custom' --output_dir $DEEPLYTOUGH/results --device 'cuda:0' --nworkers 4 --net $DEEPLYTOUGH/networks/deeplytough_toughm1_test.pth.tar

I am getting the same warning and error:
8it [00:00, 608.95it/s]
2020-08-12 17:31:43,328 - root - WARNING - HTMD featurization file not found: /home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/datasets/processed/htmd/custom/1a05B/1a05B.npz,corresponding pdb likely could not be parsed

AssertionError: No HTMD could be found but 8PDB files were given, please call preprocess_once() on the dataset.

Now, coming to my dataset, I kept all my .pdb files in this directory (/dataset/custom/ind_pdbs).
All the _out files were created by your script, so it clearly processed these files, but then gave this error at the end.
[screenshot]

Thanks. Maybe I am misinterpreting your screenshots, but it seems to me that incorporating your dataset within the datasets directory has somewhat corrupted it. Could you perhaps:

  1. Remove the datasets directory completely and revert it to the state as in this github repository.
  2. Run python $DEEPLYTOUGH/deeplytough/scripts/custom_evaluation.py --dataset_subdir 'custom' --output_dir $DEEPLYTOUGH/results --device 'cuda:0' --nworkers 4 --net $DEEPLYTOUGH/networks/deeplytough_toughm1_test.pth.tar. This should really succeed while printing out a lot of outputs, including messages like "Pre-processing xxxx with HTMD...".

If it's OK, continue:
3. Create a new directory datasets/your_dataset and put your pdbs as well as the modified .csv file there.
4. Run python $DEEPLYTOUGH/deeplytough/scripts/custom_evaluation.py --dataset_subdir 'your_dataset' --output_dir $DEEPLYTOUGH/results --device 'cuda:0' --nworkers 4 --net $DEEPLYTOUGH/networks/deeplytough_toughm1_test.pth.tar. This should print out a lot of output, including messages like "Pre-processing xxxx with HTMD..." (see the verification sketch after this list).
5. If it doesn't work and you have the permission to do so, could you perhaps upload the content of 'your_dataset' here?
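
If the run does print the "Pre-processing ... with HTMD" messages, one way to verify the featurization produced usable output is to open one of the resulting .npz files. A hedged sketch (the file name is taken from the warnings earlier in this thread; the array names stored inside depend on the featurizer, so treat them as unknown):

import numpy as np

# Inspect one HTMD featurization output; adjust the path to an .npz file that
# your preprocessing run actually created.
path = 'datasets/processed/htmd/custom/1a05B/1a05B.npz'
with np.load(path) as data:
    for key in data.files:
        print(key, data[key].shape)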

Thanks a lot. I apologize for bugging you again. I followed what you said for your toy set; it gave a few warnings:
*** Open Babel Warning in parseAtomRecord
WARNING: Problems reading a PDB file
Problems reading a HETATM or ATOM record.
According to the PDB specification,
columns 77-78 should contain the element symbol of an atom.
but OpenBabel found ' ' (atom 2692)
1 molecule converted
Traceback (most recent call last):
  File "/home/iiitd/miniconda3/envs/deeplytough_mgltools/MGLToolsPckgs/AutoDockTools/Utilities24/prepare_receptor4.py", line 10, in <module>
    import MolKit.molecule
  File "/home/iiitd/miniconda3/envs/deeplytough_mgltools/MGLToolsPckgs/MolKit/molecule.py", line 23, in <module>
    from mglutil.util import misc
  File "/home/iiitd/miniconda3/envs/deeplytough_mgltools/MGLToolsPckgs/mglutil/util/misc.py", line 17, in <module>
    import numpy
  File "/home/iiitd/.local/lib/python2.7/site-packages/numpy/__init__.py", line 142, in <module>
    from . import core
  File "/home/iiitd/.local/lib/python2.7/site-packages/numpy/core/__init__.py", line 71, in <module>
    raise ImportError(msg)
ImportError:

IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

Importing the multiarray numpy extension module failed. Most
likely you are trying to import a failed build of numpy.
Here is how to proceed:

  • If you're working with a numpy git repository, try git clean -xdf
    (removes all files not under version control) and rebuild numpy.
  • If you are simply trying to use the numpy version that you have installed:
    your installation is broken - please reinstall numpy.
  • If you have already reinstalled and that did not fix the problem, then:
    1. Check that you are using the Python you expect (you're using /home/iiitd/miniconda3/envs/deeplytough_mgltools/bin/python),
      and that you have no directories in your PATH or PYTHONPATH that can
      interfere with the Python and numpy versions you're trying to use.

    2. If (1) looks fine, you can open a new issue at
      https://github.com/numpy/numpy/issues. Please include details on:

      • how you installed Python
      • how you installed numpy
      • your operating system
      • whether or not you have multiple versions of Python installed
      • if you built from source, your compiler versions and ideally a build log

      Note: this error has many possible causes, so please don't comment on
      an existing issue about this - open a new one instead.

Original error was: /home/iiitd/.local/lib/python2.7/site-packages/numpy/core/_multiarray_umath.so: undefined symbol: PyUnicodeUCS4_FromObject

Then it ended with the same error:

2020-08-13 22:03:57,567 - root - WARNING - HTMD featurization file not found: /home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/datasets/processed/htmd/custom/1a9t/1a9t_clean.npz,corresponding pdb likely could not be parsed
Traceback (most recent call last):
  File "/home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/deeplytough/scripts/custom_evaluation.py", line 69, in <module>
    main()
  File "/home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/deeplytough/scripts/custom_evaluation.py", line 41, in main
    entries = matcher.precompute_descriptors(entries)
  File "/home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/deeplytough/matchers/deeply_tough.py", line 46, in precompute_descriptors
    feats = load_and_precompute_point_feats(self.model, self.args, pdb_list, point_list, self.device, self.nworkers, self.batch_size)
  File "/home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/deeplytough/engine/predictor.py", line 37, in load_and_precompute_point_feats
    dataset = PointOfInterestVoxelizedDataset(pdb_list, point_list, box_size=args.patch_size)
  File "/home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/deeplytough/engine/datasets.py", line 220, in __init__
    super().__init__(pdb_list, box_size=box_size, augm_rot=False, augm_mirror_prob=0)
  File "/home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/deeplytough/engine/datasets.py", line 46, in __init__
    assert len(self.pdb_list) > 0, f'No HTMD could be found but {len(pdb_list)}'
AssertionError: No HTMD could be found but 8PDB files were given, please call preprocess_once() on the dataset

Thanks, that's very helpful; the problem apparently is that MGLTools crashes. I would suggest two next steps:

  1. Could you post your $PYTHONPATH and $PATH here, please? I'm a bit suspicious about the path in the stack trace containing a non-conda directory: "/home/iiitd/.local/lib/python2.7/site-packages/numpy/__init__.py",

  2. The problem might be due to a new version of mgltools, which we haven't pinned. Could you perhaps run the following conda commands, then again delete the datasets/processed directory and run the custom_evaluation.py command?

conda remove --name deeplytough_mgltools --all
conda create -y -n deeplytough_mgltools python=2.7
conda install -y -n deeplytough_mgltools -c bioconda mgltools=1.5.6
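
After recreating the environment, it may also be worth checking that the MGLTools interpreter now picks up the conda-installed numpy rather than the build in ~/.local that the traceback points to. A minimal sketch (the interpreter path is taken from the traceback above):

# Save as check_numpy.py and run it with the MGLTools interpreter, e.g.
#   ~/miniconda3/envs/deeplytough_mgltools/bin/python check_numpy.py
import numpy
print(numpy.__file__)     # should live inside the conda env, not ~/.local
print(numpy.__version__)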

Please accept my apologies for this delayed response.

  1. The paths are
    export PYTHONPATH=$DEEPLYTOUGH/deeplytough:$PYTHONPATH
    export PATH=$DEEPLYTOUGH/fpocket2/bin:$PATH

  2. I followed the steps you suggested. Now everything seems fine. I ran this command "python $DEEPLYTOUGH/deeplytough/scripts/custom_evaluation.py --dataset_subdir 'custom' --output_dir $DEEPLYTOUGH/results --device 'cuda:0' --nworkers 4 --net $DEEPLYTOUGH/networks/deeplytough_toughm1_test.pth.tar"
    for your toy dataset and my dataset too. It ran successfully. Big thanks to you.

[screenshot]

That's terrific, I'm glad it works now, thanks for reporting the issue! I will fix the version of mgltools in the repository.

Hi, unfortunately the image has not been inserted well, can you try to edit your comment? In general, the similarity score allows you to compare whether two pockets are more similar than other pairs (the score is simply higher). Choosing a particular threshold is not well defined; for a larger dataset you would perhaps just plot a ROC curve and decide on an operating point as a balance between true and false positive rates.
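
To illustrate the ROC-based choice of an operating point, here is a generic sketch (not part of DeeplyTough; it assumes binary similar/dissimilar labels are available for a set of pocket pairs alongside their scores, and the numbers below are made up):

import numpy as np
from sklearn.metrics import roc_curve

scores = np.array([-0.8, -2.1, -0.5, -3.0, -1.2])   # higher score = more similar
labels = np.array([1, 0, 1, 0, 1])                   # 1 = known similar pair, 0 = dissimilar

fpr, tpr, thresholds = roc_curve(labels, scores)
# Pick an operating point, e.g. the threshold maximizing TPR - FPR (Youden's J).
best = np.argmax(tpr - fpr)
print('threshold:', thresholds[best], 'TPR:', tpr[best], 'FPR:', fpr[best])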

I'm still unable to see the image. But scores are defined as negative distances, so the more negative, the less similar.
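
A toy illustration of that convention (the descriptors here are random stand-ins, not real DeeplyTough output): if two pockets have descriptors d1 and d2, the score is the negative distance between them, so values closer to zero mean more similar pockets.

import numpy as np

d1, d2, d3 = np.random.rand(3, 128)        # stand-ins for pocket descriptors
score_12 = -np.linalg.norm(d1 - d2)
score_13 = -np.linalg.norm(d1 - d3)
print('pocket1 vs pocket2:', score_12)
print('pocket1 vs pocket3:', score_13)     # higher (less negative) = more similar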