BenevolentAI / DeeplyTough

DeeplyTough: Learning Structural Comparison of Protein Binding Sites

Assertion Error : No htmd could be found

gayatripanda5 opened this issue · comments

I was using DeeplyTough on a user-defined dataset. I followed the steps mentioned in the "Custom Dataset" section of your article and:
1. Added the path for the STRUCTURE_DATA_DIR environment variable in my bashrc file. For testing purposes, I took one pair of PDB structures, their pockets in .pdb format, and a csv file for their pairing. I kept all of this in the datasets/custom directory (see the sanity-check sketch after this list).
2. Executed "python $DEEPLYTOUGH/deeplytough/scripts/custom_evaluation.py --dataset_subdir 'custom' --db_preprocessing 1 --output_dir $DEEPLYTOUGH/results --device 'cuda:0' --nworkers 4 --net $DEEPLYTOUGH/networks/deeplytough_toughm1_test.pth.tar"
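
A minimal sanity-check sketch for this kind of setup, assuming STRUCTURE_DATA_DIR points at the datasets directory and the subdirectory name matches the --dataset_subdir argument (adjust the paths if your layout differs):

import os

# Assumption: STRUCTURE_DATA_DIR points at the DeeplyTough datasets/ directory
# and 'custom' is the subdirectory passed via --dataset_subdir.
root = os.environ.get('STRUCTURE_DATA_DIR')
assert root is not None, 'STRUCTURE_DATA_DIR is not set in this shell'

custom_dir = os.path.join(root, 'custom')
assert os.path.isdir(custom_dir), custom_dir + ' does not exist'
for dirpath, dirnames, filenames in os.walk(custom_dir):
    for name in filenames:
        # should list the input .pdb structures, pocket .pdb files and the pairing .csv
        print(os.path.join(dirpath, name))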

I am getting the following warning and error:

2020-08-11 11:42:54,118 - root - WARNING - HTMD featurization file not found: /home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough master/datasets/processed/htmd/custom/ind_pdbs/6I83_clean.npz,corresponding pdb likely could not be parsed

AssertionError: No HTMD could be found but 11PDB files were given, please call preprocess_once() on the dataset.

Can you suggest where I am going wrong and what I can do to rectify this error?
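
For context, judging from the warning and the assertion text, the preprocessing step is expected to write one .npz HTMD featurization file per input structure under datasets/processed/htmd/<subdir>/, and the assertion fires when none can be found. A quick sketch to see what was actually written (the path is copied from the warning above; run it from the repository root or substitute the absolute path from your log):

import os

# Walk the processed-featurization directory mentioned in the warning and
# report which .npz files (if any) exist for each structure.
processed = 'datasets/processed/htmd/custom'
if not os.path.isdir(processed):
    print('processed directory does not exist yet:', processed)
for dirpath, dirnames, filenames in os.walk(processed):
    npz = [f for f in filenames if f.endswith('.npz')]
    print(dirpath, npz if npz else 'no .npz files here')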

Thanks for your interest in DeeplyTough. Have you checked out #1 ?

Thanks for your reply. I checked #1. It is similar to my case: no .npz files were found in this directory.

I executed the command
python $DEEPLYTOUGH/deeplytough/scripts/custom_evaluation.py --dataset_subdir 'custom' --db_preprocessing 1 --output_dir $DEEPLYTOUGH/results --device 'cuda:0' --nworkers 4 --net $DEEPLYTOUGH/networks/deeplytough_toughm1_test.pth.tar

It gave these warnings
11it [00:00, 1106.86it/s]
2020-08-11 13:57:56,381 - root - WARNING - HTMD featurization file not found: /home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/datasets/processed/htmd/custom/ind_pdbs/6I83_clean.npz,corresponding pdb likely could not be parsed
2020-08-11 13:57:56,381 - root - WARNING - HTMD featurization file not found: /home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/datasets/processed/htmd/custom/ind_pdbs/3GC9_clean.npz,corresponding pdb likely could not be parsed
...............
Along with this error
AssertionError: No HTMD could be found but 11PDB files were given, please call preprocess_once() on the dataset

No .npz files were formed.
[screenshot]

Thanks for the details. However, in your screenshot I see no files or directories, i.e. there seem to be no .pdb files in the ind_pdbs directory. Could you perhaps verify that the toy custom dataset distributed in this repository (https://github.com/BenevolentAI/DeeplyTough/tree/master/datasets/custom) works on your machine? And then, could you follow the structure of this toy dataset for your own dataset?
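
A small sketch (assuming $DEEPLYTOUGH points at the repository checkout) that prints the toy dataset's layout, so a custom dataset can be arranged the same way:

import os

# Print the directory tree of the bundled toy dataset as a template to mirror.
toy_dir = os.path.join(os.environ['DEEPLYTOUGH'], 'datasets', 'custom')
for dirpath, dirnames, filenames in os.walk(toy_dir):
    depth = dirpath[len(toy_dir):].count(os.sep)
    print('  ' * depth + os.path.basename(dirpath) + '/')
    for name in sorted(filenames):
        print('  ' * (depth + 1) + name)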

Thanks for your reply. After running this command on your toy dataset:

python $DEEPLYTOUGH/deeplytough/scripts/custom_evaluation.py --dataset_subdir 'custom' --output_dir $DEEPLYTOUGH/results --device 'cuda:0' --nworkers 4 --net $DEEPLYTOUGH/networks/deeplytough_toughm1_test.pth.tar

I am getting the same warning and error:
8it [00:00, 608.95it/s]
2020-08-12 17:31:43,328 - root - WARNING - HTMD featurization file not found: /home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/datasets/processed/htmd/custom/1a05B/1a05B.npz,corresponding pdb likely could not be parsed

AssertionError: No HTMD could be found but 8PDB files were given, please call preprocess_once() on the dataset.

Now, coming to my dataset, I kept all my .pdb files in this directory (/dataset/custom/ind_pdbs).
All the _out files were created by your script, so it clearly processed these files, but then gave this error at the end.
[screenshot]

Thanks. Maybe I am misinterpreting your screenshots, but it seems to me that incorporating your dataset within the datasets directory has somewhat corrupted it. Could you perhaps:

  1. Remove the datasets directory completely and revert it to the state as in this github repository.
  2. Run python $DEEPLYTOUGH/deeplytough/scripts/custom_evaluation.py --dataset_subdir 'custom' --output_dir $DEEPLYTOUGH/results --device 'cuda:0' --nworkers 4 --net $DEEPLYTOUGH/networks/deeplytough_toughm1_test.pth.tar. This should really succeed while printing out a lot of outputs, including messages like "Pre-processing xxxx with HTMD...".

If it's OK, continue:
3. Create a new directory datasets/your_dataset and put your pdbs as well as the modified .csv file there.
4. Run python $DEEPLYTOUGH/deeplytough/scripts/custom_evaluation.py --dataset_subdir 'your_dataset' --output_dir $DEEPLYTOUGH/results --device 'cuda:0' --nworkers 4 --net $DEEPLYTOUGH/networks/deeplytough_toughm1_test.pth.tar. This should print out a lot of output, including messages like "Pre-processing xxxx with HTMD..." (see the verification sketch after this list).
5. If it doesn't work and you have the permission to do so, could you perhaps upload the content of 'your_dataset' here?
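
If the run does print the "Pre-processing ... with HTMD" messages, one way to verify the featurization produced usable output is to open one of the resulting .npz files. A hedged sketch (the file name is taken from the warnings earlier in this thread; the array names stored inside depend on the featurizer, so treat them as unknown):

import numpy as np

# Inspect one HTMD featurization output; adjust the path to an .npz file that
# your preprocessing run actually created.
path = 'datasets/processed/htmd/custom/1a05B/1a05B.npz'
with np.load(path) as data:
    for key in data.files:
        print(key, data[key].shape)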

Thanks a lot. I apologize for bugging you again. I followed what you said for your toy set; it gave a few warnings:
*** Open Babel Warning in parseAtomRecord
WARNING: Problems reading a PDB file
Problems reading a HETATM or ATOM record.
According to the PDB specification,
columns 77-78 should contain the element symbol of an atom.
but OpenBabel found ' ' (atom 2692)
1 molecule converted
Traceback (most recent call last):
  File "/home/iiitd/miniconda3/envs/deeplytough_mgltools/MGLToolsPckgs/AutoDockTools/Utilities24/prepare_receptor4.py", line 10, in <module>
    import MolKit.molecule
  File "/home/iiitd/miniconda3/envs/deeplytough_mgltools/MGLToolsPckgs/MolKit/molecule.py", line 23, in <module>
    from mglutil.util import misc
  File "/home/iiitd/miniconda3/envs/deeplytough_mgltools/MGLToolsPckgs/mglutil/util/misc.py", line 17, in <module>
    import numpy
  File "/home/iiitd/.local/lib/python2.7/site-packages/numpy/__init__.py", line 142, in <module>
    from . import core
  File "/home/iiitd/.local/lib/python2.7/site-packages/numpy/core/__init__.py", line 71, in <module>
    raise ImportError(msg)
ImportError:

IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

Importing the multiarray numpy extension module failed. Most
likely you are trying to import a failed build of numpy.
Here is how to proceed:

  • If you're working with a numpy git repository, try git clean -xdf
    (removes all files not under version control) and rebuild numpy.
  • If you are simply trying to use the numpy version that you have installed:
    your installation is broken - please reinstall numpy.
  • If you have already reinstalled and that did not fix the problem, then:
    1. Check that you are using the Python you expect (you're using /home/iiitd/miniconda3/envs/deeplytough_mgltools/bin/python),
      and that you have no directories in your PATH or PYTHONPATH that can
      interfere with the Python and numpy versions you're trying to use.

    2. If (1) looks fine, you can open a new issue at
      https://github.com/numpy/numpy/issues. Please include details on:

      • how you installed Python
      • how you installed numpy
      • your operating system
      • whether or not you have multiple versions of Python installed
      • if you built from source, your compiler versions and ideally a build log

      Note: this error has many possible causes, so please don't comment on
      an existing issue about this - open a new one instead.

Original error was: /home/iiitd/.local/lib/python2.7/site-packages/numpy/core/_multiarray_umath.so: undefined symbol: PyUnicodeUCS4_FromObject

Then it ended with the same error:

2020-08-13 22:03:57,567 - root - WARNING - HTMD featurization file not found: /home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/datasets/processed/htmd/custom/1a9t/1a9t_clean.npz,corresponding pdb likely could not be parsed
Traceback (most recent call last):
  File "/home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/deeplytough/scripts/custom_evaluation.py", line 69, in <module>
    main()
  File "/home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/deeplytough/scripts/custom_evaluation.py", line 41, in main
    entries = matcher.precompute_descriptors(entries)
  File "/home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/deeplytough/matchers/deeply_tough.py", line 46, in precompute_descriptors
    feats = load_and_precompute_point_feats(self.model, self.args, pdb_list, point_list, self.device, self.nworkers, self.batch_size)
  File "/home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/deeplytough/engine/predictor.py", line 37, in load_and_precompute_point_feats
    dataset = PointOfInterestVoxelizedDataset(pdb_list, point_list, box_size=args.patch_size)
  File "/home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/deeplytough/engine/datasets.py", line 220, in __init__
    super().__init__(pdb_list, box_size=box_size, augm_rot=False, augm_mirror_prob=0)
  File "/home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/deeplytough/engine/datasets.py", line 46, in __init__
    assert len(self.pdb_list) > 0, f'No HTMD could be found but {len(pdb_list)}'
AssertionError: No HTMD could be found but 8PDB files were given, please call preprocess_once() on the dataset

Thanks, that's very helpful; the problem apparently is that MGLTools crashes. I would suggest two next steps:

  1. Could you post your $PYTHONPATH and $PATH here, please? I'm a bit suspicious about the path in the stack trace containing a non-conda directory: "/home/iiitd/.local/lib/python2.7/site-packages/numpy/__init__.py",

  2. The problem might be due to a new version of mgltools, which we haven't pinned. Could you perhaps run the following conda commands, then again delete the datasets/processed directory and run the custom_evaluation.py command?

conda remove --name deeplytough_mgltools --all
conda create -y -n deeplytough_mgltools python=2.7
conda install -y -n deeplytough_mgltools -c bioconda mgltools=1.5.6
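
After recreating the environment, it may also be worth checking that the MGLTools interpreter now picks up the conda-installed numpy rather than the build in ~/.local that the traceback points to. A minimal sketch (the interpreter path is taken from the traceback above):

# Save as check_numpy.py and run it with the MGLTools interpreter, e.g.
#   ~/miniconda3/envs/deeplytough_mgltools/bin/python check_numpy.py
import numpy
print(numpy.__file__)     # should live inside the conda env, not ~/.local
print(numpy.__version__)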

Please accept my apologies for this delayed response.

  1. The paths are
    export PYTHONPATH=$DEEPLYTOUGH/deeplytough:$PYTHONPATH
    export PATH=$DEEPLYTOUGH/fpocket2/bin:$PATH

  2. I followed the steps you suggested. Now everything seems fine. I ran this command "python $DEEPLYTOUGH/deeplytough/scripts/custom_evaluation.py --dataset_subdir 'custom' --output_dir $DEEPLYTOUGH/results --device 'cuda:0' --nworkers 4 --net $DEEPLYTOUGH/networks/deeplytough_toughm1_test.pth.tar"
    for your toy dataset and my dataset too. It ran successfully. Big thanks to you.

[screenshot]

That's terrific, I'm glad it works now, thanks for reporting the issue! I will fix the version of mgltools in the repository.

Hi, unfortunately the image has not been inserted well, can you try to edit your comment? In general, the similarity score allows you to compare whether two pockets are more similar than other pairs (the score is simply higher). Choosing a particular threshold is not well defined; for a larger dataset you would perhaps just plot a ROC curve and decide on an operating point as a balance between true and false positive rates.
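
To illustrate the ROC-based choice of an operating point, here is a generic sketch (not part of DeeplyTough; it assumes binary similar/dissimilar labels are available for a set of pocket pairs alongside their scores, and the numbers below are made up):

import numpy as np
from sklearn.metrics import roc_curve

scores = np.array([-0.8, -2.1, -0.5, -3.0, -1.2])   # higher score = more similar
labels = np.array([1, 0, 1, 0, 1])                   # 1 = known similar pair, 0 = dissimilar

fpr, tpr, thresholds = roc_curve(labels, scores)
# Pick an operating point, e.g. the threshold maximizing TPR - FPR (Youden's J).
best = np.argmax(tpr - fpr)
print('threshold:', thresholds[best], 'TPR:', tpr[best], 'FPR:', fpr[best])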

I'm still unable to see the image. But scores are defined as negative distances, so the more negative, the less similar.
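
A toy illustration of that convention (the descriptors here are random stand-ins, not real DeeplyTough output): if two pockets have descriptors d1 and d2, the score is the negative distance between them, so values closer to zero mean more similar pockets.

import numpy as np

d1, d2, d3 = np.random.rand(3, 128)        # stand-ins for pocket descriptors
score_12 = -np.linalg.norm(d1 - d2)
score_13 = -np.linalg.norm(d1 - d3)
print('pocket1 vs pocket2:', score_12)
print('pocket1 vs pocket3:', score_13)     # higher (less negative) = more similar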