arneschneuing / DiffSBDD

A Euclidean diffusion model for structure-based drug design.

Data preparation failed in Colab

rwbfd opened this issue

commented

I have been trying to replicate the training results using the original notebook in the repository, but the data preparation step fails. When I run the command python ./DiffSBDD/process_crossdock.py /content/crossdocked_pocket10 --no_H y, I get the following error:

#failed: 100000: 100% 100000/100000 [00:10<00:00, 9586.86it/s]
Traceback (most recent call last):
  File "/content/./DiffSBDD/process_crossdock.py", line 353, in <module>
    lig_coords = np.concatenate(lig_coords, axis=0)
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: need at least one array to concatenate
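Judging by the progress bar, it looks like all 100000 entries failed to process, so the list passed to np.concatenate is presumably empty. A minimal snippet reproducing the same error (just my guess at what is happening, not necessarily the actual cause):

import numpy as np

# if every entry fails to process, nothing is appended to lig_coords
lig_coords = []
np.concatenate(lig_coords, axis=0)  # raises ValueError: need at least one array to concatenate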

Similar but more cryptic issues arise for the other datasets. When run in the Colab cells, the error is

Traceback (most recent call last):
  File "/content/./DiffSBDD/process_bindingmoad.py", line 450, in <module>
    with open(f'data/moad_{split}.txt', 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/moad_test.txt

Since Colab cells can sometimes behave in funky ways, I decided to run the script from the command line instead. Now the error is the same as before:

#failed: 130: 100%|█| 130/130 [00:00<00:00, 9719.25it/s
Traceback (most recent call last):
  File "/content/DiffSBDD/process_bindingmoad.py", line 571, in <module>
    lig_coords = np.concatenate(lig_coords, axis=0)
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: need at least one array to concatenate

I have posted the Colab notebook here.

Any help will be greatly appreciated.

commented

Have you solved this problem? I am running into the same issue.

Hi @rwbfd and @xiaoxiannv999,
sorry for the slow response. With the given information, it is quite hard for me to see what is going wrong.
However, I can say that the error FileNotFoundError: [Errno 2] No such file or directory: 'data/moad_test.txt' is caused by hard-coded paths to the training, validation, and test lists. These lists are downloaded together with the code repository and can be found in the data/ subdirectory. Because the paths are hard-coded, process_bindingmoad.py should be run from the main directory (DiffSBDD/). Alternatively, you could change the paths in the script (here).
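If running the script from DiffSBDD/ is inconvenient, a rough sketch of how the paths could be made independent of the working directory (assuming the split lists stay in the data/ subdirectory next to the script) would look like this:

from pathlib import Path

# resolve data/ relative to the location of process_bindingmoad.py
# rather than the current working directory
data_dir = Path(__file__).resolve().parent / 'data'
with open(data_dir / f'moad_{split}.txt', 'r') as f:
    ...  # read the split list as before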

commented

Hello @arneschneuing, I raised another question in #18, could you answer it? I guess it may be caused by the same reason.