mdcao / Lingo3DMol

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Lingo3DMol: Generation of a Pocket-based 3D Molecule using a Language Model

Lingo3DMol is a pocket-based 3D molecule generation method that combines the ability of language model with the ability to generate 3D coordinates and geometric deep learning to produce high-quality molecules.

System Requirements

Hardware requirements

A standard computer with GPU with at least 5GB graphic memory.

OS Requirements

This package is supported for macOS and Linux. The package has been tested on the following systems: Linux: Ubuntu 16.04 macOS: Ventura (13.0)

Install via conda yaml file

Typical install time: 40 minutes

conda create -n lingo3dmol python=3.8
conda activate lingo3dmol
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install scipy==1.7.3 pandas==1.5.1 numpy==1.20.3 rdkit==2022.09.1

Datasets

We provide DUD-E pocket files for sampling under \dataset folder. Please Unzipdude_pocket.zip.

Model Checkpoints

Download and move these checkpoint to the \checkpoint folder.

https://stonewise-lingo3dmol-public.s3.cn-northwest-1.amazonaws.com.cn/contact.pkl md5sum:6a9313726141fcf9201b9b9470dc2a7e

https://stonewise-lingo3dmol-public.s3.cn-northwest-1.amazonaws.com.cn/gen_mol.pkl md5sum:452bd401667184ae43c9818e5bdb133b

Sampling

To inference using the model on DUD-E set, run this code:

sh run.sh

Expected output

The output should be generated molecules in mol format.

Main Parameters Help

  • --coc_dis Define the collision distance with the pocket (A)
  • --nci_thrs Define the threshold of the nci prediction model
  • --topk Select top k nci
  • --max_run_hours Define the max run hours
  • --gennums Define the minimum generation numbers
  • --USE_THRESHOLD Define sampled only categories larger than expectation
  • --isMultiSample Define sampled use multinomial
  • --isGuideSample frag-based sampled use this definition
  • --OnceMolGen no frag-based sample
  • --gen_frag_set Define the number of collections generated, from which the top twenty percent is selected (batch size)
  • --prod_time "go_factory()" func repeat time
  • --tempture "proj1" used Tsoftmax
  • --frag_len_add Defines the minimum length of each generated fragment/ bigger fragments faster generated

License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.

About

License:GNU General Public License v3.0


Languages

Language:Python 99.8%Language:Shell 0.2%