clinfo / ReTReK

ReTReK: data-driven ReTrosynthesis planning application using Retrosynthesis Knowledge

ReTReK: ReTrosynthesis planning application using Retrosynthesis Knowledge

This package provides a data-driven computer-aided synthesis planning tool that uses retrosynthesis knowledge. In this package, the ReTReK model was trained on the US Patent dataset instead of the Reaxys reaction dataset. Therefore, the model is not guaranteed to produce the same synthetic routes as those reported in the manuscript.

Note: A pure Python version of ReTReK is available at https://github.com/clinfo/ReTReKpy

Dependency

Environment (confirmed)

  • Ubuntu: 18.04 (model training & synthetic route prediction)
  • macOS Catalina: 10.15.7 (synthetic route prediction)

Package

Setup

Please refer to the following link.

Example usage

Note: The order of the --knowledge arguments corresponds to the order of the --knowledge_weights arguments.

javac CxnUtils.java  # for the first time only

# use all knowledge
python run.py --config config/sample.json --target data/sample.mol --knowledge cdscore rdscore asscore stscore --knowledge_weights 1.0 1.0 1.0 1.0

# use CDScore with a weight of 2.0
python run.py --config config/sample.json --target data/sample.mol --knowledge cdscore --knowledge_weights 2.0 0.0 0.0 0.0
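If you script several runs, a small helper can keep the two argument lists aligned. The sketch below is hypothetical (build_command is not part of ReTReK). Following the note above, it assumes the i-th weight applies to the i-th --knowledge entry, and it pads the weights with 0.0 up to four values, as the examples do.

```python
import shlex

# Knowledge names that appear in the README examples.
KNOWN = ("cdscore", "rdscore", "asscore", "stscore")


def build_command(config, target, knowledge):
    """Build a run.py command line from (name, weight) pairs.

    Hypothetical helper. Assumption: weights are positional, matching the
    order of --knowledge, and unused slots are padded with 0.0 as in the
    README examples.
    """
    names = [name for name, _ in knowledge]
    if not names or len(set(names)) != len(names):
        raise ValueError("give at least one knowledge type, without duplicates")
    unknown = [n for n in names if n not in KNOWN]
    if unknown:
        raise ValueError(f"unknown knowledge: {unknown}")
    weights = [float(w) for _, w in knowledge]
    # Pad to one weight per known knowledge type.
    weights += [0.0] * (len(KNOWN) - len(weights))
    return ["python", "run.py", "--config", config, "--target", target,
            "--knowledge", *names,
            "--knowledge_weights", *(str(w) for w in weights)]


cmd = build_command("config/sample.json", "data/sample.mol", [("cdscore", 2.0)])
print(shlex.join(cmd))
```

The printed line reproduces the CDScore-only example above; the returned list can be passed directly to subprocess.run.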

To try your own molecule, prepare it in MDL MOLfile format and replace data/sample.mol with the path to that file.
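Before starting a long search, it can help to check that the input file is laid out like a MOLfile. The following is a rough sketch, not part of ReTReK: it only checks the layout (a three-line header, a fourth "counts" line that declares V2000 or V3000, and a terminating "M  END" line) and does not validate atoms, bonds, or chemistry.

```python
from pathlib import Path


def is_molfile_text(text):
    """Return True if text has the basic layout of an MDL MOLfile."""
    lines = text.splitlines()
    if len(lines) < 5:
        return False
    # Line 4 (index 3) is the counts line; it ends with the format version.
    has_version = lines[3].rstrip().endswith(("V2000", "V3000"))
    # The connection table is terminated by an "M  END" line.
    has_end = any(line.strip() == "M  END" for line in lines[4:])
    return has_version and has_end


def looks_like_molfile(path):
    """Read a file and apply the layout check above."""
    return is_molfile_text(Path(path).read_text())
```

For example, looks_like_molfile("data/sample.mol") should return True for a well-formed file, while a file containing only a SMILES string returns False.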

The target molecules used in the manuscript are stored in data/evaluation_compounds. To run ReTReK on molecules in this directory, use commands such as the following:

NOTE: The commands below require additional files managed by Git LFS. First, run git lfs install && git lfs pull to download data/starting_materials_zinc.smi.

python run.py --config config/sample2.json --target data/evaluation_compounds/drug-like-compounds/MtbTMPK_inhibitor.mol --knowledge cdscore --knowledge_weights 5.0 0.0 0.0 0.0 --sel_const 10 --expansion_num 500

python run.py --config config/sample2.json --target data/evaluation_compounds/drug-like-compounds/α7_nicotinic_acetylcholine_receptor_silent_agonist.mol --knowledge cdscore --knowledge_weights 5.0 0.0 0.0 0.0 --sel_const 10 --expansion_num 500
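To process every compound in the directory rather than one at a time, the commands above can be generated in a loop. This sketch assumes the targets are the .mol files found recursively under data/evaluation_compounds; the helper names are hypothetical, and the block only prints the commands (a dry run).

```python
import shlex
from pathlib import Path


def evaluation_targets(root="data/evaluation_compounds"):
    """Return all .mol files under root, searched recursively, in sorted order."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.mol"))


def command_for(target, config="config/sample2.json", sel_const=10, expansion_num=500):
    """Build the run.py command shown above for a single target file."""
    return ["python", "run.py", "--config", config, "--target", str(target),
            "--knowledge", "cdscore", "--knowledge_weights", "5.0", "0.0", "0.0", "0.0",
            "--sel_const", str(sel_const), "--expansion_num", str(expansion_num)]


# Dry run: print each command instead of executing it.
for target in evaluation_targets():
    print(shlex.join(command_for(target)))
```

To actually launch the searches, pass each list to subprocess.run(cmd, check=True). Passing a list rather than a shell string avoids quoting problems with file names such as the one containing "α7".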

Optional arguments

  • --sel_const: constant used in the MCTS selection step (default: 3).
  • --expansion_num: number of reaction templates used in the expansion step (default: 50).
  • --starting_material: path to a SMILES file containing the starting materials.
  • --search_count: maximum number of MCTS iterations (default: 100).
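For reference, the options above can be summarized as an argparse parser. This is a hypothetical sketch, not the parser actually defined in run.py: only the flag names and defaults stated in this README are reproduced, and the numeric types are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented run.py options."""
    parser = argparse.ArgumentParser(description="Sketch of ReTReK run.py options")
    parser.add_argument("--config", required=True, help="JSON config file")
    parser.add_argument("--target", required=True, help="target molecule (MDL MOLfile)")
    parser.add_argument("--knowledge", nargs="+", default=[],
                        choices=["cdscore", "rdscore", "asscore", "stscore"])
    parser.add_argument("--knowledge_weights", nargs="+", type=float, default=[])
    # Assumed types: the README states only the default values.
    parser.add_argument("--sel_const", type=float, default=3.0,
                        help="constant for the MCTS selection step")
    parser.add_argument("--expansion_num", type=int, default=50,
                        help="number of reaction templates used in expansion")
    parser.add_argument("--starting_material", default=None,
                        help="SMILES file with starting materials")
    parser.add_argument("--search_count", type=int, default=100,
                        help="maximum number of MCTS iterations")
    return parser


args = build_parser().parse_args(
    ["--config", "config/sample.json", "--target", "data/sample.mol"])
print(args.sel_const, args.expansion_num, args.search_count)  # 3.0 50 100
```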

Terms

Convergent Disconnection Score (CDScore)

CDScore aims to favor convergent synthesis, which is known as an efficient strategy in multi-step chemical synthesis.

Available Substances Score (ASScore)

ASScore serves a purpose similar to that of CDScore: it counts the number of available substances generated in a reaction step.
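The counting idea can be sketched as follows. This is an illustration under stated assumptions, not ReTReK's ASScore implementation: molecules are compared as exact SMILES strings (both sides are assumed to be canonicalized the same way), the starting-material file is assumed to hold one SMILES per line optionally followed by a name, and any normalization or weighting ReTReK applies to the count is not shown.

```python
def parse_smiles_file(text):
    """Collect the first whitespace-separated token of each non-blank line.

    Assumes the common "SMILES [name]" layout of .smi files.
    """
    smiles = set()
    for line in text.splitlines():
        line = line.strip()
        if line:
            smiles.add(line.split()[0])
    return smiles


def count_available(precursors, available):
    """Count the precursors of one reaction step found in the available set."""
    return sum(1 for smi in precursors if smi in available)


available = parse_smiles_file("CCO ethanol\nc1ccccc1 benzene\n")
print(count_available(["CCO", "CC(=O)O"], available))  # 1
```

Exact string matching is the simplest choice here; a real implementation would need a consistent canonicalization step so that equivalent SMILES strings compare equal.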

Ring Disconnection Score (RDScore)

A ring construction strategy is preferred when a target compound has complex ring structures.

Selective Transformation Score (STScore)

A synthetic reaction with few by-products is generally preferred in terms of yield.

Contact

Reference

@article{Ishida2022,
  doi = {10.1021/acs.jcim.1c01074},
  url = {https://doi.org/10.1021/acs.jcim.1c01074},
  year = {2022},
  month = mar,
  publisher = {American Chemical Society ({ACS})},
  volume = {62},
  number = {6},
  pages = {1357--1367},
  author = {Shoichi Ishida and Kei Terayama and Ryosuke Kojima and Kiyosei Takasu and Yasushi Okuno},
  title = {{AI}-Driven Synthetic Route Design Incorporated with Retrosynthesis Knowledge},
  journal = {Journal of Chemical Information and Modeling}
}

This application was developed as part of the kGCN project.

License: MIT License

