DS4SD / MolGrapher

MolGrapher: Graph-based Visual Recognition of Chemical Structures

Home Page: https://arxiv.org/abs/2308.12234

MolGrapher

This is the repository for MolGrapher: Graph-based Visual Recognition of Chemical Structures.

Citation

If you find this repository useful, please consider citing:

@InProceedings{Morin_2023_ICCV,
    author = {Morin, Lucas and Danelljan, Martin and Agea, Maria Isabel and Nassar, Ahmed and Weber, Valery and Meijer, Ingmar and Staar, Peter and Yu, Fisher},
    title = {MolGrapher: Graph-based Visual Recognition of Chemical Structures},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month = {October},
    year = {2023},
    pages = {19552-19561}
}

Installation

Install MolGrapher

conda create -n molgrapher python=3.9
conda activate molgrapher
bash install_packages.sh
pip install -e .

Install MolDepictor

git clone https://github.com/DS4SD/MolDepictor.git
cd MolDepictor
pip install -e .

Install PaddleOCR

On CPU:

bash install_paddleocr.sh -d 'cpu'

On GPU (tested on x86_64, Linux Ubuntu 20.04, CUDA 11.7):

bash install_paddleocr.sh -d 'gpu'

Model

Models are available on Hugging Face.

wget https://huggingface.co/ds4sd/MolGrapher/resolve/main/models/graph_classifier/gc_gcn_model.ckpt
wget https://huggingface.co/ds4sd/MolGrapher/resolve/main/models/graph_classifier/gc_no_stereo_model.ckpt
wget https://huggingface.co/ds4sd/MolGrapher/resolve/main/models/graph_classifier/gc_stereo_model.ckpt
wget https://huggingface.co/ds4sd/MolGrapher/resolve/main/models/keypoint_detector/kd_model.ckpt

After downloading, place the models folder from Hugging Face in ./data/, keeping its subfolders (for example, ./data/models/keypoint_detector/kd_model.ckpt). To select which models are used, modify the attributes of GraphRecognizer (in ./molgrapher/models/graph_recognizer.py).
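The wget commands above save each checkpoint into the current directory, so the files still have to be moved into the models folder layout. As a minimal sketch, the following downloads the same four files directly into ./data/models/. The helper names (local_path, download_checkpoints) are hypothetical and not part of MolGrapher, and the target layout is an assumption inferred from the URL paths.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL and relative paths copied from the wget commands above.
BASE_URL = "https://huggingface.co/ds4sd/MolGrapher/resolve/main/"
CHECKPOINTS = [
    "models/graph_classifier/gc_gcn_model.ckpt",
    "models/graph_classifier/gc_no_stereo_model.ckpt",
    "models/graph_classifier/gc_stereo_model.ckpt",
    "models/keypoint_detector/kd_model.ckpt",
]


def local_path(relative_path, data_dir="./data"):
    """Map a checkpoint's path on Hugging Face to its assumed local path."""
    return Path(data_dir) / relative_path


def download_checkpoints(data_dir="./data"):
    """Download every checkpoint that is not already present locally."""
    for relative_path in CHECKPOINTS:
        destination = local_path(relative_path, data_dir)
        destination.parent.mkdir(parents=True, exist_ok=True)
        if not destination.exists():
            urlretrieve(BASE_URL + relative_path, str(destination))
```

Calling download_checkpoints() from the repository root should leave the checkpoints under ./data/models/graph_classifier/ and ./data/models/keypoint_detector/.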

Inference

Place your input images in the folder ./data/benchmarks/default/, then run:

bash molgrapher/scripts/annotate/run.sh

Output predictions are saved in: ./data/predictions/default/.
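The inference steps can also be scripted. The sketch below copies images into the input folder, runs the annotation script, and lists whatever files appear in the predictions folder. The function names are hypothetical, the set of image suffixes is an assumption (the README does not state which formats are accepted), and the script is assumed to be run from the repository root.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: the accepted image formats are not documented in this README.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def stage_images(source_dir, input_dir="./data/benchmarks/default"):
    """Copy image files from source_dir into the MolGrapher input folder."""
    input_dir = Path(input_dir)
    input_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for path in sorted(Path(source_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            shutil.copy2(path, input_dir / path.name)
            staged.append(input_dir / path.name)
    return staged


def run_inference(repo_root="."):
    """Run the annotation script and return the files in the predictions folder."""
    subprocess.run(
        ["bash", "molgrapher/scripts/annotate/run.sh"],
        cwd=repo_root,
        check=True,  # raise CalledProcessError if the script fails
    )
    return sorted(Path(repo_root, "data/predictions/default").iterdir())
```

A typical call would be stage_images("my_images") followed by run_inference(). The format of the prediction files is not described here, so the sketch only lists them.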

USPTO-30K Benchmark

USPTO-30K is available on Hugging Face. It consists of three subsets:

  • USPTO-10K contains 10,000 clean molecules, i.e. without any abbreviated groups.
  • USPTO-10K-abb contains 10,000 molecules with superatom groups.
  • USPTO-10K-L contains 10,000 clean molecules with more than 70 atoms.

Synthetic Dataset

The synthetic dataset is available on Hugging Face. Images and graphs are generated using MolDepictor.

Training

To train the keypoint detector:

python3 ./molgrapher/scripts/train/train_keypoint_detector.py

To train the node classifier:

python3 ./molgrapher/scripts/train/train_graph_classifier.py

About

License: MIT License

