MolGrapher
This is the repository for MolGrapher: Graph-based Visual Recognition of Chemical Structures.
Citation
If you find this repository useful, please consider citing:
@InProceedings{Morin_2023_ICCV,
author = {Morin, Lucas and Danelljan, Martin and Agea, Maria Isabel and Nassar, Ahmed and Weber, Valery and Meijer, Ingmar and Staar, Peter and Yu, Fisher},
title = {MolGrapher: Graph-based Visual Recognition of Chemical Structures},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2023},
pages = {19552-19561}
}
Installation
Install MolGrapher
conda create -n molgrapher python=3.9
conda activate molgrapher
bash install_packages.sh
pip install -e .
Install MolDepictor
git clone https://github.com/DS4SD/MolDepictor.git
cd MolDepictor
pip install -e .
Install PaddleOCR
On CPU:
bash install_paddleocr.sh -d 'cpu'
On GPU (tested on x86_64, Linux Ubuntu 20.04, CUDA 11.7):
bash install_paddleocr.sh -d 'gpu'
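Once the installation steps have run, a quick import check can confirm that the packages landed in the active environment. The following is a minimal sketch and not part of the repository: the helper `missing_modules` is hypothetical, and the import names `molgrapher`, `mol_depict`, and `paddleocr` are assumptions about how the three packages are exposed, so adjust them if your installation differs.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for MolGrapher, MolDepictor, and PaddleOCR.
EXPECTED_MODULES = ["molgrapher", "mol_depict", "paddleocr"]

if __name__ == "__main__":
    missing = missing_modules(EXPECTED_MODULES)
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All expected modules are importable.")
```

An empty result only shows that the modules can be located. It does not verify that PaddleOCR was built for the intended device.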
Model
Models are available on Hugging Face.
wget https://huggingface.co/ds4sd/MolGrapher/resolve/main/models/graph_classifier/gc_gcn_model.ckpt
wget https://huggingface.co/ds4sd/MolGrapher/resolve/main/models/graph_classifier/gc_no_stereo_model.ckpt
wget https://huggingface.co/ds4sd/MolGrapher/resolve/main/models/graph_classifier/gc_stereo_model.ckpt
wget https://huggingface.co/ds4sd/MolGrapher/resolve/main/models/keypoint_detector/kd_model.ckpt
After downloading, place the models folder from Hugging Face in ./data/, so that the checkpoints sit under ./data/models/.
Models can be selected by modifying attributes of GraphRecognizer (in ./molgrapher/models/graph_recognizer.py).
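The wget commands above save each checkpoint into the current directory, so the subfolders have to be recreated by hand. As an alternative, here is a minimal Python sketch that downloads the same four files directly into the expected layout under ./data/. The helper names `local_path` and `download_checkpoints` are hypothetical. The URLs are the ones listed above, and the target layout assumes that the path after `resolve/main/` is mirrored under ./data/.

```python
import urllib.request
from pathlib import Path

BASE_URL = "https://huggingface.co/ds4sd/MolGrapher/resolve/main/"
CHECKPOINTS = [
    "models/graph_classifier/gc_gcn_model.ckpt",
    "models/graph_classifier/gc_no_stereo_model.ckpt",
    "models/graph_classifier/gc_stereo_model.ckpt",
    "models/keypoint_detector/kd_model.ckpt",
]


def local_path(relative_path, data_dir="./data"):
    """Map a repository-relative checkpoint path to its location under data_dir."""
    return Path(data_dir) / relative_path


def download_checkpoints(data_dir="./data"):
    """Download each checkpoint, skipping files that are already present."""
    for relative_path in CHECKPOINTS:
        target = local_path(relative_path, data_dir)
        if target.exists():
            continue
        # Recreate the models/<subfolder>/ structure before writing the file.
        target.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(BASE_URL + relative_path, str(target))
```

Calling `download_checkpoints()` from the repository root should leave the files under ./data/models/graph_classifier/ and ./data/models/keypoint_detector/.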
Inference
Place your input images in the folder ./data/benchmarks/default/, then run:
bash molgrapher/scripts/annotate/run.sh
Output predictions are saved in ./data/predictions/default/.
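The inference steps can also be driven from Python. The sketch below is a hedged illustration, not part of the repository: `stage_images` and `run_inference` are hypothetical helpers, the set of image suffixes is an assumption (this README does not state which formats are accepted), and the script is assumed to be launched from the repository root. The format of the prediction files is not described here, so the sketch only lists their names.

```python
import shutil
import subprocess
from pathlib import Path

INPUT_DIR = Path("./data/benchmarks/default")
OUTPUT_DIR = Path("./data/predictions/default")
# Assumption: the accepted image formats are not stated in this README.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def stage_images(source_dir, input_dir=INPUT_DIR):
    """Copy image files from source_dir into the input folder; return the copied names."""
    input_dir = Path(input_dir)
    input_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(source_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            shutil.copy2(path, input_dir / path.name)
            copied.append(path.name)
    return copied


def run_inference():
    """Run the annotation script and return the names of the files it wrote."""
    subprocess.run(["bash", "molgrapher/scripts/annotate/run.sh"], check=True)
    return sorted(p.name for p in OUTPUT_DIR.iterdir())
```

A typical call would be `stage_images("path/to/my_images")` followed by `run_inference()`, where `path/to/my_images` is a placeholder for your own folder.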
USPTO-30K Benchmark
USPTO-30K is available on Hugging Face. It consists of three subsets:
- USPTO-10K contains 10,000 clean molecules, i.e. without any abbreviated groups.
- USPTO-10K-abb contains 10,000 molecules with superatom groups.
- USPTO-10K-L contains 10,000 clean molecules with more than 70 atoms.
Synthetic Dataset
The synthetic dataset is available on Hugging Face. Images and graphs are generated using MolDepictor.
Training
To train the keypoint detector:
python3 ./molgrapher/scripts/train/train_keypoint_detector.py
To train the node classifier:
python3 ./molgrapher/scripts/train/train_graph_classifier.py