Bioinformatis / hignn

Code for "HiGNN: A Hierarchical Informative Graph Neural Network for Molecular Property Prediction Equipped with Feature-Wise Attention"

Introduction

HiGNN is a hierarchical and interactive informative graph neural network framework for predicting molecular properties. It learns a co-representation of molecular graphs and chemically synthesizable BRICS fragments. In addition, a plug-and-play feature-wise attention block, first designed for the HiGNN architecture, adaptively recalibrates atomic features after the message passing phase. HiGNN has been accepted for publication in the Journal of Chemical Information and Modeling.

Fig. 1 The overview of HiGNN
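The idea of recalibrating atomic features with a feature-wise gate can be illustrated with a small squeeze-and-excitation-style sketch. This is not the block implemented in model.py: the mean pooling, the single linear layer, and the names feature_wise_gate, weights, and bias are all assumptions chosen to keep the example dependency-free.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = atoms, columns = feature channels


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def feature_wise_gate(atom_feats: Matrix, weights: Matrix, bias: List[float]) -> Matrix:
    """Rescale each feature channel by a gate in (0, 1) computed from the whole molecule.

    Assumed design (not taken from the HiGNN source): mean pooling over atoms,
    one linear layer, and a sigmoid gate shared by all atoms.
    """
    n_atoms = len(atom_feats)
    n_feats = len(atom_feats[0])
    # Squeeze: average every channel over all atoms of the molecule.
    pooled = [sum(row[j] for row in atom_feats) / n_atoms for j in range(n_feats)]
    # Excite: a linear map followed by a sigmoid gives one gate per channel.
    gates = [
        sigmoid(sum(weights[j][k] * pooled[k] for k in range(n_feats)) + bias[j])
        for j in range(n_feats)
    ]
    # Recalibrate: multiply every atom's features by the channel gates.
    return [[row[j] * gates[j] for j in range(n_feats)] for row in atom_feats]


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0]]  # two atoms, two feature channels
    zero_w = [[0.0, 0.0], [0.0, 0.0]]
    # With zero weights and bias every gate is sigmoid(0) = 0.5.
    print(feature_wise_gate(feats, zero_w, [0.0, 0.0]))
```

In a trained model the weights would be learned, so informative channels get gates close to 1 and uninformative ones are suppressed.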

Requirements

This project was developed with Python 3.7.10 and mainly requires the following libraries:

rdkit==2021.03.1
scikit_learn==1.1.1
torch==1.7.1+cu101
torch_geometric==1.7.1
torch_scatter==2.0.7

To install requirements:

pip install -r requirements.txt
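After installation, the pinned versions can be compared with what is actually installed. The following is a minimal sketch, not part of the repository: parse_pins and check_pins are hypothetical helpers, and it assumes each pinned name matches the installed distribution name, which may not hold for every package (for example, conda builds of rdkit). On Python 3.7 it also assumes the importlib_metadata backport is available.

```python
from typing import Dict, List

try:  # Python 3.8+
    from importlib import metadata
except ImportError:  # Python 3.7: assumes the importlib_metadata backport is installed
    import importlib_metadata as metadata

# The pins listed above, copied verbatim.
PINS = """
rdkit==2021.03.1
scikit_learn==1.1.1
torch==1.7.1+cu101
torch_geometric==1.7.1
torch_scatter==2.0.7
"""


def parse_pins(text: str) -> Dict[str, str]:
    """Turn 'name==version' lines into a {name: version} mapping."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name] = version
    return pins


def check_pins(pins: Dict[str, str]) -> List[str]:
    """Return one message per package that is missing or has a different version."""
    problems = []
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (want {wanted})")
            continue
        if found != wanted:
            problems.append(f"{name}: found {found}, want {wanted}")
    return problems


if __name__ == "__main__":
    for message in check_pins(parse_pins(PINS)):
        print(message)
```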

Usage

File description

  1. source: the source code for HiGNN.
    • config.py
    • datasets.py
    • utils.py
    • model.py
    • loss.py
    • train.py
    • cross_validate.py
  2. configs: HiGNN uses yacs for experiment configuration; the hyperparameters of each experiment can be customized in a YAML file.
  3. data: the dataset for training.
    • raw: where to store the original csv dataset.
    • processed: the dataset objects generated by PyG.
  4. test: where the training logs, checkpoints, and TensorBoard logs are saved.
  5. example: Jupyter notebooks for fragment counting, t-SNE visualization, interpretation, and so on.
  6. dataset: 11 real-world drug-discovery-related datasets used in this study.

Training example

Taking the BBBP dataset as an example, an experiment can be run via:

git clone https://github.com/idruglab/hignn
cd ./hignn

# For one random seed
python ./source/train.py --cfg ./configs/bbbp/bbbp.yaml --opts 'SEED' 2022 'MODEL.BRICS' True 'MODEL.F_ATT' True --tag seed_2022

# For 10 different random seeds (2021~2030)
python ./source/cross_validate.py --cfg ./configs/bbbp/bbbp.yaml --opts 'MODEL.BRICS' True 'MODEL.F_ATT' True 'HYPER' False --tag 10_seeds

# For hyperparameters optimization
python ./source/cross_validate.py --cfg ./configs/bbbp/bbbp.yaml --opts 'MODEL.BRICS' True 'MODEL.F_ATT' True --tag hignn # HiGNN
python ./source/cross_validate.py --cfg ./configs/bbbp/bbbp.yaml --opts 'MODEL.F_ATT' True --tag w/o_hi # the variant (w/o HI)
python ./source/cross_validate.py --cfg ./configs/bbbp/bbbp.yaml --opts 'MODEL.BRICS' True --tag w/o_fa # the variant (w/o FA)
python ./source/cross_validate.py --cfg ./configs/bbbp/bbbp.yaml --tag vanilla # the variant (w/o All)
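The seed range 2021~2030 can also be driven from a small script. The sketch below only builds one train.py command per seed, reusing the flags shown above, and summarizes per-seed scores as mean and standard deviation. It is not how cross_validate.py works internally; build_command and summarize are hypothetical helpers, and reading scores back from the logs is left out because the log format is not described here.

```python
import statistics
from typing import Dict, List

SEEDS = range(2021, 2031)  # 2021..2030 inclusive


def build_command(seed: int, cfg: str = "./configs/bbbp/bbbp.yaml") -> List[str]:
    """Return the argv for one single-seed run, mirroring the train.py call above."""
    return [
        "python", "./source/train.py",
        "--cfg", cfg,
        "--opts", "SEED", str(seed), "MODEL.BRICS", "True", "MODEL.F_ATT", "True",
        "--tag", f"seed_{seed}",
    ]


def summarize(scores: Dict[int, float]) -> str:
    """Format per-seed scores as 'mean +/- sample standard deviation'."""
    values = list(scores.values())
    return f"{statistics.mean(values):.3f} +/- {statistics.stdev(values):.3f} (n={len(values)})"


if __name__ == "__main__":
    # Print the commands instead of running them; pass each list to
    # subprocess.run(..., check=True) to actually launch the runs.
    for seed in SEEDS:
        print(" ".join(build_command(seed)))
```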

More hyperparameter details can be found in config.py.
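The --opts arguments in the commands above are flat KEY VALUE pairs, where a dotted key such as MODEL.BRICS addresses a nested option. The sketch below is a dependency-free stand-in that shows the idea; it is neither yacs itself nor the code in config.py. The key names come from the commands above, but the default values in DEFAULTS are assumptions.

```python
import ast
import copy
from typing import Any, Dict, List

# Hypothetical defaults: only the key names are taken from the commands above.
DEFAULTS: Dict[str, Any] = {
    "SEED": 2021,
    "HYPER": True,
    "MODEL": {"BRICS": False, "F_ATT": False},
}


def apply_opts(defaults: Dict[str, Any], opts: List[str]) -> Dict[str, Any]:
    """Apply ['KEY', 'VALUE', ...] overrides to a copy of a nested config dict."""
    if len(opts) % 2 != 0:
        raise ValueError("opts must be KEY VALUE pairs")
    cfg = copy.deepcopy(defaults)
    for key, raw in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split(".")
        node = cfg
        for name in parents:
            node = node[name]  # raises KeyError for an unknown section
        if leaf not in node:
            raise KeyError(f"unknown option: {key}")
        try:
            value = ast.literal_eval(raw)  # "True" -> True, "2022" -> 2022
        except (ValueError, SyntaxError):
            value = raw  # keep plain strings unchanged
        node[leaf] = value
    return cfg


if __name__ == "__main__":
    print(apply_opts(DEFAULTS, ["SEED", "2022", "MODEL.BRICS", "True", "MODEL.F_ATT", "True"]))
```

Rejecting unknown keys, as this sketch does, catches typos such as MODEL.BRIC early instead of silently running with the default.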

Interpretation

For the interpretability of HiGNN, refer to interpretation_bace and interpretation_bbbp.

Data

  • The datasets used in this study are available in dataset or MoleculeNet.
  • The training logs, checkpoints, and TensorBoard logs for each dataset can be found on BaiduNetdisk.

Results

In the present study, we evaluated the proposed HiGNN model on 11 commonly used, publicly available drug-discovery-related datasets from Wu et al., covering both classification and regression tasks. Following previous studies, 14 learning tasks were designed on these 11 benchmark datasets: 11 classification tasks based on random- and scaffold-splitting methods, and three regression tasks based on the random-splitting method.
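The difference between the two splitting methods can be sketched as follows. This is an illustration rather than the code in datasets.py: the 8:1:1 ratio, the function names, and the "largest scaffold groups first" rule are assumptions, and computing the scaffold string of each molecule (for example, a Murcko scaffold from RDKit) is left out, so the function takes precomputed scaffold keys.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple

Split = Tuple[List[int], List[int], List[int]]  # train, valid, test indices


def random_split(n: int, frac_train: float = 0.8, frac_valid: float = 0.1, seed: int = 2022) -> Split:
    """Shuffle molecule indices with a fixed seed and cut them into three parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train, n_valid = int(frac_train * n), int(frac_valid * n)
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]


def scaffold_split(scaffolds: Sequence[str], frac_train: float = 0.8, frac_valid: float = 0.1) -> Split:
    """Keep molecules that share a scaffold in the same subset."""
    groups = defaultdict(list)
    for i, scaffold in enumerate(scaffolds):
        groups[scaffold].append(i)
    # Largest groups first, so rare scaffolds tend to end up in valid/test.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))
    n = len(scaffolds)
    train_cap, valid_cap = frac_train * n, (frac_train + frac_valid) * n
    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= train_cap:
            train += group
        elif len(train) + len(valid) + len(group) <= valid_cap:
            valid += group
        else:
            test += group
    return train, valid, test


if __name__ == "__main__":
    keys = ["A"] * 6 + ["B"] * 2 + ["C", "D"]  # placeholder scaffold keys
    print(scaffold_split(keys))
```

Because whole scaffold groups move together, the test set contains scaffolds never seen in training, which makes scaffold splitting the harder and more realistic setting.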

Table 1 Predictive performance results of HiGNN on the drug discovery-related benchmark datasets.

Dataset   Split Type  Metric   Chemprop  GCN    GAT    Attentive FP  HRGCN+  XGBoost  HiGNN
BACE      random      ROC-AUC  0.898     0.898  0.886  0.876         0.891   0.889    0.890
BACE      scaffold    ROC-AUC  0.857     -      -      -             -       -        0.882
HIV       random      ROC-AUC  0.827     0.834  0.826  0.822         0.824   0.816    0.816
HIV       scaffold    ROC-AUC  0.794     -      -      -             -       -        0.802
MUV       random      PRC-AUC  0.053     0.061  0.057  0.038         0.082   0.068    0.186
Tox21     random      ROC-AUC  0.854     0.836  0.835  0.852         0.848   0.836    0.856
ToxCast   random      ROC-AUC  0.764     0.770  0.768  0.794         0.793   0.774    0.781
BBBP      random      ROC-AUC  0.917     0.903  0.898  0.887         0.926   0.926    0.932
BBBP      scaffold    ROC-AUC  0.886     -      -      -             -       -        0.927
ClinTox   random      ROC-AUC  0.897     0.895  0.888  0.904         0.899   0.911    0.930
SIDER     random      ROC-AUC  0.658     0.634  0.627  0.623         0.641   0.642    0.651
FreeSolv  random      RMSE     1.009     1.149  1.304  1.091         0.926   1.025    0.915
ESOL      random      RMSE     0.587     0.708  0.658  0.587         0.563   0.582    0.532
Lipo      random      RMSE     0.563     0.664  0.683  0.553         0.603   0.574    0.549

A dash (-) marks a value that is not reported.
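For readers unfamiliar with the metrics in Table 1, the sketch below computes RMSE (lower is better) and ROC-AUC (higher is better) in plain Python. It is meant only to make the definitions concrete and is not the evaluation code of this repository; for real use a library routine such as sklearn.metrics.roc_auc_score is the better choice, since the pairwise loop here is quadratic.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between targets and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative.

    Ties count as one half. Labels must be 0 or 1.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(rmse([0.0, 0.0], [3.0, 3.0]))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```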

Acknowledgments

The code was partly built on chemprop, TrimNet, and Swin Transformer. Thanks a lot for their open-source code!

License: MIT License

