Environment

The most important python packages are:

python == 3.6.7
pytorch == 1.2.0
torch == 0.4.1
tensorboard == 1.13.1
rdkit == 2019.09.3
scikit-learn == 0.22.2.post1
hyperopt == 0.2.5
numpy == 1.18.2

For using our model more conveniently, we provide the environment file <environment.txt> to install environment directly.

Command

1. Train

Use train.py

Args:

data_path : The path of input CSV file. E.g. input.csv
dataset_type : The type of dataset. E.g. classification or regression
save_path : The path to save output model. E.g. model_save
log_path : The path to record and save the result of training. E.g. log

E.g.

python train.py --data_path data/test.csv --dataset_type classification --save_path model_save --log_path log

2. Predict

Use predict.py

Args:

predict_path : The path of input CSV file to predict. E.g. input.csv
result_path : The path of output CSV file. E.g. output.csv
model_path : The path of trained model. E.g. model_save/model.pt

E.g.

python predict.py --predict_path data/test.csv --model_path model_save/test.pt --result_path result.csv

3. Hyperparameters Optimization

Use hyper_opti.py

Args:

data_path : The path of input CSV file. E.g. input.csv
dataset_type : The type of dataset. E.g. classification or regression
save_path : The path to save output model. E.g. model_save
log_path : The path to record and save the result of hyperparameters optimization. E.g. log

E.g.

python hyper_opti.py --data_path data/test.csv --dataset_type classification --save_path model_save --log_path log

4. Interpretation of Fingerprints

Use interpretation_fp.py

Args:

predict_path : The path of input CSV file. E.g. input.csv
model_path : The path of trained model. E.g. model_save/model.pt
result_path : The path of result. E.g. result.txt

E.g.

python interpretation_fp.py --predict_path test.csv --model_path model_save/test.pt --result_path result.txt

5. Interpretation of Graph

Use interpretation_graph.py

Args:

predict_path : The path of input CSV file. E.g. input.csv
model_path : The path of trained model. E.g. model_save/model.pt
figure_path : The path to save figures of graph interpretation. E.g. figure

E.g.

python interpretation_graph.py --predict_path test.csv --model_path model_save/test.pt --figure_path figure

Data

We provide the three public benchmark datasets used in our study: <Data.rar>

Or you can use your own dataset:

1. For training

The dataset file should be a CSV file with a header line and label columns. E.g.

SMILES,BT-20
O(C(=O)C(=O)NCC(OC)=O)C,0
FC1=CNC(=O)NC1=O,0
...

2. For predicting

The dataset file should be a CSV file with a header line and without label columns. E.g.

SMILES
O(C(=O)C(=O)NCC(OC)=O)C
FC1=CNC(=O)NC1=O
...

3. For interpreting fingerprints

The dataset file should be a CSV file with a header line and label columns. E.g.

SMILES,BT-20
O(C(=O)C(=O)NCC(OC)=O)C,0
FC1=CNC(=O)NC1=O,0
...

4. For interpreting molecular graphs

The dataset file should be a CSV file with a header line and without label columns. E.g.

SMILES
O(C(=O)C(=O)NCC(OC)=O)C
FC1=CNC(=O)NC1=O
...

Example

1. Training a model on BACE dataset

Decompress the Data.rar and find BACE dataset file in Data/MoleculeNet/bace.csv.

Use command:

python train.py --data_path Data/MoleculeNet/bace.csv --dataset_type classification --save_path model_save/bace --log_path log/bace

2. Using trained model to predict new molecules (e.g. in test.csv)

The trained model is in model_save/bace/Seed_0/model.pt

Use command:

python predict.py --predict_path test.csv --model_path model_save/bace/Seed_0/model.pt --result_path result.csv

3. Interpreting fingerprints

Interpreting fingerprints should use the training data and the trained model

Use command:

python interpretation_fp.py --predict_path Data/MoleculeNet/bace.csv --model_path model_save/bace/Seed_0/model.pt --result_path result.txt

4. Interpreting molecular graphs

Interpreting molecular graphs with the specific molecules (e.g. in test.csv) and the trained model

Use command:

python interpretation_graph.py --predict_path test.csv --model_path model_save/bace/Seed_0/model.pt --figure_path figure/bace

mengtinghuang / FP-GNN

Environment

Command

1. Train

2. Predict

3. Hyperparameters Optimization

4. Interpretation of Fingerprints

5. Interpretation of Graph

Data

1. For training

2. For predicting

3. For interpreting fingerprints

4. For interpreting molecular graphs

Example

1. Training a model on BACE dataset

2. Using trained model to predict new molecules (e.g. in test.csv)

3. Interpreting fingerprints

4. Interpreting molecular graphs

About

Languages