Explainable Deep Drug-Target Representations for Binding Affinity Prediction
We explore the reliability of Convolutional Neural Networks (CNNs) in identifying regions that are important for binding, and assess the significance of the deep representations by explaining the model's decisions in terms of the input regions that contributed most to each prediction. We also implement an end-to-end deep learning architecture to predict binding affinity, in which CNNs automatically learn and extract discriminating deep representations from 1D sequential and structural data.
End-to-End Deep Learning Architecture: Convolutional Neural Networks + Feed-Forward Fully Connected Neural Network
Chemogenomic Representative K-Fold
Regression Discriminative Localization Map
3D Docking Visualization
- Potential Binding Sites (≤ 5 Å): Green
- L-Grad-RAM Hits: Blue
- Matched Binding - L-Grad-RAM Hits: Red
ABL1(E255K)-phosphorylated - SKI-606
DDR1 - Foretinib
Binding Affinity Prediction Model
- Two Parallel Convolution Neural Networks + Fully Connected Neural Network
Gradient-Weighted Regression Activation Mapping (Grad-RAM)
- Global Max Pooling + Guided Gradients
- Global Max Pooling + Non Guided Gradients
- Global Average Pooling + Guided Gradients
- Global Average Pooling + Non Guided Gradients
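The pooling choice in the variants above decides how per-position gradients are collapsed into one weight per convolutional filter. The following is a minimal sketch of that step in plain Python, assuming a Grad-CAM-style formulation: each filter weight is the pooled gradient of the predicted affinity with respect to that filter's activations, and the map is the rectified weighted sum, scaled to [0, 1]. The function name `grad_ram_map` and the toy inputs are hypothetical, and this is not the repository's implementation. Guided gradients change how the gradients themselves are computed and are not shown here.

```python
def grad_ram_map(activations, gradients, pooling="avg"):
    """Sketch of a gradient-weighted activation map for a 1D CNN layer.

    activations, gradients: lists of length L (sequence positions), each a
    list of K values (one per filter). Both are assumed to be precomputed.
    """
    n_pos = len(activations)
    n_filters = len(activations[0])

    # One weight per filter: pool that filter's gradients over all positions.
    weights = []
    for k in range(n_filters):
        column = [gradients[i][k] for i in range(n_pos)]
        if pooling == "max":
            weights.append(max(column))
        elif pooling == "avg":
            weights.append(sum(column) / n_pos)
        else:
            raise ValueError("pooling must be 'max' or 'avg'")

    # Weighted sum of filter activations at each position, rectified
    # (assumption: negative contributions are discarded, as in Grad-CAM).
    raw = [
        max(0.0, sum(weights[k] * activations[i][k] for k in range(n_filters)))
        for i in range(n_pos)
    ]

    # Scale to [0, 1] so that importance thresholds can be applied.
    peak = max(raw)
    return [v / peak for v in raw] if peak > 0 else raw
```

The returned list has one importance value per position of the layer's output, which can then be compared against a feature importance threshold.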
Davis Kinase Binding Affinity
Dataset
- davis_original_dataset: original dataset
- davis_dataset_processed: processed dataset: protein sequences + RDKit SMILES strings + pKd values
- deep_features_dataset: CNN deep representations: protein + SMILES deep representations
Clusters
- test_cluster: independent test set indices
- train_cluster_X: train indices
Similarity
- protein_sw_score: protein Smith-Waterman similarity scores
- protein_sw_score_norm: protein Smith-Waterman similarity normalized scores
- smiles_ecfp6_tanimoto_sim: SMILES Tanimoto similarity scores on Morgan fingerprints (radius 3, ECFP6)
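To make the similarity files above concrete, here is a small sketch of the two quantities in plain Python. It assumes that fingerprints are given as collections of on-bit indices (in practice these would come from RDKit Morgan fingerprints with radius 3) and that the normalized Smith-Waterman score follows the common convention SW(a, b) / sqrt(SW(a, a) · SW(b, b)). The repository may normalize differently, and the function names are hypothetical.

```python
import math


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    # Two empty fingerprints are treated as having zero similarity.
    return len(a & b) / union if union else 0.0


def normalize_sw(score_ab, score_aa, score_bb):
    """Normalize a raw Smith-Waterman score by the two self-alignment scores.

    Assumption: normalization is score_ab / sqrt(score_aa * score_bb).
    """
    return score_ab / math.sqrt(score_aa * score_bb)
```

With this convention a protein aligned against itself scores 1, so the normalized scores are comparable across proteins of different lengths.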
Binding
- davis_scpdb_binding: davis-scpdb matching pairs binding information
PSSM
- pssm_X: davis-scpdb matching pairs PSSM
sc-PDB Pairs
Binding
- scpdb_binding: scpdb pairs binding information
PSSM
- pssm_X: scpdb pairs PSSM
Dictionaries
- davis_prot_dictionary: AA char-integer dictionary
- davis_smiles_dictionary: SMILES char-integer dictionary
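The dictionaries above map each amino acid or SMILES character to an integer so that sequences can be fed to the CNNs. The sketch below shows one way such a mapping could be applied. It assumes that index 0 is reserved for padding and that sequences are truncated or padded to a fixed length; the function name `encode` and the toy dictionary in the usage note are hypothetical and do not reflect the actual dictionary contents.

```python
def encode(sequence, char_to_int, max_len):
    """Convert a string to a fixed-length list of integer codes.

    Characters missing from char_to_int raise KeyError. Sequences longer
    than max_len are truncated; shorter ones are right-padded with 0
    (assumption: 0 is the padding index).
    """
    ids = [char_to_int[ch] for ch in sequence[:max_len]]
    return ids + [0] * (max_len - len(ids))
```

For example, with a toy dictionary `{"M": 1, "L": 2, "E": 3}`, calling `encode("MLE", toy, 5)` yields a list of length 5 whose last two entries are padding.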
State-of-the-Art Baselines Data
Davis Kinase Binding Affinity Dataset + Clusters in the SOTA method format
Docking
- abl1_pymol.pse: ABL1(E255K)-phosphorylated - SKI-606 PyMol Session
- ddr1_pymol.pse: DDR1 - Foretinib PyMol Session
Requirements:
- Python 3.7.9
- TensorFlow 2.4.1
- NumPy
- pandas
- scikit-learn
- Itertools
- Matplotlib
- Seaborn
- Glob
- Json
Usage:
Binding Affinity Prediction
Training
python cnn_fcnn_model.py --option Training --num_cnn_layers_prot 3 --prot_filters 64 64 128 --prot_filters_w 4 4 5 --num_cnn_layers_smiles 3 --smiles_filters 64 64 128 --smiles_filters_w 4 4 5 --num_fcnn_layers 3 --fcnn_units 1024 512 1024 --drop_rate 0.5 0.1 --lr_rate 0.0001
Validation
python cnn_fcnn_model.py --option Validation --num_cnn_layers_prot 3 --prot_filters 64 64 128 --prot_filters_w 4 4 5 --num_cnn_layers_smiles 3 --smiles_filters 64 64 128 --smiles_filters_w 4 4 5 --num_fcnn_layers 3 --fcnn_units 1024 512 1024 --drop_rate 0.5 0.1 --lr_rate 0.0001
Evaluation
python cnn_fcnn_model.py --option Evaluation
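Several of the flags in the commands above take more than one value (for example `--prot_filters 64 64 128`). As an illustration only, the sketch below shows how a subset of these flags could be declared with `argparse` so that space-separated values arrive as lists. The flag names are taken from the commands above; the types, the restriction of `--option` to three choices, and the single-valued `--lr_rate` are assumptions, and the actual `cnn_fcnn_model.py` may define them differently.

```python
import argparse


def build_parser():
    """Hypothetical parser covering a subset of the flags shown in Usage."""
    p = argparse.ArgumentParser(description="Sketch of the CLI, not the real script")
    p.add_argument("--option", required=True,
                   choices=["Training", "Validation", "Evaluation"])
    p.add_argument("--num_cnn_layers_prot", type=int)
    # nargs="+" collects one or more space-separated values into a list.
    p.add_argument("--prot_filters", type=int, nargs="+")
    p.add_argument("--prot_filters_w", type=int, nargs="+")
    p.add_argument("--fcnn_units", type=int, nargs="+")
    p.add_argument("--drop_rate", type=float, nargs="+")
    # Assumption: a single learning rate; the real script may accept several.
    p.add_argument("--lr_rate", type=float)
    return p
```

Parsing `--option Training --prot_filters 64 64 128 --drop_rate 0.5 0.1` with this parser gives `prot_filters` as a list of three integers and `drop_rate` as a list of two floats.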
Gradient-weighted Regression Activation Mapping (L-Grad-RAM)
Example
- Protein Sequence : MLEICLKLVG...
- SMILES String : Cc1cn(...
- Window Length : 0 1 2 ...
- Feature Importance Threshold : 0.3 0.4 0.5 ...
- Binding Sites Positions : 5 10 50 ...
python gradram_testing.py --protein_sequence MLEICLKLVG... --smiles_string Cc1cn(... --window 0 1 2 ... --thresholds 0.3 0.4 0.5 ... --sites 5 10 50 ...
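The window length, feature importance threshold, and binding site positions in the example above suggest how L-Grad-RAM hits are compared with known binding sites. The sketch below is one plausible reading, not the logic of `gradram_testing.py`: a position counts as a hit when its importance is at or above the threshold, and a binding site counts as matched when some hit lies within the given window of it. The function name `match_hits` and the 0-based indexing are assumptions.

```python
def match_hits(importance, sites, threshold, window):
    """Return (hits, matched_sites) for one threshold and one window length.

    importance: per-position importance values scaled to [0, 1].
    sites: binding site positions (assumed 0-based indices into importance).
    """
    # Positions whose importance reaches the threshold.
    hits = {i for i, v in enumerate(importance) if v >= threshold}
    # A site is matched if any hit lies within +/- window positions of it.
    matched = {s for s in sites if any(abs(s - h) <= window for h in hits)}
    return hits, matched
```

A window of 0 requires the hit to coincide exactly with a binding site, while larger windows tolerate small positional offsets between the map and the annotated site.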