larngroup / Explainable-Deep-DT-Representations

Explainable Deep Drug-Target Representations for Binding Affinity Prediction


We explore the reliability of Convolutional Neural Networks (CNNs) in identifying the regions that are important for binding, and we assess the significance of the deep representations by explaining the model’s decisions in terms of the input regions that contributed most to the prediction. We also implement an end-to-end deep learning architecture to predict binding affinity, in which CNNs automatically extract discriminating deep representations from 1D sequential and structural data.

End-to-End Deep Learning Architecture: Convolutional Neural Networks + Feed-Forward Fully Connected Neural Network

Chemogenomic Representative K-Fold

Regression Discriminative Localization Map

3D Docking Visualization

  • Potential Binding Sites (≤ 5 Å) : Green

  • L-Grad-RAM Hits : Blue

  • Matched Binding - L-Grad-RAM Hits : Red

ABL1(E255K)-phosphorylated - SKI-606

DDR1 - Foretinib

Binding Affinity Prediction Model

  • Two Parallel Convolutional Neural Networks + Fully Connected Neural Network

Gradient-Weighted Regression Activation Mapping (L-Grad-RAM)

  • Global Max Pooling + Guided Gradients
  • Global Max Pooling + Non Guided Gradients
  • Global Average Pooling + Guided Gradients
  • Global Average Pooling + Non Guided Gradients
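The four variants combine a pooling choice for the channel weights with a choice of gradient. The following is a minimal plain-Python sketch in the style of Grad-CAM, not the repository's implementation. It assumes `activations` holds the feature maps of the last convolutional layer and `gradients` holds the gradients of the predicted affinity with respect to them, both shaped positions × channels. The guided variant shown here keeps a gradient only where both the activation and the gradient are positive, which is one common convention and may differ from the one used here. The function name `grad_ram` and the toy values are hypothetical.

```python
def grad_ram(activations, gradients, pooling="avg", guided=False):
    """Return one importance score per sequence position."""
    n_pos = len(activations)
    n_ch = len(activations[0])

    if guided:
        # Assumed convention: keep only positive gradients at positive activations.
        gradients = [
            [g if (g > 0 and a > 0) else 0.0 for a, g in zip(a_row, g_row)]
            for a_row, g_row in zip(activations, gradients)
        ]

    # One weight per channel: pool the gradients over all positions.
    weights = []
    for k in range(n_ch):
        column = [gradients[i][k] for i in range(n_pos)]
        weights.append(max(column) if pooling == "max" else sum(column) / n_pos)

    # Weighted sum over channels at each position, followed by a ReLU.
    return [
        max(0.0, sum(w * a for w, a in zip(weights, row)))
        for row in activations
    ]


# Toy example: 3 positions, 2 channels.
acts = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]]
grads = [[0.5, -1.0], [1.0, 0.5], [-0.5, 0.25]]

print(grad_ram(acts, grads, pooling="max"))  # [1.0, 2.5, 1.5]
```

With average pooling the second channel receives a negative weight in this toy example, so the third position is clipped to zero by the ReLU.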

Davis Kinase Binding Affinity


  • davis_original_dataset: original dataset
  • davis_dataset_processed: processed dataset: protein sequences + RDKit SMILES strings + pKd values
  • deep_features_dataset: CNN deep representations: protein + SMILES deep representations
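The pKd values are the negative base-10 logarithm of the dissociation constant expressed in molar units. A minimal sketch of that conversion, assuming the original Kd values are given in nanomolar; the function name is hypothetical and not taken from this repository:

```python
import math


def kd_nm_to_pkd(kd_nm: float) -> float:
    """Convert Kd in nM to pKd = -log10(Kd in M) = 9 - log10(Kd in nM)."""
    if kd_nm <= 0:
        raise ValueError("Kd must be positive")
    return 9.0 - math.log10(kd_nm)


print(kd_nm_to_pkd(10000.0))  # 10 µM corresponds to a pKd of 5
```

Higher pKd values therefore indicate stronger binding.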


  • test_cluster: independent test set indices
  • train_cluster_X: train indices


  • protein_sw_score: protein Smith-Waterman similarity scores
  • protein_sw_score_norm: protein Smith-Waterman similarity normalized scores
  • smiles_ecfp6_tanimoto_sim: SMILES Tanimoto similarity scores (Morgan fingerprints, radius 3)
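The Tanimoto similarity between two fingerprints is the number of bits set in both divided by the number of bits set in either. A minimal sketch on plain Python sets; generating the Morgan fingerprints themselves (for example with RDKit) is assumed to happen elsewhere, and the bit indices below are made up for illustration:

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    if not bits_a and not bits_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


fp_a = {1, 5, 9, 42}
fp_b = {5, 9, 77}
print(tanimoto(fp_a, fp_b))  # 2 shared bits out of 5 distinct bits: 0.4
```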


  • davis_scpdb_binding: davis-scpdb matching pairs binding information


  • pssm_X: davis-scpdb matching pairs PSSM

sc-PDB Pairs


  • scpdb_binding: scpdb pairs binding information


  • pssm_X: scpdb pairs PSSM


  • davis_prot_dictionary: AA char-integer dictionary
  • davis_smiles_dictionary: SMILES char-integer dictionary
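A char-integer dictionary maps each character of a sequence to an integer index so that the sequence can be fed to the network. A minimal sketch with a toy dictionary and zero padding; the dictionary contents, the `encode` helper, and the maximum length are illustrative assumptions, not the values stored in the repository files:

```python
def encode(sequence, char_to_int, max_len):
    """Map characters to integers, truncate to max_len, right-pad with 0."""
    ids = [char_to_int[ch] for ch in sequence[:max_len]]
    return ids + [0] * (max_len - len(ids))


# Toy dictionary; index 0 is reserved for padding in this sketch.
toy_prot_dict = {"M": 1, "L": 2, "E": 3, "I": 4, "C": 5, "K": 6, "V": 7, "G": 8}

print(encode("MLEICLKLVG", toy_prot_dict, 12))
# [1, 2, 3, 4, 5, 2, 6, 2, 7, 8, 0, 0]
```

The same helper would apply to SMILES strings with a SMILES character dictionary.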

State-of-the-Art Baselines Data

Davis Kinase Binding Affinity Dataset + Clusters in the SOTA method format


  • abl1_pymol.pse: ABL1(E255K)-phosphorylated - SKI-606 PyMol Session
  • ddr1_pymol.pse: DDR1 - Foretinib PyMol Session


  • Python 3.7.9
  • TensorFlow 2.4.1
  • NumPy
  • Pandas
  • Scikit-learn
  • Itertools
  • Matplotlib
  • Seaborn
  • Glob
  • Json


Binding Affinity Prediction


python --option Training --num_cnn_layers_prot 3 --prot_filters 64 64 128 --prot_filters_w 4 4 5 --num_cnn_layers_smiles 3 --smiles_filters 64 64 128 --smiles_filters_w 4 4 5 --num_fcnn_layers 3 --fcnn_units 1024 512 1024 --drop_rate 0.5 0.1 --lr_rate 0.0001 


python --option Validation --num_cnn_layers_prot 3 --prot_filters 64 64 128 --prot_filters_w 4 4 5 --num_cnn_layers_smiles 3 --smiles_filters 64 64 128 --smiles_filters_w 4 4 5 --num_fcnn_layers 3 --fcnn_units 1024 512 1024 --drop_rate 0.5 0.1 --lr_rate 0.0001 


python --option Evaluation
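Flags such as `--prot_filters 64 64 128` take one value per layer. A minimal sketch of how such multi-value flags can be parsed with `argparse`; the parser below is a hypothetical illustration covering a subset of the flags shown above, not the repository's actual argument handling:

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring a few of the flags used above."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--option", required=True,
                        choices=["Training", "Validation", "Evaluation"])
    parser.add_argument("--num_cnn_layers_prot", type=int)
    # nargs="+" collects one or more space-separated values into a list.
    parser.add_argument("--prot_filters", type=int, nargs="+")
    parser.add_argument("--prot_filters_w", type=int, nargs="+")
    parser.add_argument("--drop_rate", type=float, nargs="+")
    parser.add_argument("--lr_rate", type=float)
    return parser


args = build_parser().parse_args(
    "--option Training --num_cnn_layers_prot 3 --prot_filters 64 64 128 "
    "--prot_filters_w 4 4 5 --drop_rate 0.5 0.1 --lr_rate 0.0001".split()
)

# Sanity check: one filter count and one filter width per convolutional layer.
assert len(args.prot_filters) == args.num_cnn_layers_prot
assert len(args.prot_filters_w) == args.num_cnn_layers_prot
print(args.prot_filters, args.drop_rate)
```

Checking the list lengths against the layer count catches mismatched command lines before any model is built.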

Gradient-weighted Regression Activation Mapping (L-Grad-RAM)


  • Protein Sequence : MLEICLKLVG...
  • SMILES String : Cc1cn(...
  • Window Length : 0 1 2 ...
  • Feature Importance Threshold : 0.3 0.4 0.5 ...
  • Binding Sites Positions : 5 10 50 ...
python --protein_sequence MLEICLKLVG... --smiles_string Cc1cn(... --window 0 1 2 ... --thresholds 0.3 0.4 0.5 ... --sites 5 10 50 ...
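One way to read the window, threshold, and binding-site arguments: positions whose normalized importance reaches the threshold count as hits, and a known binding site counts as matched when a hit lies within `window` positions of it. A minimal sketch under that assumption; the function name, the scores, the 0-based positions, and the matching rule are illustrative and may not correspond to the repository's exact procedure:

```python
def match_hits(importance, threshold, window, sites):
    """Return (hit positions, binding sites with a hit within `window`)."""
    hits = [i for i, score in enumerate(importance) if score >= threshold]
    matched = [
        site for site in sites
        if any(abs(site - hit) <= window for hit in hits)
    ]
    return hits, matched


# Made-up normalized importance scores for an 8-residue toy sequence.
scores = [0.1, 0.9, 0.2, 0.0, 0.35, 0.05, 0.6, 0.0]
sites = [1, 3, 7]

print(match_hits(scores, threshold=0.3, window=0, sites=sites))
# ([1, 4, 6], [1])
print(match_hits(scores, threshold=0.3, window=1, sites=sites))
# ([1, 4, 6], [1, 3, 7])
```

Widening the window or lowering the threshold increases the number of matched sites, which is why several values of each are passed on the command line.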

