mdcao / ConPLex_dev

Adapting Protein Language Models for Rapid DTI Prediction

This repository contains the code used to generate the results in our PNAS article. The updated package, which is under active development, can be found at this repository. Please submit an issue or email samsl@mit.edu with any questions.

Sample Usage

python train_DTI.py --exp-id ExperimentName --config configs/default_config.yaml
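When several experiments need to be launched with the same configuration, the command above can be assembled programmatically. The following is a minimal sketch, not part of the repository: the helper `build_command` and the experiment names `Run1` and `Run2` are hypothetical, and only the two flags shown in the sample usage (`--exp-id` and `--config`) are used. It prints the commands rather than executing them.

```python
import shlex


def build_command(exp_id, config="configs/default_config.yaml"):
    """Return the argument list for one training run.

    Hypothetical helper: it only mirrors the flags shown in the
    sample usage and does not inspect train_DTI.py itself.
    """
    return ["python", "train_DTI.py", "--exp-id", exp_id, "--config", config]


if __name__ == "__main__":
    # Experiment names are placeholders chosen for illustration.
    for exp_id in ["Run1", "Run2"]:
        print(shlex.join(build_command(exp_id)))
```

Each returned list could be passed to `subprocess.run` if the runs should be started from Python instead of a shell.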

Repository Organization

  • src: Python files containing protein and molecular featurizers, prediction architectures, and data loading
  • scripts: Bash files to run benchmarking tasks
    • CMD_BENCHMARK_DAVIS.sh -- Run DTI classification benchmarks on the DAVIS data set. Can be easily modified for other classification data sets.
    • CMD_BENCHMARK_TDC_DTI_DG.sh -- Run benchmarks for TDC DTI-DG regression task
    • CMD_BENCHMARK_DUDE_CROSSTYPE.sh -- Evaluate a trained model's decoy performance on DUD-E for kinase and GPCR targets
    • CMD_BENCHMARK_DUDE_WITHINTYPE.sh -- Same as above, but with half of the kinase, GPCR, protease, and nuclear targets
  • models: Pre-trained protein language models
  • dataset: Data sets to benchmark on; most are from MolTrans
    • DAVIS
    • BindingDB
    • BIOSNAP
    • DUDe
  • nb: Jupyter notebooks for data generation and exploration
  • train_DTI.py -- Main training script to run DTI classification benchmarks
  • DUDE_evaluate_decoys.py -- Compare a trained model's predictions for a target against its known true binders and decoys, and visualize the embedding space
  • DUDE_summarize_decoys.py -- Given a directory of protein targets, summarize active/decoy discriminative performance by target type
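To make "active/decoy discriminative performance by target type" concrete, here is a minimal sketch of one way such a summary could be computed. It is not the code in DUDE_summarize_decoys.py. It assumes AUROC as the metric (the repository may use a different one), and the record layout `(target_type, target, score, is_active)` and the function names are hypothetical.

```python
from collections import defaultdict


def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U); tied scores share an average rank."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one active and one decoy")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def summarize_by_type(records):
    """Average per-target AUROC within each target type.

    records: iterable of (target_type, target, score, is_active).
    This layout is an assumption made for illustration.
    """
    per_target = defaultdict(lambda: ([], []))
    type_of = {}
    for target_type, target, score, is_active in records:
        scores, labels = per_target[target]
        scores.append(score)
        labels.append(int(is_active))
        type_of[target] = target_type
    per_type = defaultdict(list)
    for target, (scores, labels) in per_target.items():
        per_type[type_of[target]].append(auroc(scores, labels))
    return {t: sum(v) / len(v) for t, v in per_type.items()}
```

Averaging per target before averaging per type keeps targets with many decoys from dominating the summary; pooling all compounds of a type into a single AUROC would be a different design choice.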

License: MIT License

