Dan-Burns / ProDAR

Implementation of Protein Dynamically Activated Residues (ProDAR) for dynamics-informed protein function prediction/annotation

ProDAR

ProDAR enhances protein function prediction and extracts Dynamically Activated Residues (DARs) using dynamical information obtained from normal mode analysis (NMA). The code accompanies the paper "Encoding protein dynamic information in graph representation for functional residue identification".

[arXiv] [CRPS]

Hierarchy

├── data
│   ├── data-graphs.ipynb
│   ├── data-graphs.py
│   ├── data-sifts.ipynb
│   ├── data-sifts.py
│   ├── graphs-10A
│   ├── nma-anm
│   ├── pdbs
│   ├── pis
│   └── sifts
│       ├── mf_go_codes-allcnt.dat
│       ├── mf_go_codes-thres-50.dat
│       ├── mf_go_codes-thres-50.npy
│       ├── pdb_chains.dat
│       ├── pdbmfgos-thres-50.json
│       ├── sifts-err-1.log
│       └── sifts-err-2.log
├── datasets
│   └── dataset.py
├── evaluation_kfold.py
├── experiment_kfold.py
├── models
│   └── multilabel_classifiers
│       ├── GAT.py
│       ├── GCN.py
│       └── GraphSAGE.py
├── prodar-env.yml
└── prodar.py

Environment

  1. Create the environment from prodar-env.yml using miniconda:
conda env create -f prodar-env.yml
  2. Install the PyG packages via pip wheels:
pip install torch-scatter -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
pip install torch-sparse -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
pip install torch-geometric

where ${TORCH} and ${CUDA} should be replaced by the PyTorch and CUDA versions (TORCH=1.10.0 and CUDA=cu113 for this specific environment).

  3. Extra packages (if not installed by the previous steps) may be installed via pip.
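
The wheel index URL in the pip commands above follows a fixed pattern, so it can be assembled from the version strings of the installed PyTorch build. The sketch below is a minimal illustration and not part of the repository: `cuda_tag` and `pyg_wheel_index` are hypothetical helper names, and the URL pattern is copied from the commands above.

```python
def cuda_tag(cuda_version):
    """Turn a CUDA version such as "11.3" into a wheel tag such as "cu113".

    None (a CPU-only build) maps to "cpu".
    """
    if cuda_version is None:
        return "cpu"
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"


def pyg_wheel_index(torch_version, cuda_version):
    """Build the wheel index URL passed to `pip install -f`."""
    # Drop any local build suffix, e.g. "1.10.0+cu113" -> "1.10.0".
    base = torch_version.split("+")[0]
    return f"https://data.pyg.org/whl/torch-{base}+{cuda_tag(cuda_version)}.html"


if __name__ == "__main__":
    # Values stated above for this specific environment.
    print(pyg_wheel_index("1.10.0", "11.3"))
```

With PyTorch installed, `torch.__version__` and `torch.version.cuda` can supply the two arguments.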

Data

To preprocess the data and generate protein graphs, run the first script to download raw data from the RCSB PDB search API and the PDBe SIFTS API, then run the second script to export the filtered PDB and GO entries as JSON graphs.

  1. Execute data-sifts.py
python data-sifts.py
  2. Execute data-graphs.py
python data-graphs.py

For both steps, the corresponding *.ipynb notebooks provide Markdown notes and optional visualization when run in JupyterLab or Jupyter Notebook.
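
To make the graph export step more concrete, the following sketch builds a residue contact graph from C-alpha coordinates and serializes it as JSON. It is an illustration under assumptions, not the logic of data-graphs.py: the 10 Å cutoff is inferred from the graphs-10A directory name, and the function name `contact_graph`, the toy coordinates, and the JSON keys are all hypothetical.

```python
import json
import math


def contact_graph(coords, cutoff=10.0):
    """Return a node/edge dictionary linking residues within `cutoff` (in Å)."""
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            dist = math.dist(coords[i], coords[j])
            if dist <= cutoff:
                edges.append({"source": i, "target": j, "distance": round(dist, 3)})
    return {"nodes": [{"id": i} for i in range(len(coords))], "edges": edges}


if __name__ == "__main__":
    # Toy C-alpha coordinates in Å (made up for illustration).
    toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
    print(json.dumps(contact_graph(toy), indent=2))
```

The pairwise loop is quadratic in the number of residues, which is acceptable for a sketch; a spatial index would be the usual choice for large structures.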

Run

Experiment (currently only k-fold cross validation)

python experiment_kfold.py <options>
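
For readers unfamiliar with k-fold cross validation, the sketch below shows the general splitting scheme: the sample indices are shuffled once and partitioned into k folds, and each fold serves as the validation set exactly once. It is a generic illustration, not the code in experiment_kfold.py; `kfold_indices` and its parameters are hypothetical names.

```python
import random


def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train, valid) index lists for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding spreads any remainder so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, valid in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, valid


if __name__ == "__main__":
    for fold, (train, valid) in enumerate(kfold_indices(10, k=5)):
        print(f"fold {fold}: {len(train)} train, {len(valid)} valid")
```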

Evaluation (currently evaluates all saved models in history/)

python evaluation_kfold.py

Citing

If you use the scripts, analyses, models, results, or any code snippets from this work and find them useful, please cite the associated paper:

@article{chiang2022encoding,
  title={Encoding protein dynamic information in graph representation for functional residue identification},
  author={Chiang, Yuan and Hui, Wei-Han and Chang, Shu-Wei},
  journal={Cell Reports Physical Science},
  volume={3},
  number={7},
  pages={100975},
  year={2022},
  publisher={Elsevier}
}

License

TBD
