songlab-cal / CPT

Cross-protein transfer learning for variant effect prediction

CPT

This repository contains the code and data needed to reproduce the main results of the manuscript "Cross-protein transfer learning substantially improves zero-shot prediction of disease variant effects".

analysis.ipynb: Jupyter notebook for the main analyses.

CPT/: Python files for models and utility functions.

data/: Data necessary to train and evaluate the models.

We also provide pre-computed CPT-1 scores for 18,602 human proteins at:

  1. Zenodo
  2. Huggingface (an interactive app to visualize and download individual proteins)
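Once a per-protein score file has been downloaded, looking up individual variants might look like the minimal sketch below. It assumes a CSV layout with a `mutant` column (substitutions written like `A123G`) and a `CPT1_score` column. Both column names, the helper functions, and the example rows are hypothetical and are not taken from this repository, so check the header of the actual file before relying on them.

```python
import csv
import io
import re

# Hypothetical column names; check the header of the file you actually download.
MUTANT_COLUMN = "mutant"
SCORE_COLUMN = "CPT1_score"

# Matches single-residue substitutions such as "A123G".
_MUTANT_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutant(mutant):
    """Split 'A123G' into (wild-type residue, 1-based position, substituted residue)."""
    match = _MUTANT_RE.match(mutant)
    if match is None:
        raise ValueError(f"unrecognized mutant string: {mutant!r}")
    wild_type, position, substituted = match.groups()
    return wild_type, int(position), substituted


def load_scores(handle):
    """Read a per-protein score table into a dict keyed by mutant string."""
    reader = csv.DictReader(handle)
    return {row[MUTANT_COLUMN]: float(row[SCORE_COLUMN]) for row in reader}


# Made-up rows standing in for a downloaded file; these are not real CPT-1 scores.
example_file = io.StringIO("mutant,CPT1_score\nM1A,0.25\nM1C,0.5\nK2R,0.125\n")
scores = load_scores(example_file)
print(scores["M1C"])
print(parse_mutant("K2R"))
```

For a real file, replace the in-memory `io.StringIO` object with `open(path, newline="")`, and adjust the two column constants to match the file's header.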

To generate whole-proteome predictions with the trained model yourself, download the feature matrices: EVE set, no-EVE set.

Citation

Jagota, M.*, Ye, C.*, Albors, C., Rastogi, R., Koehl, A., Ioannidis, N., and Song, Y.S.†
"Cross-protein transfer learning substantially improves disease variant prediction", Genome Biology, 24, Article Number: 182 (2023).

*These authors contributed equally to this work.
†To whom correspondence should be addressed: yss@berkeley.edu

DOI: https://doi.org/10.1186/s13059-023-03024-6

About

License: BSD 3-Clause "New" or "Revised" License


Languages

Jupyter Notebook 93.6%, Python 6.4%