Cross-protein transfer learning for variant effect prediction
This repository contains the code and data for reproducing the main results of the manuscript "Cross-protein transfer learning substantially improves zero-shot prediction of disease variant effects".
- analysis.ipynb: Jupyter notebook for the main analyses.
- CPT/: Python files for models and utility functions.
- data/: Data necessary to train and evaluate the models.
We also provide pre-computed CPT-1 scores for 18,602 human proteins at:
- Zenodo
- Hugging Face (an interactive app to visualize and download scores for individual proteins)
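Once scores for a protein have been downloaded, they can be queried with a few lines of standard-library Python. The sketch below is illustrative only: it assumes a hypothetical CSV layout with a header row and two columns, `variant` (e.g. `R175H`) and `score`. The released files may use different column names or formats, and the ranking assumes that a higher score means a more likely damaging variant, so check the score convention before relying on it. The example values are made up and are not real CPT-1 scores.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def load_scores(handle: TextIO) -> Dict[str, float]:
    """Read a per-protein score table into a {variant: score} dict.

    Hypothetical format (an assumption, not the documented release format):
    CSV with a header row and columns 'variant' and 'score'.
    """
    reader = csv.DictReader(handle)
    return {row["variant"]: float(row["score"]) for row in reader}


def top_variants(scores: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return the k variants with the highest scores, highest first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Usage with an in-memory table (made-up values, not real CPT-1 output).
example = io.StringIO("variant,score\nR175H,0.97\nP72R,0.12\nG245S,0.91\n")
scores = load_scores(example)
print(top_variants(scores, 2))  # [('R175H', 0.97), ('G245S', 0.91)]
```

To read a downloaded file instead, pass an open file handle, e.g. `load_scores(open(path, newline=""))`, where `path` is wherever you saved it.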
To generate whole-proteome predictions with the trained model yourself, download the feature matrices: EVE set, no-EVE set.
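Scoring a feature matrix amounts to applying the trained model to each row (one variant per row). The sketch below assumes, for illustration only, a linear classifier of the logistic-regression type with hypothetical `weights` and `bias`; the actual trained model, its parameters, and the loading code are in `CPT/` and may differ. Whichever feature set you download (EVE or no-EVE), the weight vector must match its number of feature columns. The sketch uses plain Python so it runs without extra dependencies; for a whole proteome, a vectorized NumPy or pandas version would be faster.

```python
import math
from typing import List, Sequence


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(features: Sequence[Sequence[float]],
                  weights: Sequence[float],
                  bias: float) -> List[float]:
    """Apply a logistic-regression-style model row by row.

    Each row of `features` is one variant; the output for that row is
    sigmoid(w . x + b). `weights` and `bias` are hypothetical placeholders
    for the parameters of a trained model.
    """
    probs = []
    for row in features:
        if len(row) != len(weights):
            raise ValueError("feature row length does not match weight vector")
        z = sum(w * x for w, x in zip(weights, row)) + bias
        probs.append(_sigmoid(z))
    return probs


# Usage with a toy two-feature matrix and made-up parameters.
toy_features = [[0.0, 0.0], [2.0, -1.0]]
toy_probs = predict_proba(toy_features, weights=[1.0, 0.5], bias=0.0)
print(toy_probs)  # first row scores 0.5; second row is sigmoid(1.5)
```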
Jagota, M.*, Ye, C.*, Albors, C., Rastogi, R., Koehl, A., Ioannidis, N., and Song, Y.S.†
"Cross-protein transfer learning substantially improves disease variant prediction", Genome Biology, 24, Article Number: 182 (2023).
*These authors contributed equally to this work.
†To whom correspondence should be addressed: yss@berkeley.edu