Awesome Protein Structure Prediction and Design Software List
A collection of software for protein structure prediction and design, with a focus on new deep learning and transformer-based tools.
Table of Contents
- Structure prediction
- Multimer structure prediction
- Design
- Peptide binding
- Other lists
- Uncurated searches
- Contribution guidelines
Structure prediction
-
AlphaFold2 -
- paper
-
ColabFold - Colab notebooks - paper
- a collection of community-developed Colab notebooks with extra features for running AlphaFold2, ESMFold, RoseTTAFold and OmegaFold in various modes.
-
- a lightly modified fork of AlphaFold2 that splits the pipeline into separate MSA generation and GPU inference steps, to make better use of computing resources.
-
- a reimplementation of AlphaFold2 using PyTorch
-
RoseTTAFold -
- paper
Multimer structure prediction
-
Uni-Fold Symmetry (UF-Symmetry) - paper
- an open-source reimplementation of AlphaFold2 using PyTorch
-
AF2Complex -
- paper
-
- an AlphaFold2-based pipeline for assembling large complexes based on pairwise heterodimer prediction and Monte Carlo search.
-
- Predicts interface residues from structure, for protein-protein, protein-DNA/RNA and protein-ligand interfaces.
-
- A T cell receptor:peptide-MHC docking protocol using an AlphaFold model fine-tuned for TCR:peptide+MHC complexes.
Design
-
- Includes notebooks for AfDesign, TrDesign, ProteinMPNN
-
ProteinMPNN -
- paper
-
ESM methods
- ESM-IF1 (inverse folding) - paper
- ESMFold-based, constraint-based design via a "protein programming language" - paper -
- A "high-level programming language for generative protein design". Hopefully this method is given a catchier name for the peer-reviewed publication.
- ESM-2 language model design - paper -
- Fixed-backbone and free generative design, apparently capable of generalizing to produce sequences of folded proteins with no detectable sequence homology to natural proteins.
-
- Fine-tunable model that predicts protein fitness/function from sequence. Can be used to prioritize variants when optimizing function based on existing data.
Peptide binding
-
AlphaFold encodes the principles to identify high affinity peptide binders (pre-print)
-
Solubility aware protein-binding peptide design with AfDesign -
- paper
- Based on ColabDesign/AfDesign, with an extra solubility objective function
Sequence generation
Other lists
- List of papers about Proteins Design using Deep Learning (Peldom)
- a huge, well-categorized list of methods, with links to papers and code
- organized by machine learning method (LSTM, CNN, GAN, VAE, Transformer etc) and mapping (Sequence -> Scaffold, Function -> Structure etc).
- Sections: Benchmarks and datasets, Reviews, Model-based design, Function to Scaffold, Scaffold to Sequence, Function to Sequence, Function to Structure, Other tasks.
- Papers on machine learning for proteins (yangkky)
- a big, well-categorized list of papers.
- Sections: Reviews, Tools and datasets, Machine-learning guided directed evolution, Representation learning, Unsupervised variant prediction, Generative models, Biophysics, Predicting stability, Predicting structure from sequence, Predicting sequence from structure, Classification, annotation, search, and alignments, Predicting interactions with other molecules, Other supervised learning.
- Awesome AI-based Protein Design (opendilab)
- a list focusing on important peer-reviewed publications and manuscripts
- awesome-protein-design (johnnytam100)
Uncurated searches
- Github repos tagged protein-design
- results here may find their way into the curated list above.
Contribution guidelines
- Should have a (theoretically) runnable implementation
- The focus of this list is runnable software rather than pre-print/publication descriptions of implementations, but link to the pre-print/paper if you can. There are several other great lists that focus on publications.
- Prefer open source and open access
- When linking to publications, please prefer open-access versions. If the peer-reviewed publication is not open access, please link to the arXiv/bioRxiv version when available (arXiv/bioRxiv generally provide outgoing links to the peer-reviewed version).
- Not a historical retrospective
- The intention is to include the best-performing new implementations as they appear rather than to be historically comprehensive (Andrej Sali's MODELLER was awesome, but is probably obsolete at this point).