This repository is inspired by the remarkable work of Kevin Kaichuang Yang and their outstanding project Machine-learning-for-proteins. We have established this repository to provide a specialized and focused platform for the field of Deep Learning for Protein Design, a rapidly advancing domain in computational biology.
Contributions and suggestions are warmly welcome!
Papers from last week (updated 2023.11.10):
- Advances in generative modeling methods and datasets to design novel enzymes for renewable chemicals and fuels
- AntiFold: Improved antibody structure design using inverse folding
- [GenBio@NeurIPS2023 Spotlight] • [code] • [colab]
- Sample-efficient Antibody Design through Protein Language Model for Risk-aware Batch Bayesian Optimization
- Amalga: Designable Protein Backbone Generation with Folding and Inverse Folding Guidance
0) Benchmarks and datasets
Sequence datasets •
Structure datasets •
Public database •
Similar list
1) Reviews and surveys
De novo design •
Antibody design •
Peptide design •
Binder design •
Enzyme design
2) Model-based design
trRosetta-based •
AlphaFold2-based •
DMPfold2-based •
CM-Align •
MSA transformer-based •
DeepAb-based •
TRFold2-based •
GPT-based •
ESM-based •
Sampling-algorithms
3) Function to Scaffold
GAN-based •
VAE-based •
DAE-based •
MLP-based •
Diffusion-based •
RL-based •
Flow-based
4) Scaffold to Sequence
MLP-based •
VAE-based •
LSTM-based •
CNN-based •
GNN-based •
GAN-based •
Transformer-based •
ResNet-based •
Diffusion-based •
Bayesian method •
Flow-based
5) Function to Sequence
CNN-based •
VAE-based •
GAN-based •
Transformer-based •
ResNet-based •
Bayesian method •
Reinforcement Learning •
Flow-based •
RNN-based •
LSTM-based •
Autoregressive •
Boltzmann machine •
Diffusion-based •
GNN-based •
Score-based
6) Function to Structure
LSTM-based •
Diffusion-based •
RoseTTAFold-based •
CNN-based •
GNN-based •
Transformer-based •
MLP-based
7) Other
Effects of mutations & Fitness Landscape •
Protein language models and representation learning •
Molecular Design Models
FLIP: Benchmark tasks in fitness landscape inference for proteins
Christian Dallago, Jody Mou, Kadina E Johnston, Bruce Wittmann, Nick Bhattacharya, Samuel Goldman, Ali Madani, Kevin K Yang
NeurIPS 2021 Datasets and Benchmarks Track/bioRxiv 2021 • website • code • supplementary
A Benchmark Framework for Evaluating Structure-to-Sequence Models for Protein Design
Jeffrey Chan, Seyone Chithrananda, David Brookes, Sam Sinai
Machine Learning in Structural Biology Workshop 2022 • paper unavailable
PDBench: Evaluating Computational Methods for Protein-Sequence Design
Leonardo V Castorina, Rokas Petrenas, Kartic Subr, Christopher W Wood
Bioinformatics, 2023, btad027 • code
Benchmarking deep generative models for diverse antibody sequence design
Igor Melnyk, Payel Das, Vijil Chenthamarakshan, Aurelie Lozano
arXiv:2111.06801
The Protein Engineering Tournament: An Open Science Benchmark for Protein Modeling and Design
Chase Armer, Hassan Kane, Dana Cortade, Dave Estell, Adil Yusuf, Radhakrishna Sanka, Henning Redestig, TJ Brunette, Pete Kelly, Erika DeBenedictis
arXiv:2309.09955
Computational Scoring and Experimental Evaluation of Enzymes Generated by Neural Networks
Sean R. Johnson, Xiaozhi Fu, Sandra Viknander, Clara Goldin, Sarah Monaco, Aleksej Zelezniak, Kevin K. Yang
bioRxiv (2023) • code
AlphaDesign: A graph protein design method and benchmark on AlphaFoldDB
Zhangyang Gao, Cheng Tan, Stan Z. Li
arXiv (2022)
SidechainNet: An All-Atom Protein Structure Dataset for Machine Learning
Jonathan E. King, David Ryan Koes
arxiv • github::sidechainnet
TDC (Therapeutics Data Commons) maintains a resource list that currently contains 22 tasks (and their datasets) related to small molecules and macromolecules, including protein-protein interaction (PPI) and drug-drug interaction (DDI) prediction. MoleculeNet published a benchmark for small molecules in 2018.
In terms of datasets and benchmarks, protein design is far less mature than drug discovery (see the Papers with Code drug discovery benchmarks). (An evaluation of deep learning methods for protein design, especially deep generative models, may be worth adding.)
Difficulties and opportunities coexist. We are glad to see the work of Christian Dallago, Jody Mou, Kadina E. Johnston, Bruce J. Wittmann, Nicholas Bhattacharya, Samuel Goldman, Ali Madani, Kevin K. Yang and of Zhangyang Gao, Cheng Tan, Stan Z. Li.
Sampling of structure and sequence space of small protein folds
Linsky, T.W., Noble, K., Tobin, A.R. et al.
Nat Commun 13, 7151 (2022) • code • Supplementary
OpenProteinSet: Training data for structural biology at scale
Gustaf Ahdritz, Nazim Bouatta, Sachin Kadyan, Lukas Jarosch, Daniel Berenberg, Ian Fisk, Andrew M. Watkins, Stephen Ra, Richard Bonneau, Mohammed AlQuraishi
arXiv:2308.05326 • OpenFold
ProteinInvBench: Benchmarking Protein Design on Diverse Tasks, Models, and Metrics
Zhangyang Gao, Cheng Tan, Yijie Zhang, Xingran Chen, Stan Z. Li
GitHub
A list of suggested protein databases; more lists are available at CNCB.
Database | Description |
---|---|
PDB | The Protein Data Bank (PDB) is a database of 3D structural data of large biological molecules, such as proteins and nucleic acids. These data are gathered using experimental methods such as X-ray crystallography, NMR spectroscopy, or cryo-electron microscopy. |
AlphaFoldDB | AlphaFoldDB is a database of protein structure predictions produced by DeepMind's AlphaFold system. It provides highly accurate predictions of protein 3D structures. |
PDBbind | PDBbind is a comprehensive collection of the binding data of all types of biomolecular complexes in the PDB database. It is primarily used for the development and validation of computational methods for predicting molecular interactions. |
AB-Bind | AB-Bind is a database for antibody binding affinity data. It offers a curated set of experimental binding data and corresponding antibody-protein complex structures. |
AntigenDB | AntigenDB is a manually curated database of experimentally verified antigens that includes detailed information about the antigen, the source organism, and the associated antibodies. |
CAMEO | CAMEO (Continuous Automated Model EvaluatiOn) is a project for the automated evaluation of methods predicting macromolecular structure. It continuously assesses the performance of automated protein structure prediction servers. |
CAPRI | The Critical Assessment of PRediction of Interactions (CAPRI) is a community-wide experiment to evaluate protein-protein interaction prediction methods. |
PIFACE | PIFACE is a web server for the prediction of protein-protein interactions. It identifies potential interaction interfaces on protein surfaces. |
SAbDab | The Structural Antibody Database (SAbDab) is an automatically updated resource for the structural information of antibodies from the PDB. It allows for easy access to curated, annotated, and classified antibody structures. |
SKEMPI v2.0 | SKEMPI 2.0 is a database of experimental measurements of the change in binding free energy caused by mutations in protein-protein complexes. |
ProtCAD | ProtCAD is a suite of tools for the design and engineering of novel protein structures, sequences, and functions. It allows users to build and manipulate complex protein structures, generate and evaluate sequence libraries, and simulate mutational effects. |
Some similar GitHub lists that include papers about protein design using deep learning:
- design_tools
- awesome-AI-based-protein-design
- ProteinStructureWithDL
- List of available bioinformatic tools and services
Protein design: from computer models to artificial intelligence
Paladino, Antonella, et al.
Wiley Interdisciplinary Reviews: Computational Molecular Science 7.5 (2017): e1318
Advances in protein structure prediction and design
Kuhlman, B., Bradley, P.
Nat Rev Mol Cell Biol 20, 681–697 (2019)
Deep learning in protein structural modeling and design
Wenhao Gao, Sai Pooja Mahajan, Jeremias Sulam, and Jeffrey J. Gray
Patterns 1.9 • 2020
100th anniversary of macromolecular science viewpoint: Data-driven protein design
Ferguson, Andrew L., and Rama Ranganathan.
ACS Macro Letters 10.3 (2021)
Artificial intelligence in early drug discovery enabling precision medicine
Boniolo, Fabio, et al.
Expert Opinion on Drug Discovery 16.9 (2021)
Protein design with deep learning
Defresne, Marianne, Sophie Barbe, and Thomas Schiex.
International Journal of Molecular Sciences 22.21 (2021)
Protein sequence design with deep generative models
Zachary Wu, Kadina E. Johnston, Frances H. Arnold, Kevin K. Yang
Current Opinion in Chemical Biology 65 • note • 2021
Structure-based protein design with deep learning
Ovchinnikov, Sergey, and Po-Ssu Huang.
Current opinion in chemical biology 65 • note • 2021
Deep learning techniques have significantly impacted protein structure prediction and protein design
Pearce, Robin, and Yang Zhang.
Current opinion in structural biology 68 (2021)
Recent advances in de novo protein design: Principles, methods, and applications
Pan, Xingjie, and Tanja Kortemme.
Journal of Biological Chemistry 296 (2021)
Protein design via deep learning
Wenze Ding, Kenta Nakai, Haipeng Gong
Briefings in Bioinformatics • 25 March 2022
Deep generative modeling for protein design
Strokach, Alexey, and Philip M. Kim.
Current Opinion in Structural Biology • 2022
Deep learning approaches for conformational flexibility and switching properties in protein design
Rudden, Lucas SP, Mahdi Hijazi, and Patrick Barth
Frontiers in Molecular Biosciences
Computational protein design with evolutionary-based and physics-inspired modeling: current and future synergies
Cyril Malbranke, David Bikard, Simona Cocco, Rémi Monasson, Jérôme Tubiana
arXiv:2208.13616v2
From sequence to function through structure: deep learning for protein design
Noelia Ferruz, Michael Heinzinger, Mehmet Akdel, Alexander Goncearenco, Luca Naef, Christian Dallago
bioRxiv 2022.08.31.505981/Computational and Structural Biotechnology Journal
Volume 21, 2023 • Supplementary • accompanying list
Computational protein design with data-driven approaches: Recent developments and perspectives
Liu, H, Chen, Q.
WIREs Comput Mol Sci. 2022. e1646
Understanding by design: Implementing deep learning from protein structure prediction to protein design
Gao, Yuanxu, Jiangshan Zhan, and Albert CH Yu.
MedComm–Future Medicine 1.2 (2022): e22
Diffusion Models in Bioinformatics: A New Wave of Deep Learning Revolution in Action
Zhiye Guo, Jian Liu, Yanli Wang, Mengrui Chen, Duolin Wang, Dong Xu, Jianlin Cheng
arXiv:2302.10907
Machine learning for evolutionary-based and physics-inspired protein design: Current and future synergies
Cyril Malbranke, David Bikard, Simona Cocco, Rémi Monasson, Jérôme Tubiana
Current Opinion in Structural Biology
De novo design of polyhedral protein assemblies: before and after the AI revolution
Bhoomika Basu Mallik, Jenna Stanislaw, Tharindu Madhusankha Alawathurage, and Alena Khmelinskaia
ChemBioChem 2023, e202300117
Research progress of artificial intelligence in protein design
CHEN Zhihang, JI Menglin, QI Yifei
Synthetic Biology Journal (2023)
A Survey on Graph Diffusion Models: Generative AI in Science for Molecule, Protein and Material
Mengchun Zhang, Maryam Qamar, Taegoo Kang, Yuna Jung, Chenshuang Zhang, Sung-Ho Bae, Chaoning Zhang
arXiv:2304.01565
Exploring the Protein Sequence Space with Global Generative Models
Sergio Romero-Romero, Sebastian Lindner, Noelia Ferruz
arXiv:2305.01941
The Era of Machine Learning for Protein Design, Summarized in Four Key Methods
LucianoSphere
Towards Data Science
Is novelty predictable?
Clara Fannjiang, Jennifer Listgarten
arXiv:2306.00872
Computational protein design – where it goes?
Xu Binbin, Chen Yingjun and Xue Weiwei
Current Medicinal Chemistry 2023
How can the protein design community best support biologists who want to harness AI tools for protein structure prediction and design?
Höcker, Birte, et al.
Cell Systems 14.8 (2023)
Creation of de novo designed nanopores
新津藍
Seibutsu-kogaku Kaishi (Journal of the Society for Biotechnology, Japan) 101.8 (2023)
Generative artificial intelligence for de novo protein design
Adam Winnifrith, Carlos Outeiral, Brian Hie
arXiv:2310.09685
Generative models for protein sequence modeling: recent advances and future directions
Mehrsa Mardikoraem, Zirui Wang, Nathaniel Pascual, Daniel Woldring
Briefings in Bioinformatics
A review of deep learning methods for antibodies
Graves, Jordan, et al.
Antibodies 9.2 (2020)
Progress and challenges for the machine learning-based design of fit-for-purpose monoclonal antibodies
Akbar, Rahmad, et al.
mAbs. Vol. 14. No. 1. Taylor & Francis, 2022
Advances in computational structure-based antibody design
Hummer, Alissa M., Brennan Abanades, and Charlotte M. Deane.
Current Opinion in Structural Biology 74 (2022)
Computational and artificial intelligence-based methods for antibody development
Kim, Jisun, et al.
Trends in Pharmacological Sciences (2023)
Leveraging deep learning to improve vaccine design
Hederman AP, Ackerman ME
Trends in immunology (2023)
In Silico Approaches to Deliver Better Antibodies by Design: The Past, the Present and the Future
Andreas Evers, Shipra Malhotra, Vanita D. Sood
arXiv:2305.07488
AI Models for Protein Design are Driving Antibody Engineering
Michael Chungyoun, Jeffrey J. Gray
Current Opinion in Biomedical Engineering (2023): 100473
Computational Methods in Immunology and Vaccinology: Design and Development of Antibodies and Immunogens
Federica Guarra and Giorgio Colombo
Journal of Chemical Theory and Computation (2023)
Simplifying complex antibody engineering using machine learning
Makowski, Emily K., Hsin-Ting Chen, and Peter M. Tessier.
Cell Systems 14.8 (2023)/2022 AIChE Annual Meeting. AIChE, 2022.
AI driven B-cell Immunotherapy Design
Bruna Moreira da Silva, David B. Ascher, Nicholas Geard, Douglas E. V. Pires
arXiv:2309.01122
Deep generative models for peptide design
Wan, Fangping, Daphne Kontogiorgos-Heintz, and Cesar de la Fuente-Nunez
Digital Discovery (2022)
Design of protein segments and peptides for binding to protein targets
Gupta, Suchetana, Noora Azadvari, and Parisa Hosseinzadeh.
BioDesign Research 2022 (2022)
Improving de novo Protein Binder Design with Deep Learning
Nathaniel Bennett, Brian Coventry, Inna Goreshnik, Buwei Huang, Aza Allen, Dionne Vafeados, Ying Po Peng, Justas Dauparas, Minkyung Baek, Lance Stewart, Frank DiMaio, Steven De Munck, Savvas Savvides, David Baker
bioRxiv 2022.06.15.495993/Nat Commun 14, 2625 (2023) • code • news
A review of enzyme design in catalytic stability by artificial intelligence
Yongfan Ming, Wenkang Wang, Rui Yin, Min Zeng, Li Tang, Shizhe Tang, Min Li
Briefings in Bioinformatics, 2023
Application of "foldability" in the intelligent of enzymes engineering and design: take AlphaFold2 for example
MENG Qiaozhen, GUO Fei
Synthetic Biology Journal (2023)
AlphaFold2 and Deep Learning for Elucidating Enzyme Conformational Flexibility and Its Application for Design
Casadevall, Guillem, Cristina Duran, and Sílvia Osuna.
JACS Au (2023)
Accelerating Biocatalysis Discovery with Machine Learning: A Paradigm Shift in Enzyme Engineering, Discovery, and Design
Braun Markus, Gruber Christian C, Krassnigg Andreas, Kummer Arkadij, Lutz Stefan, Oberdorfer Gustav, Siirola Elina, and Snajdrova Radka
ACS Catal. 2023
Building Enzymes through Design and Evolution
Hossack, Euan J., Florence J. Hardy, and Anthony P. Green.
ACS Catalysis 13.19 (2023)
Advances in generative modeling methods and datasets to design novel enzymes for renewable chemicals and fuels
Rana A Barghout, Zhiqing Xu, Siddharth Betala, Radhakrishnan Mahadevan
Current Opinion in Biotechnology, Volume 84, 2023
These methods invert trained structure prediction models, iteratively optimizing the input sequence with an optimization algorithm to design new sequences. Designing sequences by inverting a structure prediction model is known as hallucination.
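To illustrate the hallucination loop, the sketch below runs Metropolis Monte Carlo with simulated annealing over single-residue mutations. It is a toy, not any published pipeline: `toy_score` is a hypothetical stand-in for a trained structure predictor's confidence (trRosetta hallucination, for example, maximized the KL divergence of predicted distance distributions from a background distribution), and the annealing schedule, temperatures and step count are arbitrary assumptions.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a structure predictor's confidence.

    A real hallucination run would call a trained network here. This toy
    version rewards hydrophobic residues at heptad positions 'a' and 'd'
    (indices 0 and 3 mod 7) and polar residues elsewhere, so the example
    runs without any model weights. Returns a value in [0, 1].
    """
    hydrophobic = set("AILMFVW")
    hits = 0
    for i, aa in enumerate(seq):
        want_hydrophobic = i % 7 in (0, 3)
        hits += (aa in hydrophobic) == want_hydrophobic
    return hits / len(seq)


def hallucinate(length: int, steps: int = 2000, t_start: float = 0.1,
                t_end: float = 0.001, seed: int = 0) -> tuple[str, float]:
    """Metropolis MCMC with simulated annealing over point mutations."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    current = toy_score("".join(seq))
    best_seq, best = "".join(seq), current
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        pos = rng.randrange(length)
        old_aa = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)  # propose a single mutation
        proposal = toy_score("".join(seq))
        delta = proposal - current
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = proposal
            if current > best:
                best_seq, best = "".join(seq), current
        else:
            seq[pos] = old_aa  # reject: revert the mutation
    return best_seq, best


if __name__ == "__main__":
    designed, score = hallucinate(21)
    print(designed, round(score, 3))
```

In an actual design run, the signal comes entirely from replacing `toy_score` with a real network; gradient-based variants replace the random proposals with backpropagation through the predictor instead.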
Design of proteins presenting discontinuous functional sites using deep learning
Doug Tischer, Sidney Lisanza, Jue Wang, Runze Dong, Ivan Anishchenko, Lukas F. Milles, Sergey Ovchinnikov, David Baker
bioRxiv (2020)
Fast differentiable DNA and protein sequence optimization for molecular design
Linder, Johannes, and Georg Seelig.
arXiv preprint arXiv:2005.11275 (2020)
De novo protein design by deep network hallucination
Ivan Anishchenko, Samuel J. Pellock, Tamuka M. Chidyausiku, Theresa A. Ramelot, Sergey Ovchinnikov, Jingzhou Hao, Khushboo Bafna, Christoffer Norn, Alex Kang, Asim K. Bera, Frank DiMaio, Lauren Carter, Cameron M. Chow, Gaetano T. Montelione & David Baker
Nature (2021) • code • trRosetta
Protein sequence design by conformational landscape optimization
Norn, Christoffer, et al.
Proceedings of the National Academy of Sciences 118.11 (2021) • code
De novo design of small beta barrel proteins
David E. Kim, Davin R. Jensen, David Feldman, Doug Tischer, Ayesha Saleem, Cameron M. Chow, Xinting Li, Lauren Carter, Lukas Milles, Hannah Nguyen, Alex Kang, Asim K. Bera, Francis C. Peterson, Brian F. Volkman, Sergey Ovchinnikov, David Baker
PNAS (2023), e2207974120 • code
Exploring "dark matter" protein folds using deep learning
Zander Harteveld, Alexandra Van Hall-Beauvais, Irina Morozova, Joshua Southern, Casper Alexander Goverde, Sandrine Georgeon, Stephane Rosset, Andreas Loukas, Pierre Vandergheynst, Michael Bronstein, Bruno Correia
bioRxiv 2023.08.30.555621 • Supplementary • code
Solubility-aware protein binding peptide design using AlphaFold
Takatsugu Kosugi, Masahito Ohue
bioRxiv 2022.05.14.491955 • Supplemental Materials
End-to-end learning of multiple sequence alignments with differentiable Smith-Waterman
Samantha Petti, Nicholas Bhattacharya, Roshan Rao, Justas Dauparas, Neil Thomas, Juannan Zhou, Alexander M. Rush, Peter K. Koo, Sergey Ovchinnikov
bioRxiv (2021)/Bioinformatics, 2022, btac724 • ColabDesign, SMURF, AF2 back propagation • our notes1, notes2 • lecture1, lecture2 • Discord
AlphaDesign: A de novo protein design framework based on AlphaFold
Jendrusch, Michael, Jan O. Korbel, and S. Kashif Sadiq.
bioRxiv (2021)
Using AlphaFold for Rapid and Accurate Fixed Backbone Protein Design
Moffat, Lewis, Joe G. Greener, and David T. Jones.
bioRxiv (2021)
State-of-the-art estimation of protein model accuracy using AlphaFold
James P. Roney, Sergey Ovchinnikov
bioRxiv 2022.03.11.484043/Physical Review Letters 129.23 (2022) • code
Hallucinating protein assemblies
Basile I M Wicky, Lukas F Milles, Alexis Courbet, Robert J Ragotte, Justas Dauparas, Elias Kinfu, Sam Tipps, Ryan D Kibler, Minkyung Baek, Frank DiMaio, Xinting Li, Lauren Carter, Alex Kang, Hannah Nguyen, Asim K Bera, David Baker
bioRxiv 2022.06.09.493773/Science (2022) • related slides • our notes • news
EvoBind: in silico directed evolution of peptide binders with AlphaFold
Patrick Bryant, Arne Elofsson
bioRxiv 2022.07.23.501214 • code
Hallucination of closed repeat proteins containing central pockets
Linna An, Derrick R Hicks, Dmitri Zorine, Justas Dauparas, Basile I. M. Wicky, Lukas F Milles, Alexis Courbet, Asim K. Bera, Hannah Nguyen, Alex Kang, Lauren Carter, David Baker
bioRxiv 2022.09.01.506251
Predicting the structure of large protein complexes using AlphaFold and Monte Carlo tree search
Bryant, Patrick, et al.
Nature communications 13.1 (2022) • gitlab, github • Supplementary data1, Supplementary data2
De novo protein design by inversion of the AlphaFold structure prediction network
Casper Goverde, Benedict Wolf, Hamed Khakzad, Stephane Rosset, Bruno E Correia
bioRxiv 2022.12.13.520346 • code • lecture1 • lecture2
Code of OpenComplex
Jingcheng Yu, Zhaoming Chen, Zhaoqun Li, Mingliang Zeng, Wenjun Lin, He Huang, Qiwei Ye
code
Efficient and scalable de novo protein design using a relaxed sequence space
Christopher Josef Frank, Ali Khoshouei, Yosta de Stigter, Dominik Schiewitz, Shihao Feng, Sergey Ovchinnikov, Hendrik Dietz
bioRxiv 2023.02.24.529906 • code
Cyclic peptide structure prediction and design using AlphaFold
Stephen A. Rettie, Katelyn V. Campbell, Asim K. Bera, Alex Kang, Simon Kozlov, Joshmyn De La Cruz, Victor Adebomi, Guangfeng Zhou, Frank DiMaio, Sergey Ovchinnikov, Gaurav Bhardwaj
bioRxiv • Code • Supplementary
De novo design of luciferases using deep learning
Andy Hsien-Wei Yeh, Christoffer Norn, Yakov Kipnis, Doug Tischer, Samuel J. Pellock, Declan Evans, Pengchen Ma, Gyu Rie Lee, Jason Z. Zhang, Ivan Anishchenko, Brian Coventry, Longxing Cao, Justas Dauparas, Samer Halabiya, Michelle DeWitt, Lauren Carter, K. N. Houk & David Baker
Nature • Code • Supplementary Materials
In silico evolution of protein binders with deep learning models for structure prediction and sequence design
Odessa J Goudy, Amrita Nallathambi, Tomoaki Kinjo, Nicholas Randolph, Brian Kuhlman
bioRxiv 2023.05.03.539278 • Supplementary • code
Computational design of soluble analogues of integral membrane protein structures
Casper Alexander Goverde, Martin Pacesa, Lars Jeremy Dornfeld, Sandrine Georgeon, Stephane Rosset, Justas Dauparas, Christian Shellhaas, Simon Kozlov, David Baker, Sergey Ovchinnikov, Bruno Correia
bioRxiv 2023.05.09.540044 • code • Supplementary
Antibody Complementarity-Determining Region Sequence Design using AlphaFold2 and Binding Affinity Prediction Model
Takafumi Ueki, Masahito Ohue
bioRxiv 2023.06.02.543382
Context-Dependent Design of Induced-fit Enzymes using Deep Learning Generates Well Expressed, Thermally Stable and Active Enzymes
Lior Zimmerman, Noga Alon, Itay Levin, Anna Koganitsky, Nufar Shpigel, Chen Brestel, Gideon David Lapidoth
bioRxiv 2023.07.27.550799 • Supplementary
Highly accurate and robust protein sequence design with CarbonDesign
Milong Ren, Chungong Yu, Dongbo Bu, Haicang Zhang
bioRxiv 2023.08.07.552204
Design of Cyclic Peptides Targeting Protein-Protein Interactions using AlphaFold
Takatsugu Kosugi, Masahito Ohue
bioRxiv 2023.08.20.554056 • Supplementary • code
MetaPPI: In Silico Screen for Novel CRBN-based Substrates
neoxbio
website • news • masif-based • commercial
Hallucination of closed repeat proteins containing central pockets
An, L., Hicks, D.R., Zorine, D. et al.
Nat Struct Mol Biol (2023) • code
AlphaFold Distillation for Protein Design
Anonymous
ICLR 2024 under review • code
Protein Language Model Supervised Precise and Efficient Protein Backbone Design Method
Bo Zhang, Kexin Liu, Zhuoqi Zheng, Yunfeiyang Liu, Junxi Mu, Ting Wei, Hai-Feng Chen
bioRxiv 2023.10.26.564121 • code • Supplementary
Design in the DARK: Learning Deep Generative Models for De Novo Protein Design
Moffat, Lewis, Shaun M. Kandathil, and David T. Jones.
bioRxiv (2022) • DMPfold2
AutoFoldFinder: An Automated Adaptive Optimization Toolkit for De Novo Protein Fold Design
Shuhao Zhang, Youjun Xu, Jianfeng Pei, Luhua Lai
NeurIPS 2021
Protein language models trained on multiple sequence alignments learn phylogenetic relationships
Damiano Sgarbossa, Umberto Lupo, Anne-Florence Bitbol
arXiv preprint arXiv:2203.15465 (2022)/bioRxiv 2022.04.14.488405
EvoOpt: an MSA-guided, fully unsupervised sequence optimization pipeline for protein design
Hideki Yamaguchi, Yutaka Saito
NeurIPS 2022
Generative power of a protein language model trained on multiple sequence alignments
Sgarbossa, Damiano, Umberto Lupo, and Anne-Florence Bitbol
Elife 12 (2023): e79854 • code
Towards deep learning models for target-specific antibody design
Mahajan, Sai Pooja, et al.
Biophysical Journal 121.3 (2022) • DeepAb • lecture
Hallucinating structure-conditioned antibody libraries for target-specific binders
Sai Pooja Mahajan, Jeffrey A Ruffolo, Rahel Frick, Jeffrey J. Gray
bioRxiv 2022.06.06.494991/Front. Immunol. 13:999034 • Supplementary • code
News of TRDesign
TIANRANG XLab
paper unavailable • slides • website • commercial • news
Multi-segment preserving sampling for deep manifold sampler
Berenberg, Daniel, et al.
arXiv preprint arXiv:2205.04259 (2022)
A high-level programming language for generative protein design
Brian Hie, Salvatore Candido, Zeming Lin, Ori Kabeli, Roshan Rao, Nikita Smetanin, Tom Sercu, Alexander Rives
bioRxiv 2022.12.21.521526
Language models generalize beyond natural proteins
Robert Verkuil, Ori Kabeli, Yilun Du, Basile IM Wicky, Lukas F Milles, Justas Dauparas, David Baker, Sergey Ovchinnikov, Tom Sercu, Alexander Rives
bioRxiv 2022.12.21.521521
ESMFold Hallucinates Native-Like Protein Sequences
Jeliazko R Jeliazkov, Diego del Alamo, Joel D Karpiak
bioRxiv 2023.05.23.541774
AdaLead: A simple and robust adaptive greedy search algorithm for sequence design
Sinai, Sam, et al.
arXiv preprint arXiv:2010.02141 (2020) • code
Autofocused oracles for model-based design
Fannjiang, Clara, and Jennifer Listgarten.
Advances in Neural Information Processing Systems 33 (2020)
An Efficient MCMC Approach to Energy Function Optimization in Protein Structure Prediction
Lakshmi A. Ghantasala, Risi Jaiswal, Supriyo Datta
arXiv:2211.03193
Plug & Play Directed Evolution of Proteins with Gradient-based Discrete MCMC
Patrick Emami, Aidan Perreault, Jeffrey Law, David Biagioni, Peter St. John
NeurIPS 2022/arXiv:2212.09925
Importance Weighted Expectation-Maximization for Protein Sequence Design
Zhenqiao Song, Lei Li
arXiv:2305.00386 • Supplementary
Simultaneous enhancement of multiple functional properties using evolution-informed protein design
Fram, Benjamin, et al.
bioRxiv (2023): 2023-05
Optimizing protein fitness using Gibbs sampling with Graph-based Smoothing
Andrew Kirjner, Jason Yim, Raman Samusevich, Tommi Jaakkola, Regina Barzilay, Ila Fiete
arXiv:2307.00494 • code
These models design the backbone (scaffold/template), represented as Cartesian coordinates, contact maps, distance maps, or φ and ψ torsion angles.
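These representations can be derived from one another. As a minimal sketch in pure Python, assuming one Cα coordinate per residue, the code below turns Cartesian coordinates into a distance map and then a binary contact map. The 8 Å cutoff and the minimum sequence separation are common but assumed choices; papers differ on both, and some use Cβ rather than Cα atoms.

```python
import math


def distance_map(coords):
    """Pairwise Euclidean distances (in Å) between residue coordinates.

    `coords` is a list of (x, y, z) tuples, one per residue (e.g. Cα atoms).
    """
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def contact_map(dmap, threshold=8.0, min_separation=3):
    """Binary contact map derived from a distance map.

    A pair (i, j) is a contact (1) when the residues are closer than
    `threshold` Å and at least `min_separation` positions apart in sequence;
    both defaults are assumptions, not a fixed standard.
    """
    n = len(dmap)
    return [[1 if abs(i - j) >= min_separation and dmap[i][j] < threshold else 0
             for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Four toy Cα positions on a square with 3.8 Å sides (the typical Cα-Cα spacing).
    ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
    for row in contact_map(distance_map(ca)):
        print(row)
```

Going the other way (distance map back to coordinates) is the harder, non-trivial step, because a generated map must first be made geometrically consistent before it can be embedded in 3D.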
Generative modeling for protein structures
Anand, Namrata, and Possu Huang.
NeurIPS 2018
Fully differentiable full-atom protein backbone generation
Namrata Anand, Raphael Eguchi, and Po-Ssu Huang.
OpenReview ICLR 2019 workshop DeepGenStruct • without code
RamaNet: Computational de novo helical protein backbone design using a long short-term memory generative neural network
Sabban, Sari, and Mikhail Markovsky.
F1000Research 9 (2020) • code • pyRosetta • tensorflow • maximizing the fluorescence of a protein
A Generative Model for Creating Path Delineated Helical Proteins
Nicholas B. Woodall, Ryan Kibler, Basile Wicky, Brian Coventry
bioRxiv 2023.05.24.542095 • code
Conditioning by adaptive sampling for robust design
Brookes, David, Hahnbeom Park, and Jennifer Listgarten.
International conference on machine learning. PMLR, 2019 • without code
IG-VAE: generative modeling of immunoglobulin proteins by direct 3D coordinate generation
Raphael R. Eguchi, Christian A. Choe, Po-Ssu Huang
bioRxiv (2020) • without code
Generating tertiary protein structures via an interpretative variational autoencoder
Guo, Xiaojie, et al.
arXiv preprint arXiv:2004.07119 (2020) • code not available
Deep sharpening of topological features for de novo protein design
Harteveld, Zander, et al.
ICLR2022 Machine Learning for Drug Discovery. 2022 • code not available
End-to-End deep structure generative model for protein design
Boqiao Lai, Matthew McPartlon, Jinbo Xu
bioRxiv 2022.07.09.499440
Deep Generative Design of Epitope-Specific Binding Proteins by Latent Conformation Optimization
Raphael R Eguchi, Christian A Choe, Udit Parekh, Irene S Khalek, Michael D Ward, Neha Vithani, Gregory R Bowman, Joseph G Jardine, Possu Huang
bioRxiv 2022.12.22.521698
Function-guided protein design by deep manifold sampling
Vladimir Gligorijevic, Stephen Ra, Daniel Berenberg, Richard Bonneau, Kyunghyun Cho
NeurIPS 2021 • without code
A backbone-centred energy function of neural networks for protein design
Huang, B., Xu, Y., Hu, X. et al
Nature (2022) • code
ForceGen: End-to-end de novo protein generation based on nonlinear mechanical unfolding responses using a protein language diffusion model
Bo Ni, David L. Kaplan, M. Buehler
arXiv:2310.10605 • Supplementary • code
Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem
Brian L. Trippe, Jason Yim, Doug Tischer, Tamara Broderick, David Baker, Regina Barzilay, Tommi Jaakkola
arXiv:2206.04119/NeurIPS 2022/ICLR 2023 • poster • Supplementary • code
ProteinSGM: Score-based generative modeling for de novo protein design
Jin Sub Lee, Philip M Kim
bioRxiv 2022.07.13.499967/Nat Comput Sci (2023) • code
Protein structure generation via folding diffusion
Kevin E. Wu, Kevin K. Yang, Rianne van den Berg, James Y. Zou, Alex X. Lu, Ava P. Amini
arXiv:2209.15611 • code
DiffSDS: A language diffusion model for protein backbone inpainting under geometric conditions and constraints
Zhangyang Gao, Cheng Tan, Stan Z. Li
arXiv:2301.09642
Generating Novel, Designable, and Diverse Protein Structures by Equivariantly Diffusing Oriented Residue Clouds
Yeqing Lin, Mohammed AlQuraishi
arXiv:2301.12485v3 • code • news
SE(3) diffusion model with application to protein backbone generation
Jason Yim, Brian L. Trippe, Valentin De Bortoli, Emile Mathieu, Arnaud Doucet, Regina Barzilay, Tommi Jaakkola
arXiv:2302.02277/ICLR 2023 • code • Supplementary
A Latent Diffusion Model for Protein Structure Generation
Cong Fu, Keqiang Yan, Limei Wang, Wing Yee Au, Michael McThrow, Tao Komikado, Koji Maruhashi, Kanji Uchino, Xiaoning Qian, Shuiwang Ji
arXiv:2305.04120
Practical and Asymptotically Exact Conditional Sampling in Diffusion Models
Luhuan Wu, Brian L. Trippe, Christian A. Naesseth, David M. Blei, John P. Cunningham
arXiv:2306.17775 • code
Dynamics-Informed Protein Design with Structure Conditioning
Simon V. Mathis, Urszula Julia Komorowska, Mateja Jamnik, Pietro Lió
WCB@ICML2023/ICLR 2024 under review
DiffSDS: A geometric sequence diffusion model for protein backbone inpainting
Anonymous
ICLR 2024 under review
Top-down design of protein nanomaterials with reinforcement learning
Isaac D Lutz, Shunzhi Wang, Christoffer Norn, Andrew J Borst, Yan Ting Zhao, Annie Dosey, Longxing Cao, Zhe Li, Minkyung Baek, Neil P King, Hannele Ruohola-Baker, David Baker
bioRxiv 2022.09.25.509419/Science 380, 266-273 (2023) • code, code2
SE(3)-Stochastic Flow Matching for Protein Backbone Generation
Avishek Joey Bose, Tara Akhound-Sadegh, Kilian Fatras, Guillaume Huguet, Jarrid Rector-Brooks, Cheng-Hao Liu, Andrei Cristian Nica, Maksym Korablyov, Michael Bronstein, Alexander Tong
arXiv:2310.02391/ICLR 2024 under review
Fast protein backbone generation with SE(3) flow matching
Jason Yim, Andrew Campbell, Andrew Y. K. Foong, Michael Gastegger, José Jiménez-Luna, Sarah Lewis, Victor Garcia Satorras, Bastiaan S. Veeling, Regina Barzilay, Tommi Jaakkola, Frank Noé
arXiv:2310.05297 • code
Identify the amino acid sequence from given backbone/scaffold/template constraints (aka inverse folding): torsion angles (φ & ψ), backbone angles (θ & τ), backbone dihedrals (φ, ψ & ω), backbone atoms (Cα, N, C & O), Cα−Cα distances, unit direction vectors of Cα−Cα, Cα−N & Cα−C, etc. Adapted from here. Energy-based models are also included for the task of rotamer conformation (χ angles or atom coordinates) recovery.
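Several of these geometric inputs can be computed directly from backbone coordinates. The sketch below is illustrative only: residues are assumed to be dicts with hypothetical keys `"N"`, `"CA"` and `"C"` holding (x, y, z) tuples, angles use the standard signed (IUPAC) dihedral convention, and ω is assigned to the peptide bond following each residue, which is one of several conventions in use.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit_vector(frm: Vec, to: Vec) -> Vec:
    """Unit direction vector from `frm` to `to` (e.g. Cα -> N); points must differ."""
    v = _sub(to, frm)
    length = math.sqrt(_dot(v, v))
    return (v[0] / length, v[1] / length, v[2] / length)


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle in degrees ([-180, 180]) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, _cross(b2, b3))
    x = _dot(_cross(b1, b2), _cross(b2, b3))
    return math.degrees(math.atan2(y, x))


def backbone_features(residues: list[dict[str, Vec]]) -> list[dict]:
    """Per-residue φ, ψ, ω and Cα-centred unit vectors.

    φ is undefined for the first residue; ψ, ω and the Cα->next-Cα vector
    are undefined for the last one. Undefined entries are None.
    """
    feats = []
    for i, res in enumerate(residues):
        prev = residues[i - 1] if i > 0 else None
        nxt = residues[i + 1] if i + 1 < len(residues) else None
        has_prev, has_next = prev is not None, nxt is not None
        feats.append({
            "phi": dihedral(prev["C"], res["N"], res["CA"], res["C"]) if has_prev else None,
            "psi": dihedral(res["N"], res["CA"], res["C"], nxt["N"]) if has_next else None,
            "omega": dihedral(res["CA"], res["C"], nxt["N"], nxt["CA"]) if has_next else None,
            "ca_to_n": unit_vector(res["CA"], res["N"]),
            "ca_to_c": unit_vector(res["CA"], res["C"]),
            "ca_to_next_ca": unit_vector(res["CA"], nxt["CA"]) if has_next else None,
        })
    return feats
```

Models commonly feed such angles to the network as (sin, cos) pairs rather than raw degrees, which avoids the discontinuity at ±180°.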
3D representations of amino acids—applications to protein sequence comparison and classification
Li, Jie, and Patrice Koehl.
Computational and structural biotechnology journal 11.18 (2014)
Direct prediction of profiles of sequences compatible with a protein structure by neural networks with fragment‐based local and energy‐based nonlocal profiles
Li, Zhixiu, et al.
Proteins: Structure, Function, and Bioinformatics 82.10 (2014) • code unavailable
SPIN2: Predicting sequence profiles from protein structures using deep neural networks
O'Connell, James, et al.
Proteins: Structure, Function, and Bioinformatics 86.6 (2018) • code unavailable
Computational protein design with deep learning neural networks
Wang, Jingxue, et al.
Scientific reports 8.1 (2018) • code unavailable
Ligand-aware protein sequence design using protein self contacts
Jody Mou, Benjamin Fry, Chun-Chen Yao, Nicholas Polizzi
NeurIPS 2022
SeqPredNN: a neural network that generates protein sequences that fold into specified tertiary structures
Lategan, F. Adriaan, Caroline Schreiber, and Hugh G. Patterton.
BMC bioinformatics 24.1 (2023) • code
Design of metalloproteins and novel protein folds using variational autoencoders
Greener, Joe G., Lewis Moffat, and David T. Jones.
Scientific reports 8.1 (2018)
To improve protein sequence profile prediction through image captioning on pairwise residue distance map
Chen, Sheng, et al.
Journal of chemical information and modeling 60.1 (2019) • SPROF
Deep learning of Protein Sequence Design of Protein-protein Interactions
Syrlybaeva, Raulia, and Eva-Maria Strauch.
bioRxiv (2022)/Bioinformatics, 2022, btac733 • Supplementary • code
A structure-based deep learning framework for protein engineering
Shroff, Raghav, et al.
bioRxiv (2019)
ProDCoNN: Protein design using a convolutional neural network
Zhang, Yuan, et al.
Proteins: Structure, Function, and Bioinformatics 88.7 (2020) • code unavailable
Protein sequence design with a learned potential
Namrata Anand, Raphael Eguchi, Irimpan I. Mathews, Carla P. Perez, Alexander Derry, Russ B. Altman & Po-Ssu Huang
Nature Communications (2022) • code
Protein Sequence Design with Deep Learning and Tooling like Monte Carlo Sampling and Analysis
Leonardo Castorina
paper not available • code
Learning from protein structure with geometric vector perceptrons
Jing, Bowen, et al.
arXiv preprint arXiv:2009.01411 (2020)/ICLR(2021) • GVP
Fast and flexible protein design using deep graph neural networks
Alexey Strokach, David Becerra, Carles Corbi-Verge, Albert Perez-Riba, Philip M. Kim
Cell Systems (2020) • code::ProteinSolver
Mimetic Neural Networks: A unified framework for Protein Design and Folding
Moshe Eliasof, Tue Boesen, Eldad Haber, Chen Keasar, Eran Treister
arXiv:2102.03881/Front. Bioinform. 2:715006
TERMinator: A Neural Framework for Structure-Based Protein Design using Tertiary Repeating Motifs
Li, Alex J., et al.
NeurIPS 2021 / arXiv (2022)
A neural network model for prediction of amino-acid probability from a protein backbone structure
Koya Sakuma, Naoya Kobayashi
Unpublished (June 2021) • GCNdesign
XENet: Using a new graph convolution to accelerate the timeline for protein design on quantum computers
Maguire, Jack B., et al.
PLoS computational biology 17.9 (2021)
AlphaDesign: A graph protein design method and benchmark on AlphaFoldDB
Gao, Zhangyang, Cheng Tan, and Stan Li.
arXiv preprint arXiv:2202.01079 (2022) • code
Generative De Novo Protein Design with Global Context
Cheng Tan, Zhangyang Gao, Jun Xia, Stan Z. Li
arXiv • Apr 2022 • code
Masked inverse folding with sequence transfer for protein representation learning
Kevin K Yang, Hugh Yeh, Niccolò Zanichelli
bioRxiv 2022.05.25.493516 • code • model
Robust deep learning based protein sequence design using ProteinMPNN
Justas Dauparas, Ivan Anishchenko, Nathaniel Bennett, Hua Bai, Robert J. Ragotte, Lukas F. Milles, Basile I. M. Wicky, Alexis Courbet, Robbert J. de Haas, Neville Bethel, Philip J. Y. Leung, Timothy F. Huddy, Sam Pellock, Doug Tischer, Frederick Chan, Brian Koepnick, Hannah Nguyen, Alex Kang, Banumathi Sankaran, Asim Bera, Neil P. King, David Baker
bioRxiv 2022.06.03.494563/Science (2022) • code • hugging face • lecture • colab(in_jax) • ProteinMPNN+ESMFold
Antibody-Antigen Docking and Design via Hierarchical Equivariant Refinement
Jin, Wengong, Regina Barzilay, and Tommi Jaakkola.
arXiv preprint arXiv:2207.06616 (2022)/International Conference on Machine Learning. PMLR, 2022 • code • poster
Neural Network-Derived Potts Models for Structure-Based Protein Design using Backbone Atomic Coordinates and Tertiary Motifs
Alex J. Li, Mindren Lu, Israel Desta, Vikram Sundar, Gevorg Grigoryan, and Amy E. Keating
bioRxiv 2022.08.02.501736/Protein Science, 32(2)
Conditional Antibody Design as 3D Equivariant Graph Translation
Xiangzhe Kong, Wenbing Huang, Yang Liu
arXiv:2208.06073
SE(3) Equivalent Graph Attention Network as an Energy-Based Model for Protein Side Chain Conformation
Deqin Liu, Sheng Chen, Shuangjia Zheng, Sen Zhang, Yuedong Yang
bioRxiv 2022.09.05.506704 • code
PiFold: Toward effective and efficient protein inverse folding
Zhangyang Gao, Cheng Tan, Stan Z. Li
arXiv:2209.12643v2/ICLR 2023 • github
Protein Sequence Design by Entropy-based Iterative Refinement
Xinyi Zhou, Guangyong Chen, Junjie Ye, Ercheng Wang, Jun Zhang, Cong Mao, Zhanwei Li, Jianye Hao, Xingxu Huang, Jin Tang, Pheng Ann Heng
bioRxiv 2023.02.04.527099
Lightweight Contrastive Protein Structure-Sequence Transformation
Jiangbin Zheng, Ge Wang, Yufei Huang, Bozhen Hu, Siyuan Li, Cheng Tan, Xinwen Fan, Stan Z. Li
arXiv:2303.11783
Modeling Protein Structure Using Geometric Vector Field Networks
Weian Mao, Muzhi Zhu, Hao Chen, Chunhua Shen
bioRxiv 2023.05.07.539736
Knowledge-Design: Pushing the Limit of Protein Design via Knowledge Refinement
Zhangyang Gao, Cheng Tan, Stan Z. Li
arXiv:2305.15151/ICLR under review • code
SPIN-CGNN: Improved fixed backbone protein design with contact map-based graph construction and contact graph neural network
Xing Zhang, Hongmei Yin, Fei Ling, Jian Zhan, Yaoqi Zhou
bioRxiv 2023.07.07.548080 • code
ZetaDesign: an end-to-end deep learning method for protein sequence design and side-chain packing
Junyu Yan and others
Briefings in Bioinformatics, 2023 • code
Contextual protein encodings from equivariant graph transformers
Sai Pooja Mahajan, Jeffrey A. Ruffolo, Jeffrey J. Gray
bioRxiv 2023.07.15.549154 • code
Robust Design of Effective Allosteric Activators for Rsp5 E3 Ligase Using the Machine Learning Tool ProteinMPNN
Kao, Hsi-Wen, et al.
ACS Synthetic Biology (2023) • Supplementary
Rationally seeded computational protein design
Katherine I. Albanese, Rokas Petrenas, Fabio Pirro, Elise A. Naudin, Ufuk Borucu, William M. Dawson, D. Arne Scott, Graham J. Leggett, Orion D. Weiner, Thomas A. A. Oliver, Derek N. Woolfson
bioRxiv 2023.08.25.554789 • code
Computational design of sequence-specific DNA-binding proteins
Cameron J Glasscock, Robert Pecoraro, Ryan McHugh, Lindsey A. Doyle, Wei Chen, Olivier Boivin, Beau Lonnquist, Emily Na, Yuliya Politanska, Hugh K Haddox, David Cox, Christoffer Norn, Brian Coventry, Inna Goreshnik, Dionne Vafeados, Gyu Rie Lee, Raluca Gordan, Barry L Stoddard, Frank DiMaio, David Baker
bioRxiv 2023.09.20.558720 • Supplementary
Improving protein expression, stability, and function with ProteinMPNN
Kiera H. Sumida, Reyes Núñez Franco, Indrek Kalvet, Samuel J. Pellock, Basile I. M. Wicky, Lukas F. Milles, Justas Dauparas, Jue Wang, Yakov Kipnis, Noel Jameson, Alex Kang, Joshmyn De La Cruz, Banumathi Sankaran, Asim K Bera, Gonzalo Jimenez Oses, David Baker
bioRxiv 2023.10.03.560713 • Supplementary
A Suite of Designed Protein Cages Using Machine Learning Algorithms and Protein Fragment-Based Protocols
Kyle Meador, Roger Castells-Graells, Roman Aguirre, Michael R. Sawaya, Mark A. Arbing, Trent Sherman, Chethaka Senarathne, Todd O. Yeates
bioRxiv 2023.10.09.561468 • code • colab
Protein Designer Based on Sequence Profile Using Ultrafast Shape Recognition
Anonymous
ICLR 2024 under review
Inverse folding for antibody sequence design using deep learning
Frédéric A. Dreyer, Daniel Cutting, Constantin Schneider, Henry Kenlay, Charlotte M. Deane
arXiv:2310.19513
De novo protein design for novel folds using guided conditional Wasserstein generative adversarial networks
Mostafa Karimi, Shaowen Zhu, Yue Cao, Yang Shen
Journal of chemical information and modeling 60.12 (2020) • gcWGAN
HelixGAN: A bidirectional Generative Adversarial Network with search in latent space for generation under constraints
Xuezhi Xie, Philip M. Kim
Machine Learning for Structural Biology Workshop, NeurIPS 2021/Bioinformatics, 2023, btad036 • code
Generative models for graph-based protein design
John Ingraham, Vikas K Garg, Regina Barzilay, Tommi Jaakkola
NeurIPS 2019 • GraphTrans
Fold2Seq: A Joint Sequence (1D)-Fold (3D) Embedding-based Generative Model for Protein Design
Cao, Yue, et al.
International Conference on Machine Learning. PMLR, 2021
Rotamer-Free Protein Sequence Design Based on Deep Learning and Self-Consistency
Liu, Yufeng, et al.
Nature portfolio (2022)/Nature Computational Science (2022) • Supplementary • Comment • code
A Deep SE(3)-Equivariant Model for Learning Inverse Protein Folding
Matthew McPartlon, Ben Lai, Jinbo Xu
bioRxiv (2022)
Learning inverse folding from millions of predicted structures
Chloe Hsu, Robert Verkuil, Jason Liu, Zeming Lin, Brian Hie, Tom Sercu, Adam Lerer, Alexander Rives
bioRxiv (2022) • esm
Breaking boundaries in protein design with a new AI model that understands interactions with any kind of molecule
LucianoSphere
Towards Data Science
Accurate and efficient protein sequence design through learning concise local environment of residues
Huang, Bin, et al.
bioRxiv (2022)/Bioinformatics 39.3 (2023) • Supplementary • website • code
PeTriBERT : Augmenting BERT with tridimensional encoding for inverse protein folding and design
Baldwin Dumortier, Antoine Liutkus, Clément Carré, Gabriel Krouk
bioRxiv 2022.08.10.503344
Evolutionary-scale prediction of atomic level protein structure with a language model
Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Nikita Smetanin, Robert Verkuil, Ori Kabeli, Yaniv Shmueli, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Salvatore Candido, Alexander Rives
bioRxiv 2022.07.20.500902 • blog • github
Structure-informed Language Models Are Protein Designers
Zaixiang Zheng, Yifan Deng, Dongyu Xue, Yi Zhou, Fei YE, Quanquan Gu
arXiv:2302.01649 • code::ByProt
Incorporating Pre-training Paradigm for Antibody Sequence-Structure Co-design
Kaiyuan Gao, Lijun Wu, Jinhua Zhu, Tianbo Peng, Yingce Xia, Liang He, Shufang Xie, Tao Qin, Haiguang Liu, Kun He, Tie-Yan Liu
arXiv:2211.08406 • code
A Text-guided Protein Design Framework
Shengchao Liu, Yutao Zhu, Jiarui Lu, Zhao Xu, Weili Nie, Anthony Gitter, Chaowei Xiao, Jian Tang, Hongyu Guo, Anima Anandkumar
arXiv:2302.04611 • code
An end-to-end deep learning method for protein side-chain packing and inverse folding
McPartlon, Matthew, and Jinbo Xu
Proceedings of the National Academy of Sciences 120.23 (2023) • code • Supplementary
Context-aware geometric deep learning for protein sequence design
Lucien Krapp, Fernando Meireles, Luciano Abriata, Matteo Dal Peraro
bioRxiv 2023.06.19.545381 • code
De Novo Generation and Prioritization of Target-Binding Peptide Motifs from Sequence Alone
Suhaas Bhat, Kalyan Palepu, Vivian Yudistyra, Lauren Hong, Venkata Srikar Kavirayuni, Tianlai Chen, Lin Zhao, Tian Wang, Sophia Vincoff, Pranam Chatterjee
bioRxiv 2023.06.26.546591 • code • colab • Supplementary
ProstT5: Bilingual Language Model for Protein Sequence and Structure
Michael Heinzinger, Konstantin Weissenow, Joaquin Gomez Sanchez, Adrian Henkel, Martin Steinegger, Burkhard Rost
bioRxiv 2023.07.23.550085 • Supplementary • code
De novo Protein Sequence Design Based on Deep Learning and Validation on CalB Hydrolase
Junxi Mu, ZhenXin Li, Bo Zhang, Qi Zhang, Jamshed Iqbal, Abdul Wadood, Ting Wei, Yan Feng, Haifeng Chen
bioRxiv 2023.08.01.551444
Invariant point message passing for protein side chain packing and design
Nicholas Z Randolph, Brian Kuhlman
bioRxiv 2023.08.03.551328 • code
Atom-by-atom protein generation and beyond with language models
Daniel Flam-Shepherd, Kevin Zhu, Alán Aspuru-Guzik
arXiv:2308.09482
AntiFold: Improved antibody structure design using inverse folding
Magnus Høie, Alissa Hummer, Tobias Olsen, Morten Nielsen, Charlotte Deane
GenBio@NeurIPS2023 Spotlight • code • colab
DenseCPD: improving the accuracy of neural-network-based computational protein sequence design with DenseNet
Qi, Yifei, and John ZH Zhang.
Journal of chemical information and modeling 60.3 (2020) • code unavailable
De novo protein backbone generation based on diffusion with structured priors and adversarial training
Yufeng Liu, Linghui Chen, Haiyan Liu
bioRxiv 2022.12.17.520847
Generative design of de novo proteins based on secondary-structure constraints using an attention-based diffusion model
Bo Ni, David L. Kaplan, Markus J. Buehler
Chem (2023) • code • news
Graph Denoising Diffusion for Inverse Protein Folding
Kai Yi, Bingxin Zhou, Yiqing Shen, Pietro Liò, Yu Guang Wang
arXiv:2306.16819
Conditional Protein Denoising Diffusion Generates Programmable Endonucleases
Bingxin Zhou, Lirong Zheng, Banghao Wu, Kai Yi, Bozitao Zhong, Pietro Lio, Liang Hong
bioRxiv 2023.08.10.552783
Inverse Protein Folding Using Deep Bayesian Optimization
Natalie Maus, Yimeng Zeng, Daniel Allen Anderson, Phillip Maffettone, Aaron Solomon, Peyton Greenside, Osbert Bastani, Jacob R. Gardner
arXiv:2305.18089 • code
Harmonic Self-Conditioned Flow Matching for Multi-Ligand Docking and Binding Site Design
Hannes Stärk, Bowen Jing, Regina Barzilay, Tommi Jaakkola
arXiv:2310.05764 • code
These models generate sequences from a desired function.
Antibody complementarity determining region design using high-capacity machine learning
Liu, Ge, et al.
Bioinformatics 36.7 (2020): 2126-2133 • code
Protein design and variant prediction using autoregressive generative models
Shin, Jung-Eun, et al.
Nature communications 12.1 (2021) • code::SeqDesign • mutation effect prediction • sequence generation • April 2021
Optimization of therapeutic antibodies by predicting antigen specificity from antibody sequence via deep learning
Mason, Derek M., et al.
Nature Biomedical Engineering 5.6 (2021) • code
Variational auto-encoding of protein sequences
Sinai, Sam, et al.
arXiv preprint arXiv:1712.03346 (2017)
Design by adaptive sampling
Brookes, David H., and Jennifer Listgarten.
arXiv preprint arXiv:1810.03714 (2018)
Pepcvae: Semi-supervised targeted design of antimicrobial peptide sequences
Das, Payel, et al.
arXiv preprint arXiv:1810.07743 (2018)
Deep generative models for T cell receptor protein sequences
Davidsen, Kristian, et al.
Elife 8 (2019)
How to hallucinate functional proteins
Costello, Zak, and Hector Garcia Martin.
arXiv preprint arXiv:1903.00458 (2019)
Convergent selection in antibody repertoires is revealed by deep learning
Friedensohn, Simon, et al.
BioRxiv (2020) • Supplementary • code available after publication
Variational autoencoder for generation of antimicrobial peptides
Dean, Scott N., and Scott A. Walper.
ACS omega 5.33 (2020)
Generating functional protein variants with variational autoencoders
Hawkins-Hooker, Alex, et al.
PLoS computational biology 17.2 (2021)
Accelerated antimicrobial discovery via deep generative models and molecular dynamics simulations
Das, Payel, et al.
Nature Biomedical Engineering 5.6 (2021)
Deep generative models create new and diverse protein structures
Zeming Lin, Tom Sercu, Yann LeCun, Alexander Rives
NeurIPS 2021
PepVAE: variational autoencoder framework for antimicrobial peptide generation and activity prediction
Dean, Scott N., et al.
Frontiers in microbiology 12 (2021) • code • Supplementary
HydrAMP: a deep generative model for antimicrobial peptide discovery
Szymczak, Paulina, et al.
bioRxiv (2022) • code
Therapeutic enzyme engineering using a generative neural network
Giessel, Andrew, et al.
Scientific Reports 12.1 (2022)
GM-Pep: A High Efficiency Strategy to De Novo Design Functional Peptide Sequences
Chen, Qushuo, et al.
Journal of Chemical Information and Modeling (2022) • code
Mean Dimension of Generative Models for Protein Sequences
Christoph Feinauer, Emanuele Borgonovo
bioRxiv 2022.12.12.520028 • code
Prediction of designer-recombinases for DNA editing with generative deep learning
Schmitt, L.T., Paszkowski-Rogacz, M., Jug, F. et al.
Nat Commun 13, 7966 (2022) • code • Supplementary
ProT-VAE: Protein Transformer Variational AutoEncoder for Functional Protein Design
Emre Sevgen, Joshua Moller, Adrian Lange, John Parker, Sean Quigley, Jeff Mayer, Poonam Srivastava, Sitaram Gayatri, David Hosfield, Maria Korshunova, Micha Livne, Michelle Gill, Rama Ranganathan, Anthony B Costa, Andrew L Ferguson
bioRxiv 2023.01.23.525232
Deep-learning generative models enable design of synthetic orthologs of a signaling protein
Lian, Xinran, et al.
Biophysical Journal 122.3 (2023): 311a
Designing a protein with emergent function by combined in silico, in vitro and in vivo screening
Shunshi Kohyama, Bela Paul Frohn, Leon Babl, Petra Schwille
bioRxiv 2023.02.16.528840 • Supplementary
ProteinVAE: Variational AutoEncoder for Translational Protein Design
Suyue Lyu, Shahin Sowlati-Hashjin, Michael Garton
bioRxiv 2023.03.04.531110 • Supplementary
ProtWave-VAE: Integrating autoregressive sampling with latent-based inference for data-driven protein design
Niksa Praljak, Xinran Lian, Rama Ranganathan, Andrew Ferguson
bioRxiv 2023.04.23.537971 • Supplementary • code
Designing meaningful continuous representations of T cell receptor sequences with deep generative models
Allen Y. Leary, Darius Scott, Namita T. Gupta, Janelle C. Waite, Dimitris Skokos, Gurinder S. Atwal, Peter G. Hawkins
bioRxiv 2023.06.17.545423 • code
Utility of language model and physics-based approaches in modifying MHC Class-I immune-visibility for the design of vaccines and therapeutics
Hans-Christof Gasser, Diego Oyarzun, Ajitha Rajan, Javier Alfaro
bioRxiv 2023.07.10.548300
Feedback GAN for DNA optimizes protein functions
Gupta, Anvita, and James Zou.
Nature Machine Intelligence 1.2 (2019) • code
Generating protein sequences from antibiotic resistance genes data using Generative Adversarial Networks
Chhibbar, Prabal, and Arpit Joshi.
arXiv preprint arXiv:1904.13240 (2019)
ProGAN: Protein solubility generative adversarial nets for data augmentation in DNN framework
Han, Xi, et al.
Computers & Chemical Engineering 131 (2019)
GANDALF: Peptide Generation for Drug Design using Sequential and Structural Generative Adversarial Networks
Rossetto, Allison, and Wenjin Zhou.
Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. 2020
Designing feature-controlled humanoid antibody discovery libraries using generative adversarial networks
Amimeur, Tileli, et al.
BioRxiv (2020)
Generating ampicillin-level antimicrobial peptides with activity-aware generative adversarial networks
Tucs, Andrejs, et al.
ACS omega 5.36 (2020) • code
Conditional Generative Modeling for De Novo Protein Design with Hierarchical Functions
Kucera, Tim, Matteo Togninalli, and Laetitia Meng-Papaxanthos
bioRxiv (2021)/Bioinformatics 38.13 (2022) • code
Expanding functional protein sequence spaces using generative adversarial networks
Repecka, Donatas, et al.
Nature Machine Intelligence 3.4 (2021) • code
A Generative Approach toward Precision Antimicrobial Peptide Design.
Ferrell, Jonathon B., et al.
BioRxiv (2021)
AMPGAN v2: Machine Learning-Guided Design of Antimicrobial Peptides
Van Oort, Colin M., et al.
Journal of chemical information and modeling 61.5 (2021)
DeepImmuno: deep learning-empowered prediction and generation of immunogenic peptides for T-cell immunity
Li, Guangyuan, et al.
Briefings in bioinformatics 22.6 (2021) • code • web
PandoraGAN: Generating antiviral peptides using Generative Adversarial Network
Surana, Shraddha, et al.
bioRxiv (2021)
Feedback-AVPGAN: Feedback-guided generative adversarial network for generating antiviral peptides
Hasegawa, Kano, et al.
Journal of Bioinformatics and Computational Biology (2022) • code
Designing antimicrobial peptides using deep learning and molecular dynamic simulations
Cao, Qiushi, et al.
Briefings in Bioinformatics (2023)
Includes masked language models and autoregressive language models.
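To make the distinction between the two families concrete, the sketch below builds training examples for each objective from a single protein sequence. An autoregressive model predicts each residue from the residues before it. A masked model hides some positions and predicts them from context on both sides. This is a simplified illustration: the function names, the `#` mask character, and the 15% mask rate are assumptions, not details of any model listed here. BERT's 80/10/10 replacement rule is also omitted.

```python
import random

def autoregressive_pairs(seq):
    """Causal objective: every prefix seq[:t] is the context from which
    the model must predict the next residue seq[t]."""
    return [(seq[:t], seq[t]) for t in range(1, len(seq))]

def masked_example(seq, mask_rate=0.15, mask_token="#", seed=0):
    """Simplified masked objective: replace a random subset of positions
    with mask_token; the model must recover the original residues there
    using context from both sides."""
    rng = random.Random(seed)
    n_mask = max(1, round(mask_rate * len(seq)))
    positions = sorted(rng.sample(range(len(seq)), n_mask))
    masked = list(seq)
    for i in positions:
        masked[i] = mask_token
    targets = {i: seq[i] for i in positions}  # position -> original residue
    return "".join(masked), targets

if __name__ == "__main__":
    seq = "MKTAYIAKQR"  # arbitrary toy sequence
    print(autoregressive_pairs(seq)[:3])  # [('M', 'K'), ('MK', 'T'), ('MKT', 'A')]
    print(masked_example(seq))
```

Because of this difference in objective, autoregressive models (e.g. the GPT-style entries below) can sample full sequences left to right. Masked models are more naturally used for infilling or scoring positions given the rest of the sequence.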
Progen: Language modeling for protein generation / Large language models generate functional protein sequences across diverse families
Madani, Ali, et al.
arXiv preprint arXiv:2004.03497 (2020)/Nat Biotechnol (2023) • ProGen, CTRL
Signal peptides generated by attention-based neural networks
Wu, Zachary, et al.
ACS Synthetic Biology 9.8 (2020)
Generative Language Modeling for Antibody Design
Shuai, Richard W., Jeffrey A. Ruffolo, and Jeffrey J. Gray.
bioRxiv (2021)/Cell Systems • Supplementary • code
Deep neural language modeling enables functional protein generation across families
Madani, Ali, et al.
bioRxiv (2021)
ProtTrans: towards cracking the language of Life's code through self-supervised deep learning and high performance computing
Elnaggar, Ahmed, et al.
arXiv preprint arXiv:2007.06225 (2020)
Protein sequence sampling and prediction from structural data
Gabriel A. Orellana, Javier Caceres-Delpiano, Roberto Ibañez, Michael P. Dunne, Leonardo Alvarez
bioRxiv 2021.09.06.459171
Transformer-based protein generation with regularized latent space optimization
Castro, E., Godavarthi, A., Rubinfien, J. et al.
Nat Mach Intell (2022)/arXiv:2201.09948 • code
BioPhi: A platform for antibody design, humanization, and humanness evaluation based on natural antibody repertoires and deep learning
Prihoda, David, et al.
mAbs. Vol. 14. No. 1. Taylor & Francis, 2022
Guided Generative Protein Design using Regularized Transformers
Castro, Egbert, et al.
arXiv preprint arXiv:2201.09948 (2022)
Towards Controllable Protein design with Conditional Transformers
Ferruz, Noelia, and Birte Höcker.
arXiv preprint arXiv:2201.07338 (2022)/Nature Machine Intelligence (2022) • review of Heading 5.4
ProtGPT2 is a deep unsupervised language model for protein design
Noelia Ferruz, Steffen Schmidt, Birte Höcker
bioRxiv/Nature Communications • model::huggingface • datasets::huggingface • lecture • research highlights • news
Few Shot Protein Generation
Ram, Soumya, and Tristan Bepler.
arXiv preprint arXiv:2204.01168 (2022)
RITA: a Study on Scaling Up Generative Protein Sequence Models
Hesslow, Daniel, et al.
arXiv preprint arXiv:2205.05789 (2022) • code
ProGen2: Exploring the Boundaries of Protein Language Models
Erik Nijkamp, Jeffrey Ruffolo, Eli N. Weinstein, Nikhil Naik, Ali Madani
arXiv:2206.13517 • code
AbLang: an antibody language model for completing antibody sequences
Tobias H Olsen, Iain H Moal, Charlotte M Deane
Bioinformatics Advances, Volume 2, Issue 1, 2022, vbac046
AbBERT: Learning Antibody Humanness via Masked Language Modeling
Denis Vashchenko, Sam Nguyen, Andre Goncalves, Felipe Leno da Silva, Brenden Petersen, Thomas Desautels, Daniel Faissol
bioRxiv 2022.08.02.502236
Accelerating Antibody Design with Active Learning
Seung-woo Seo, Min Woo Kwak, Eunji Kang, Chaeun Kim, Eunyoung Park, Tae Hyun Kang, Jinhan Kim
bioRxiv 2022.09.12.507690
Reprogramming Large Pretrained Language Models for Antibody Sequence Infilling
Igor Melnyk, Vijil Chenthamarakshan, Pin-Yu Chen, Payel Das, Amit Dhurandhar, Inkit Padhi, Devleena Das
ICLR 2023/arXiv:2210.07144
Machine Learning Optimization of Candidate Antibodies Yields Highly Diverse Sub-nanomolar Affinity Antibody Libraries
Lin Li, Esther Gupta, John Spaeth, Leslie Shing, Rafael Jaimes, Rajmonda Sulo Caceres, Tristan Bepler, Matthew E. Walsh
bioRxiv 2022.10.07.502662 • Supplementary • code will be available
ZymCTRL: a conditional language model for the controllable generation of artificial enzymes
Noelia Ferruz
NeurIPS 2022 • hugging face • poster
Unlocking de novo antibody design with generative artificial intelligence
Shanehsazzadeh, Amir, et al.
bioRxiv (2023): 2023-01 • data • news • blog • commercial
A universal deep-learning model for zinc finger design enables transcription factor reprogramming
Ichikawa, D.M., Abdin, O., Alerasool, N. et al.
Nat Biotechnol (2023)
XuperNovo®/ProteinGPT
XtalPi
news • news2 • website • commercial
Evaluating Prompt Tuning for Conditional Protein Sequence Generation
Andrea Nathansen, Kevin Klein, Bernhard Y. Renard, Melania Nowicka, Jakub M. Bartoszewicz
bioRxiv 2023.02.28.530492 • code
AB-Gen: Antibody Library Design with Generative Pre-trained Transformer and Deep Reinforcement Learning
Xiaopeng Xu, Tiantian Xu, Juexiao Zhou, Xingyu Liao, Ruochi Zhang, Yu Wang, Lu Zhang, Xin Gao
bioRxiv 2023.03.17.533102 • code • Supplementary • data
Unsupervised cross-domain translation via deep learning and adversarial attention neural networks and application to music-inspired protein designs
Buehler, Markus J.
Patterns 4.3 (2023) • code
ProtFIM: Fill-in-Middle Protein Sequence Design via Protein Language Models
Lee, Youhan, and Hasun Yu.
arXiv preprint arXiv:2303.16452 (2023)/ICLR 2023
REXzyme: A Translation Machine for the Generation of New-to-Nature Enzymes
Sebastian Lindner, Michael Heinzinger, Noelia Ferruz
paper coming soon • hugging face
Reprogramming Pretrained Language Models for Antibody Sequence Infilling
Igor Melnyk, Vijil Chenthamarakshan, Pin-Yu Chen, Payel Das, Amit Dhurandhar, Inkit Padhi, Devleena Das
arXiv:2210.07144 • code
xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein
Bo Chen, Xingyi Cheng, Li-ao Gengyang, Shen Li, Xin Zeng, Boyan Wang, Gong Jing, Chiming Liu, Aohan Zeng, Yuxiao Dong, Jie Tang, Le Song
bioRxiv 2023.07.05.547496 • news • website • commercial
TULIP - a Transformer based Unsupervised Language model for Interacting Peptides and T-cell receptors that generalizes to unseen epitopes
Barthelemy Meynard-Piganeau, Christoph Feinauer, Martin Weigt, Aleksandra M Walczak, Thierry Mora
bioRxiv 2023.07.19.549669 • code
Efficient and accurate sequence generation with small-scale protein language models
Yaiza Serrano, Sergi Roda, Victor Guallar, Alexis Molina
bioRxiv 2023.08.04.551626
Improving Antibody Affinity Using Laboratory Data with Language Model Guided Design
Ben Krause, Subu Subramanian, Tom Yuan, Marisa Yang, Aaron Sato, Nikhil Naik
bioRxiv 2023.09.13.557505
De novo generation of antibody CDRH3 with a pre-trained generative large language model
HaoHuai He, Bing He, Lei Guan, Yu Zhao, Guanxing Chen, Qingge Zhu, Calvin Yu-Chian Chen, Ting Li, Jianhua Yao
bioRxiv 2023.10.17.562827 • code • data
NL2ProGPT: Taming Large Language Model for Conversational Protein Design
Anonymous
ICLR 2024 under review
SaLT&PepPr is an interface-predicting language model for designing peptide-guided protein degraders
Brixi, G., Ye, T., Hong, L. et al.
Commun Biol 6, 1081 (2023) • code
Accelerating protein design using autoregressive generative models
Riesselman, Adam, et al.
BioRxiv (2019)
Discovering de novo peptide substrates for enzymes using machine learning
Tallorin, Lorillee, et al.
Nature communications 9.1 (2018) • code
Biological Sequences Design using Batched Bayesian Optimization
Belanger, David, et al.
Machine Learning and the Physical Sciences Workshop (NeurIPS 2019)
Lattice protein design using Bayesian learning
Takahashi, Tomoei, George Chikenji, and Kei Tokita.
arXiv:2003.06601/Physical Review E 104.1 (2021): 014404
Now What Sequence? Pre-trained Ensembles for Bayesian Optimization of Protein Sequences
Ziyue Yang, Katarina A Milas, Andrew D White
bioRxiv 2022.08.05.502972 • code • Supplementary • Colab
AntBO: Towards Real-World Automated Antibody Design with Combinatorial Bayesian Optimisation
Khan, Asif, et al.
arXiv preprint (2022)/Cell Reports Methods (2023): 100374
Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders
Samuel Stanton, Wesley Maddox, Nate Gruver, Phillip Maffettone, Emily Delaney, Peyton Greenside, Andrew Gordon Wilson
ICML 2022 • code
Statistical Mechanics of Protein Design
Takahashi, Tomoei, George Chikenji, and Kei Tokita.
arXiv preprint arXiv:2205.03696 (2022)
PropertyDAG: Multi-objective Bayesian optimization of partially ordered, mixed-variable properties for biological sequence design
Ji Won Park, Samuel Stanton, Saeed Saremi, Andrew Watkins, Henri Dwyer, Vladimir Gligorijevic, Richard Bonneau, Stephen Ra, Kyunghyun Cho
arXiv:2210.04096
A probabilistic view of protein stability, conformational specificity, and design
Jacob A. Stern, Tyler J. Free, Kimberlee L. Stern, Spencer Gardiner, Nicholas A. Dalley, Bradley C. Bundy, Joshua L. Price, David Wingate, Dennis Della Corte
bioRxiv 2022.12.28.521825 • Supplementary
Design of antimicrobial peptides containing non-proteinogenic amino acids using multi-objective Bayesian optimisation
Murakami Y, Ishida S, Demizu Y, Terayama K.
ChemRxiv. Cambridge: Cambridge Open Engage; 2023 • code
Vaxformer: Antigenicity-controlled Transformer for Vaccine Design Against SARS-CoV-2
Aryo Pradipta Gema, Michał Kobiela, Achille Fraisse, Ajitha Rajan, Diego A. Oyarzún, Javier Antonio Alfaro
arXiv:2305.11194 • code
Sample-efficient Antibody Design through Protein Language Model for Risk-aware Batch Bayesian Optimization
Yanzheng Wang, Boyue Wang, Tianyu Shi, Jie Fu, Yi Zhou, Zhizhuo Zhang
bioRxiv 2023.11.06.565922
Model-based reinforcement learning for biological sequence design
Angermueller, Christof, et al.
International conference on learning representations. 2019
Structured Q-learning For Antibody Design
Cowen-Rivers, Alexander I., et al.
arXiv preprint arXiv:2209.04698 (2022)
Protein Sequence Design in a Latent Space via Model-based Reinforcement Learning
Minji Lee, Luiz Felipe Vecchietti, Hyunkyu Jung, Hyunjoo Ro, Ho Min Kim, Meeyoung Cha
ICLR 2023/NeurIPS 2022 • Supplementary
Designing Biological Sequences via Meta-Reinforcement Learning and Bayesian Optimization
Leo Feng, Padideh Nouri, Aneri Muni, Yoshua Bengio, Pierre-Luc Bacon
arXiv:2209.06259/NeurIPS 2022 • poster
Self-play reinforcement learning guides protein engineering
Wang, Yi, et al.
Nature Machine Intelligence (2023) • code
Curiosity Driven Protein Sequence Generation via Reinforcement Learning
Anonymous
ICLR 2024 under review
Biological Sequence Design with GFlowNets
Jain, Moksh, et al.
arXiv preprint arXiv:2203.04115 (2022) • lecture
Deep learning to design nuclear-targeting abiotic miniproteins
Schissel, Carly K., et al.
Nature Chemistry 13.10 (2021) • code
Recurrent neural network model for constructive peptide design
Müller, Alex T., Jan A. Hiss, and Gisbert Schneider.
Journal of chemical information and modeling 58.2 (2018)
Machine learning designs non-hemolytic antimicrobial peptides
Capecchi, Alice, et al.
Chemical Science 12.26 (2021)
Using molecular dynamics simulations to prioritize and understand AI-generated cell penetrating peptides
Tran, Duy Phuoc, et al.
Scientific reports 11.1 (2021)
Computational antimicrobial peptide design and evaluation against multidrug-resistant clinical isolates of bacteria
Nagarajan, Deepesh, et al.
Journal of Biological Chemistry 293.10 (2018)
Deep learning enables the design of functional de novo antimicrobial proteins
Caceres-Delpiano, Javier, et al.
bioRxiv (2020)
ECNet is an evolutionary context-integrated deep learning framework for protein engineering
Luo, Yunan, et al.
Nature communications 12.1 (2021)
Deep learning for novel antimicrobial peptide design
Wang, Christina, Sam Garlick, and Mire Zloh.
Biomolecules 11.3 (2021)
Antibody design using LSTM based deep generative model from phage display library for affinity maturation
Saka, Koichiro, et al.
Scientific reports 11.1 (2021)
Deep learning to design nuclear-targeting abiotic miniproteins
Schissel, Carly K., et al.
Nature Chemistry 13.10 (2021)
In silico proof of principle of machine learning-based antibody design at unconstrained scale
Akbar, Rahmad, et al.
mAbs. Vol. 14. No. 1. Taylor & Francis, 2022 • code
Large-scale design and refinement of stable proteins using sequence-only models
Singer, Jedediah M., et al.
PloS one 17.3 (2022) • code
Deep-learning based bioactive therapeutic peptides generation and screening
Haiping Zhang, Konda Mani Saravanan, Yanjie Wei, Yang Jiao, Yang Yang, Yi Pan, Xuli Wu, John Z.H. Zhang
bioRxiv 2022.11.14.516530 • code • Supplementary
Deep-learning based bioactive peptides generation and screening against Xanthine oxidase
Haiping Zhang, Konda Mani Saravanan, John Z.H. Zhang, Xuli Wu
bioRxiv 2023.01.11.523536
Deep Learning-Based Bioactive Therapeutic Peptide Generation and Screening
Zhang, Haiping, et al.
Journal of Chemical Information and Modeling 63.3 (2023) • code
Efficient generative modeling of protein sequences using simple autoregressive models
Trinquier, Jeanne, et al.
Nature communications 12.1 (2021): 1-11 • code
Conformal prediction for the design problem
Clara Fannjiang, Stephen Bates, Anastasios N. Angelopoulos, Jennifer Listgarten, Michael I. Jordan
arXiv:2202.03613v4 • code
How pairwise coevolutionary models capture the collective residue variability in proteins?
Figliuzzi, Matteo, Pierre Barrat-Charlaix, and Martin Weigt.
Molecular biology and evolution 35.4 (2018): 1018-1027 • code
A Pareto-optimal compositional energy-based model for sampling and optimization of protein sequences
Nataša Tagasovska, Nathan C. Frey, Andreas Loukas, Isidro Hötzel, Julien Lafrance-Vanasse, Ryan Lewis Kelly, Yan Wu, Arvind Rajpal, Richard Bonneau, Kyunghyun Cho, Stephen Ra, Vladimir Gligorijević
arXiv:2210.10838 • slides
Computational design of novel Cas9 PAM-interacting domains using evolution-based modelling and structural quality assessment
Cyril Malbranke, William Rostain, Florence Depardieu, Simona Cocco, Remi Monasson, David Bikard
bioRxiv 2023.03.20.533501 • code • Supplementary
Protein Discovery with Discrete Walk-Jump Sampling
Nathan C. Frey, Daniel Berenberg, Karina Zadorozhny, Joseph Kleinhenz, Julien Lafrance-Vanasse, Isidro Hotzel, Yan Wu, Stephen Ra, Richard Bonneau, Kyunghyun Cho, Andreas Loukas, Vladimir Gligorijevic, Saeed Saremi
arXiv:2306.12360/ICLR 2024 under review • code • lecture
denoising-diffusion-protein-sequence
Zhangzhi Peng
Paper unavailable • github
Protein Design with Guided Discrete Diffusion
Nate Gruver, Samuel Stanton, Nathan C. Frey, Tim G. J. Rudner, Isidro Hotzel, Julien Lafrance-Vanasse, Arvind Rajpal, Kyunghyun Cho, Andrew Gordon Wilson
arXiv:2305.20009 • code
PRO-LDM: Protein Sequence Generation with Conditional Latent Diffusion Models
Zixuan Jiang, Sitao Zhang, Rundong Huang, Shaoxun Mo, Letao Zhu, Peiheng Li, Ziyi Zhang, Xi Chen, Yunfei Long, Renjing Xu, Rui Qing
bioRxiv 2023.08.22.554145 • Supplementary
Protein generation with evolutionary diffusion: sequence is all you need
Sarah Alamdari, Nitya Thakkar, Rianne van den Berg, Alex Xijie Lu, Nicolo Fusi, Ava Pardis Amini, Kevin K Yang
bioRxiv 2023.09.11.556673 • code • data
AntiBARTy Diffusion for Property Guided Antibody Design
Jordan Venderley
arXiv:2309.13129
Generative Pretrained Autoregressive Transformer Graph Neural Network applied to the Analysis and Discovery of Novel Proteins
Markus J. Buehler
arXiv:2305.04934 • code
Bootstrapped Training of Score-Conditioned Generator for Offline Design of Biological Sequences
Minsu Kim, Federico Berto, Sungsoo Ahn, Jinkyoo Park
arXiv:2306.03111 • code
These models generate protein structures (including side chains) from an expected function, or recover missing parts of a protein structure (i.e., inpainting).
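The inpainting setting can be illustrated with a minimal sketch, assuming a toy iterative sampler over 3-D coordinates: positions that belong to a fixed motif are re-imposed after every update, and only the missing positions are generated. `denoise_step` and `inpaint` are hypothetical names, and the update rule (pulling coordinates toward the motif centroid) is a placeholder for a trained network, not the method of any paper listed here.

```python
import random
from typing import List, Optional, Tuple

Coord = Tuple[float, float, float]


def denoise_step(coords: List[Coord], anchor: Coord, rate: float = 0.1) -> List[Coord]:
    """Hypothetical update rule: move every position a fraction `rate` toward `anchor`."""
    return [tuple(c + rate * (a - c) for c, a in zip(xyz, anchor)) for xyz in coords]


def inpaint(known: List[Optional[Coord]], steps: int = 50, seed: int = 0) -> List[Coord]:
    """Fill the positions marked None while keeping the known (motif) positions fixed."""
    rng = random.Random(seed)
    motif = [xyz for xyz in known if xyz is not None]
    # Toy target: the centroid of the fixed motif stands in for a learned model.
    anchor = tuple(sum(axis) / len(motif) for axis in zip(*motif))
    # Missing positions start from Gaussian noise, as in a diffusion-style sampler.
    coords = [xyz if xyz is not None else tuple(rng.gauss(0.0, 10.0) for _ in range(3))
              for xyz in known]
    for _ in range(steps):
        proposal = denoise_step(coords, anchor)
        # Inpainting constraint: re-impose the known motif coordinates after every step.
        coords = [k if k is not None else p for k, p in zip(known, proposal)]
    return coords


# Usage: residues 0 and 2 form the fixed motif; residue 1 is generated.
print(inpaint([(0.0, 0.0, 0.0), None, (2.0, 0.0, 0.0)]))
```

Re-imposing the known coordinates after each step is the simplest way to condition a sampler on a motif; trained models may instead take the motif as an explicit conditioning input.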
One-sided design of protein-protein interaction motifs using deep learning
Syrlybaeva, Raulia, and Eva-Maria Strauch.
bioRxiv (2022) • code • our notes • lecture
Protein Structure and Sequence Generation with Equivariant Denoising Diffusion Probabilistic Models
Namrata Anand, Tudor Achim
GitHub (2022)/arXiv (2022) • our notes • lecture
Antigen-Specific Antibody Design and Optimization with Diffusion-Based Generative Models for Protein Structures
Shitong Luo, Yufeng Su, Xingang Peng, Sheng Wang, Jian Peng, Jianzhu Ma
bioRxiv 2022.07.10.499510/NeurIPS (2022) • code • hugging face
Illuminating protein space with a programmable generative model
John Ingraham, Max Baranov, Zak Costello, Vincent Frappier, Ahmed Ismail, Shan Tie, Wujie Wang, Vincent Xue, Fritz Obermeyer, Andrew Beam, Gevorg Grigoryan
Generate Biomedicines Preprint • plausible code • website • news • commercial
Physics-Inspired Protein Encoder Pre-Training via Siamese Sequence-Structure Diffusion Trajectory Prediction
Zuobai Zhang, Minghao Xu, Aurélie Lozano, Vijil Chenthamarakshan, Payel Das, Jian Tang
arXiv:2301.12068 • code
TRDiffusion
TIANRANG XLab
news • website • commercial
An all-atom protein generative model
Alexander E Chu, Lucy Cheng, Gina El Nesr, Minkai Xu, Po-Ssu Huang
bioRxiv 2023.05.24.542194 • code
DiffPack: A Torsional Diffusion Model for Autoregressive Protein Side-Chain Packing
Yangtian Zhan, Zuobai Zhang, Bozitao Zhong, Sanchit Misra, Jian Tang
arXiv 2023.06.01 • code
AbDiffuser: Full-Atom Generation of In-Vitro Functioning Antibodies
Karolis Martinkus, Jan Ludwiczak, Kyunghyun Cho, Wei-Ching Lian, Julien Lafrance-Vanasse, Isidro Hotzel, Arvind Rajpal, Yan Wu, Richard Bonneau, Vladimir Gligorijevic, Andreas Loukas
arXiv:2308.05027 • lecture
Generative Diffusion Models for Antibody Design, Docking, and Optimization
Zhangzhi Peng, Chenchen Han, Xiaohan Wang, Dapeng Li, Fajie Yuan
bioRxiv 2023.09.25.559190 • code • website
Bridging Sequence and Structure: Latent Diffusion for Conditional Protein Generation
Anonymous
ICLR 2024 under review
Deep learning methods for designing proteins scaffolding functional sites
Wang J, Lisanza S, Juergens D, Tischer D, Anishchenko I, Baek M, Watson JL, Chun JH, Milles LF, Dauparas J, Expòsit M, Yang W, Saragovi A, Ovchinnikov S, Baker D
bioRxiv (2021)/Science (2022) • RFDesign • our notes • lecture • RoseTTAFold • Supplementary, Other Supplementary
Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models / De novo design of protein structure and function with RFdiffusion
Watson, J.L., Juergens, D., Bennett, N.R. et al.
Bakerlab Preprint/bioRxiv 2022.12.09.519842/Nature (2023) • news, news2, news3 • Supplementary • lecture • RFdiffusion:code, Colab • blog
De novo design of high-affinity protein binders to bioactive helical peptides
Susana Vázquez Torres, Philip J. Y. Leung, Isaac D. Lutz, Preetham Venkatesh, Joseph L Watson, Fabian Hink, Huu-Hien Huynh, Andy Hsien-Wei Yeh, David Juergens, Nathaniel R. Bennett, Andrew N. Hoofnagle, Eric Huang, Michael J. MacCoss, Marc Expòsit, Gyu Rie Lee, Elif Nihal Korkmaz, Jeff Nivala, Lance Stewart, Joseph M. Rodgers, David Baker
bioRxiv 2022.12.10.519862 • Supplementary
Joint Generation of Protein Sequence and Structure with RoseTTAFold Sequence Space Diffusion
Sidney Lyayuga Lisanza, Jacob Merle Gershon, Sam Wayne Kenmore Tipps, Lucas Arnoldt, Samuel Hendel, Jeremiah Nelson Sims, Xinting Li, David Baker
bioRxiv 2023.05.08.539766 • code • hugging face • lecture
The structural landscape of the immunoglobulin fold by large-scale de novo design
Jorge Roel-Touris, Lourdes Carcelen, Enrique Marcos
bioRxiv 2023.10.03.560637 • Supplementary • code • data
Generalized Biomolecular Modeling and Design with RoseTTAFold All-Atom
Rohith Krishna, Jue Wang, Woody Ahern, Pascal Sturmfels, Preetham Venkatesh, Indrek Kalvet, Gyu Rie Lee, Felix S Morey-Burrows, Ivan Anishchenko, Ian R Humphreys, Ryan McHugh, Dionne Vafeados, Xinting Li, George A Sutherland, Andrew Hitchcock, C Neil Hunter, Minkyung Baek, Frank DiMaio, David Baker
bioRxiv 2023.10.09.561603 • Supplementary
Amalga: Designable Protein Backbone Generation with Folding and Inverse Folding Guidance
Shugao Chen, Ziyao Li, Xiangxiang Zeng, Guolin Ke
bioRxiv 2023.11.07.565939
De Novo Design of Site-specific Protein Binders Using Surface Fingerprints
Wehrle, Sarah, et al.
Protein Science 30.CONF (2021)/bioRxiv (2022)/Nature (2023) • Supplementary • masif_seed • masif • lecture
Iterative refinement graph neural network for antibody sequence-structure co-design
Jin, Wengong, et al.
arXiv preprint arXiv:2110.04624 (2021) • RefineGNN • lecture1, lecture2
Antibody Complementarity Determining Regions (CDRs) design using Constrained Energy Model
Fu, Tianfan, and Jimeng Sun.
Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2022 • code
Conditional Antibody Design as 3D Equivariant Graph Translation
Xiangzhe Kong, Wenbing Huang, Yang Liu
ICLR 2023/arXiv:2208.06073
Incorporating Pre-training Paradigm for Antibody Sequence-Structure Co-design
Kaiyuan Gao, Lijun Wu, Jinhua Zhu, Tianbo Peng, Yingce Xia, Liang He, Shufang Xie, Tao Qin, Haiguang Liu, Kun He, Tie-Yan Liu
arXiv:2211.08406
End-to-End Full-Atom Antibody Design
Xiangzhe Kong, Wenbing Huang, Yang Liu
arXiv:2302.00203 • code
AbODE: Ab Initio Antibody Design using Conjoined ODEs
Yogesh Verma, Markus Heinonen, Vikas Garg
arXiv:2306.01005
Joint Design of Protein Sequence and Structure based on Motifs
Zhenqiao Song, Yunlong Zhao, Yufei Song, Wenxian Shi, Yang Yang, Lei Li
arXiv:2310.02546
De novo protein design using geometric vector field networks
Weian Mao, Muzhi Zhu, Zheng Sun, Shuaike Shen, Lin Yuanbo Wu, Hao Chen, Chunhua Shen
arXiv:2310.11802/ICLR 2024 under review
Protein Sequence and Structure Co-Design with Equivariant Translation
Chence Shi, Chuanrui Wang, Jiarui Lu, Bozitao Zhong, Jian Tang
arXiv:2210.08761/ICLR 2023 • Supplementary • code
Deep Learning for Flexible and Site-Specific Protein Docking and Design
Matt McPartlon, Jinbo Xu
bioRxiv 2023.04.01.535079 • Title
Full-Atom Protein Pocket Design via Iterative Refinement
Zaixi Zhang, Zepu Lu, Zhongkai Hao, Marinka Zitnik, Qi Liu
arXiv:2310.02553 • code
Functional Geometry Guided Protein Sequence and Backbone Structure Co-Design
Anonymous
ICLR 2024 under review
Protein Complex Invariant Embedding with Cross-Gate MLP is A One-Shot Antibody Designer
Cheng Tan, Zhangyang Gao, Stan Z. Li
arXiv:2305.09480
Deep generative models of genetic variation capture the effects of mutations
Adam J. Riesselman, John B. Ingraham & Debora S. Marks
Nature Methods • code::DeepSequence • Oct 2018
Deciphering protein evolution and fitness landscapes with latent space models
Xinqiang Ding, Zhengting Zou & Charles L. Brooks III
Nature Communications • code::PEVAE • Dec 2019
Is transfer learning necessary for protein landscape prediction?
Shanehsazzadeh, Amir, David Belanger, and David Dohan.
arXiv preprint arXiv:2011.03443 (2020)
Epistatic Net allows the sparse spectral regularization of deep neural networks for inferring fitness functions
Amirali Aghazadeh, Hunter Nisonoff, Orhan Ocal, David H. Brookes, Yijie Huang, O. Ozan Koyluoglu, Jennifer Listgarten & Kannan Ramchandran
Nature Communications • code • Sep 2021
The generative capacity of probabilistic protein sequence models
Francisco McGee, Sandro Hauri, Quentin Novinger, Slobodan Vucetic, Ronald M. Levy, Vincenzo Carnevale & Allan Haldane
Nature Communications • code::generation_capacity_metrics • code::sVAE • Nov 2021
Learning the local landscape of protein structures with convolutional neural networks
Kulikova, Anastasiya V., et al.
Journal of Biological Physics 47.4 (2021)
Proximal Exploration for Model-guided Protein Sequence Design
Zhizhou Ren, Jiahan Li, Fan Ding, Yuan Zhou, Jianzhu Ma, Jian Peng
bioRxiv (2022) • code • commercial
Efficient evolution of human antibodies from general protein language models and sequence information alone
Hie, Brian L., et al.
bioRxiv (2022) • code
Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval
Notin, P., Dias, M., Frazer, J., Marchena-Hurtado, J., Gomez, A., Marks, D.S., Gal, Y.
ICML (2022)/arXiv:2205.13760 • code • hugging face
Protein engineering via Bayesian optimization-guided evolutionary algorithm and robotic experiments
Ruyun Hu, Lihao Fu, Yongcan Chen, Junyu Chen, Yu Qiao, Tong Si
bioRxiv 2022.08.11.503535
Antibody optimization enabled by artificial intelligence predictions of binding affinity and naturalness
Bachas, Sharrol, et al.
bioRxiv 2022.08.16.504181 • poster
Construction of a Deep Neural Network Energy Function for Protein Physics
Yang, Huan, Zhaoping Xiong, and Francesco Zonta.
Journal of Chemical Theory and Computation (2022)
Inferring protein fitness landscapes from laboratory evolution experiments
Sameer D’Costa, Emily C. Hinds, Chase R. Freschlin, Hyebin Song, Philip A. Romero
bioRxiv 2022.09.01.506224 • Supplementary
BayeStab: Predicting Effects of Mutations on Protein Stability with Uncertainty Quantification
Wang, Shuyu, et al.
Protein Science (2022) • code • website
Tuned Fitness Landscapes for Benchmarking Model-Guided Protein Design
Neil Thomas, Atish Agarwala, David Belanger, Yun S. Song, Lucy Colwell
bioRxiv 2022.10.28.514293 • code
Protein design using structure-based residue preferences
David Ding, Ada Y Shaw, Sam Sinai, Nathan J Rollins, Noam Prywes, David Savage, Michael T Laub, Debora S Marks
bioRxiv 2022.10.31.514613 • code
Accurate Mutation Effect Prediction using RoseTTAFold
Sanaa Mansoor, Minkyung Baek, David Juergens, Joseph L Watson, David Baker
bioRxiv 2022.11.04.515218
Learning the shape of protein micro-environments with a holographic convolutional neural network
Pun, Michael N., et al.
bioRxiv (2022) • code
Infer global, predict local: quantity-quality trade-off in protein fitness predictions from sequence data
Lorenzo Posani, Francesca Rizzato, Rémi Monasson, Simona Cocco
bioRxiv 2022.12.12.520004
Validation of de novo designed water-soluble and transmembrane proteins by in silico folding and melting
Alvaro Martin, Carolin Berner, Sergey Ovchinnikov, Anastassia Andreevna Vorobieva
bioRxiv 2023.06.06.543955 • colab
PoET: A generative model of protein families as sequences-of-sequences
Timothy F. Truong Jr, Tristan Bepler
arXiv:2306.06156 • code
Rapid protein stability prediction using deep learning representations
Lasse M Blaabjerg, Maher M Kassem, Lydia L Good, Nicolas Jonsson, Matteo Cagiada, Kristoffer E Johansson, Wouter Boomsma, Amelie Stein, Kresten Lindorff-Larsen
eLife 12:e82593 • code
A general Temperature-Guided Language model to engineer enhanced Stability and Activity in Proteins
Pan Tan, Mingchen Li, Yuanxi Yu, Fan Jiang, Lirong Zheng, Banghao Wu, Xinyu Sun, Liqi Kang, Jie Song, Liang Zhang, Yi Xiong, Wanli Ouyang, Zhiqiang Hu, Guisheng Fan, Yufeng Pei, Liang Hong
arXiv:2307.12682
Transfer learning to leverage larger datasets for improved prediction of protein stability changes
Henry Dieckhaus, Michael Brocidiacono, Nicholas Randolph, Brian Kuhlman
bioRxiv 2023.07.27.550881 • code • Supplementary
Boosting AND/OR-Based Computational Protein Design: Dynamic Heuristics and Generalizable UFO
Bobak Pezeshki, Radu Marinescu, Alexander Ihler, Rina Dechter
arXiv:2309.00408
Zero-shot Mutation Effect Prediction on Protein Stability and Function using RoseTTAFold
Sanaa Mansoor, Minkyung Baek, David Juergens, Joseph L. Watson, David Baker
Protein Science • dissertation
Accurate proteome-wide missense variant effect prediction with AlphaMissense
Jun Cheng et al.
Science (2023), eadg7492 • DOI: 10.1126/science.adg7492 • code • data
What makes the effect of protein mutations difficult to predict?
Floris Julian van der Flier, Dave Estell, Sina Pricelius, Lydia Dankmeyer, Sander van Stigt Thans, Harm Mulder, Rei Otsuka, Frits Goedegebuur, Laurens Lammerts, Diego Staphorst, Aalt D.J. van Dijk, Dick de Ridder, Henning Redestig
bioRxiv 2023.09.25.559319 • code
Structure-based self-supervised learning enables ultrafast prediction of stability changes upon mutation at the protein universe scale
Jinyuan Sun, Tong Zhu, Yinglu Cui, Bian Wu
bioRxiv 2023.08.09.552725 • code
More detailed protein representation learning list:
Lirong Wu's awesome-protein-representation-learning
Unified rational protein engineering with sequence-based deep representation learning
Alley, Ethan C., et al.
Nature methods 16.12 (2019)
Protein Structure Representation Learning by Geometric Pretraining
Zuobai Zhang, Minghao Xu, Arian Jamasb, Vijil Chenthamarakshan, Aurelie Lozano, Payel Das, Jian Tang
arXiv • Jan 2022
Evolutionary velocity with protein language models
Brian L. Hie, Kevin K. Yang, and Peter S. Kim
bioRxiv
Advancing protein language models with linguistics: a roadmap for improved interpretability
Mai Ha Vu, Rahmad Akbar, Philippe A. Robert, Bartlomiej Swiatczak, Victor Greiff, Geir Kjetil Sandve, Dag Trygve Truslew Haug
arXiv:2207.00982
Deciphering the language of antibodies using self-supervised learning
Leem, Jinwoo, et al.
Patterns (2022): 100513 • code
On Pre-training Language Model for Antibody
Anonymous (paper under double-blind review)
ICLR 2023 • Supplementary
Antibody Representation Learning for Drug Discovery
Lin Li, Esther Gupta, John Spaeth, Leslie Shing, Tristan Bepler, Rajmonda Sulo Caceres
arXiv:2210.02881
Unlike the function-scaffold-sequence paradigm in protein design, most deep-learning models for molecular design operate at one of three levels (atom-based, fragment-based, or reaction-based) and can be categorized as either gradient optimization or optimized sampling (gradient-free) methods. Click here for a detailed review.
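To illustrate the two categories named above, the following sketch contrasts them on a toy one-dimensional problem. The score function, its gradient, and all function names are hypothetical; real molecular design models operate on atoms, fragments, or reactions rather than on a single number, and neither routine reproduces any paper listed below.

```python
import random


def score(x: float) -> float:
    """Hypothetical property score to maximize; it peaks at x = 3."""
    return -(x - 3.0) ** 2


def score_grad(x: float) -> float:
    """Analytic derivative of `score`; only usable when the score is differentiable."""
    return -2.0 * (x - 3.0)


def gradient_optimization(x0: float, lr: float = 0.1, steps: int = 100) -> float:
    """Gradient ascent on the score, e.g. in a continuous latent space."""
    x = x0
    for _ in range(steps):
        x += lr * score_grad(x)
    return x


def optimized_sampling(x0: float, steps: int = 2000, sigma: float = 0.5, seed: int = 0) -> float:
    """Gradient-free search: propose random perturbations, keep only improvements."""
    rng = random.Random(seed)
    best = x0
    for _ in range(steps):
        candidate = best + rng.gauss(0.0, sigma)
        if score(candidate) > score(best):
            best = candidate
    return best


print(gradient_optimization(0.0))  # converges to ~3.0
print(optimized_sampling(0.0))     # also approaches 3.0, using only score evaluations
```

The gradient-based routine needs `score_grad`, while the sampling routine only calls `score`; this is why gradient-free search remains applicable when the design space is discrete and no gradient is available.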
To cover a wider variety of generative models for design, the following recent models from molecular design are recommended; they may be helpful and could potentially be transferred to protein design. More papers are listed at:
Differentiable scaffolding tree for molecular optimization
Fu, T., Gao, W., Xiao, C., Yasonik, J., Coley, C. W., & Sun, J.
arXiv preprint arXiv:2109.10469 • code • Sep 2021
Equivariant Energy-Guided SDE for Inverse Molecular Design
Fan Bao, Min Zhao, Zhongkai Hao, Peiyao Li, Chongxuan Li, Jun Zhu
arXiv:2209.15408
Equivariant Shape-Conditioned Generation of 3D Molecules for Ligand-Based Drug Design
Keir Adams, Connor W. Coley
arXiv:2210.04893 • code
Structure-based Drug Design with Equivariant Diffusion Models
Arne Schneuing, Yuanqi Du, Charles Harris, Arian Jamasb, Ilia Igashov, Weitao Du, Tom Blundell, Pietro Lió, Carla Gomes, Max Welling, Michael Bronstein, Bruno Correia
NeurIPS 2022/arXiv:2210.13695 • code
Generating 3D Molecules for Target Protein Binding
Meng Liu, Youzhi Luo, Kanji Uchino, Koji Maruhashi, Shuiwang Ji
International Conference on Machine Learning 39 (2022) • GraphBP
Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets
Peng, Xingang, et al.
International Conference on Machine Learning 39 (2022) • code
Reinforced Genetic Algorithm for Structure-based Drug Design
Fu, Tianfan, et al.
arXiv preprint arXiv:2211.16508 (2022)/ICML22 • code • website
Molecule Generation For Target Protein Binding with Structural Motifs
Zhang, Zaixi, et al.
International Conference on Learning Representations 11 (2023) • code
3D Equivariant Diffusion for Target-Aware Molecule Generation and Affinity Prediction
Guan, Jiaqi, et al.
International Conference on Learning Representations 11 (2023) • code