A curated list of awesome papers, data sets, frameworks, packages, blogs, and other resources related to machine learning for small-molecule drug discovery. Please contribute!
- Walters and Barzilay, 2021. Critical assessment of AI in drug discovery.
- White, 2021. Deep Learning for Molecules and Materials.
- Coley, 2020. Defining and Exploring Chemical Spaces.
- Chuang et al, 2020. Learning Molecular Representations for Medicinal Chemistry.
- Walters and Barzilay, 2020. Applications of Deep Learning in Molecule Generation and Molecular Property Prediction.
- Cai et al, 2020. Transfer Learning for Drug Discovery.
- Wang et al, 2022. Molecular Contrastive Learning of Representations via Graph Neural Networks. [Code]
- Ahmad et al, 2021. ChemBERTa-2: Towards Chemical Foundation Models. [Code]
- Satorras et al, 2021. E(n) Equivariant Graph Neural Networks. [Code]
- Stanley et al, 2021. FS-Mol: A Few-Shot Learning Dataset of Molecules. [Code]
- Townshend et al, 2021. ATOM3D: Tasks On Molecules in Three Dimensions.
- Xue et al, 2021. X-MOL: large-scale pre-training for molecular understanding and diverse molecular analysis. [Code]
- Chuang and Keiser, 2020. Attention-Based Learning on Molecular Ensembles.
- Li and Fourches, 2020. Inductive transfer learning for molecular activity prediction: Next-Gen QSAR Models with MolPMoFiT. [Code]
- Maziarka et al, 2020. Molecule Attention Transformer. [Code]
- Krenn et al, 2019. Self-Referencing Embedded Strings (SELFIES).
- Hu et al, 2019. Strategies for Pre-training Graph Neural Networks. [Code]
- Yang et al, 2019. Analyzing Learned Molecular Representations for Property Prediction (Chemprop). [Code]
- Feinberg et al, 2018. PotentialNet for Molecular Property Prediction.
- Altae-Tran et al, 2017. Low Data Drug Discovery with One-Shot Learning.
- Bengio et al, 2021. Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation. [Code]
- Berenger and Tsuda, 2021. Molecular generation by Fast Assembly of (Deep)SMILES fragments. [Code]
- Gao et al, 2021. Amortized Tree Generation for Bottom-up Synthesis Planning and Synthesizable Molecular Design. [Code]
- Takeuchi et al, 2021. R-group replacement database for medicinal chemistry.
- Imrie et al, 2020. Deep Generative Models for 3D Linker Design. [Code]
- Jin et al, 2020. Hierarchical Generation of Molecular Graphs using Structural Motifs. [Code]
- Polishchuk, 2020. CReM: chemically reasonable mutations framework for structure generation. [Code]
- Brown et al, 2019. GuacaMol: Benchmarking Models for de Novo Molecular Design. [Code]
- Popova et al, 2019. MolecularRNN: Generating realistic molecular graphs with optimized properties.
- You et al, 2019. Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation. [Code]
- Zhou et al, 2019. Optimization of Molecules via Deep Reinforcement Learning. [Code (official version)] [PyTorch implementation]
- Jin et al, 2018. Junction Tree Variational Autoencoder for Molecular Graph Generation. [Code]
- Merk et al, 2018. De Novo Design of Bioactive Small Molecules by Artificial Intelligence.
- Stärk et al, 2022. EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction. [Code]
- Bender et al, 2021. A practical guide to large-scale docking.
- García-Ortegón et al, 2021. DOCKSTRING: easy molecular docking yields better benchmarks for ligand design. [Code] [Data]
- Graff et al, 2021. Accelerating high-throughput virtual screening through molecular pool-based active learning. [Code]
- Gentile et al, 2020. Deep Docking: A Deep Learning Platform for Augmentation of Structure Based Drug Discovery. [Code]
- Cáceres et al, 2020. Adding Stochastic Negative Examples into Machine Learning Improves Molecular Bioactivity Prediction.
- Lyu et al, 2019. Ultra-large library docking for discovering new chemotypes.
- Karim et al, 2021. CardioTox net: a robust predictor for hERG channel blockade based on deep learning meta-feature ensembles. [Code]
- Siramshetty et al, 2021. Validating ADME QSAR Models Using Marketed Drugs.
- Göller et al, 2020. Bayer’s in silico ADMET platform: a journey of machine learning over the past two decades.
- Ryu et al, 2020. DeepHIT: a deep learning framework for prediction of hERG-induced cardiotoxicity. [Code]
- Cai et al, 2019. Deep Learning-Based Prediction of Drug-Induced Cardiotoxicity. [Code]
- Ogura et al, 2019. Support Vector Machine model for hERG inhibitory activities based on the integrated hERG database using descriptor selection by NSGA-II. [Data]
- Lombardo et al, 2018. In Silico Absorption, Distribution, Metabolism, Excretion, and Pharmacokinetics (ADME-PK): Utility and Best Practices.
- Fortunato et al, 2020. Data augmentation and pretraining for template-based retrosynthetic prediction in computer-aided synthesis planning.
- Koch et al, 2020. Reinforcement Learning for Bioretrosynthesis.
- Somnath et al, 2020. Learning Graph Models for Retrosynthesis Prediction.
- Dai et al, 2019. Retrosynthesis Prediction with Conditional Graph Logic Network. [Code]
- Coley et al, 2018. SCScore: Synthetic Complexity Learned from a Reaction Corpus. [Code] [DeepChem implementation]
- Humer et al, 2021. ChemInformatics Model Explorer (CIME): Exploratory analysis of chemical model explanations. [Code]
- Matveieva and Polishchuk, 2021. Benchmarks for interpretation of QSAR models. [Code]
- Yoshimori et al, 2019. Integrating the Structure–Activity Relationship Matrix Method with Molecular Grid Maps and Activity Landscape Models for Medicinal Chemistry Applications.
- Naveja and Medina-Franco, 2019. Finding Constellations in Chemical Space Through Core Analysis.
- ADME@NCATS
- AMED Cardiotoxicity Database
- BindingDB
- ChEMBL
- DrugBank
- DrugMatrix
- Enamine REAL database
- hERG Central
- MoleculeNet
- MoNA: database of mass spectrometry and other readouts
- NPASS database of natural products
- PubChem
- The Open Reaction Database
- Therapeutic Data Commons
- ZINC
- AutoDock Vina
- BioPandas
- DeepChem [Tutorials]
- Open Babel
- pdb-tools
- PyTorch Geometric
- rd_filters
- Small-World Search
- TorchDrug