This repository contains the code for the analyses in our blog post "Get with the Program: Building ADME benchmark datasets that drive impact". The post:
- Shows that, in ML for small-molecule drug discovery, one reason research advances don't always translate to practical impact is that existing benchmark datasets don't capture realistic components of drug programs.
- Provides a path forward for constructing better benchmarks from existing public data by carefully selecting datasets, constraining the allowable training data, and using appropriate evaluation metrics.
To run the code, install the dependencies from requirements.txt into a virtual environment:
pip install --upgrade pip && pip install -r requirements.txt
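If you don't already have a virtual environment, one way to create and activate one before running the install command above is sketched below. It assumes a POSIX shell with `python3` on your PATH; the directory name `.venv` is an arbitrary choice for illustration, not something this repository requires.

```shell
# Create a virtual environment in ./.venv (the name is an assumption; any path works)
python3 -m venv .venv

# Activate it for the current shell session (POSIX shells)
. .venv/bin/activate
```

With the environment active, `pip` installs packages into `.venv` rather than into the system Python. On Windows the activation script is located elsewhere (`.venv\Scripts\activate`).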