Dragonfly Chemist
Authors: Ksenia Korovina (kkorovin@cs.cmu.edu), Celsius Xu
Dragonfly Chemist is a library for joint molecular optimization and synthesis. It is based on Dragonfly, a framework for scalable Bayesian optimization.
Structure of the repo
experiments package contains experiment scripts. In particular, the run_chemist.py script illustrates usage of the classes.
chemist_opt package isolates the Chemist class, which performs joint optimization and synthesis. It contains harnesses for calling molecular functions (MolFunctionCaller) and handling optimization over molecular domains (MolDomain). Calls for mols and explore.
explorer implements the exploration of the molecular domain. Currently, a RandomExplorer is implemented, which explores reactions randomly, starting from a given pool. Calls for synth.
mols contains the Molecule class, the Reaction class, a few examples of objective function definitions, as well as implementations of molecular versions of all components needed for BO to work: the MolCPGP and MolCPGPFitter classes and molecular kernels.
synth is responsible for performing forward synthesis.
rdkit_contrib is an extension to rdkit that provides computation of a few molecular scores (for older versions of rdkit).
baselines contains wrappers for models we compare against.
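The idea of exploring reactions randomly from a starting pool can be illustrated with a toy sketch. This is not the repository's RandomExplorer: the function names, the string "molecules", and the toy_react stand-in for forward synthesis are all hypothetical and chosen only to show the loop structure.

```python
import random


def toy_react(a, b):
    """Hypothetical stand-in for forward synthesis: combine two labels."""
    return "(" + a + "+" + b + ")"


def random_explore(pool, n_steps, react, seed=0):
    """Grow a pool by repeatedly 'reacting' two randomly chosen members.

    pool    -- initial list of molecule labels (plain strings here)
    n_steps -- number of random reactions to attempt
    react   -- callable taking two reactants and returning a product
    seed    -- seed for reproducible sampling
    """
    rng = random.Random(seed)
    pool = list(pool)  # copy so the caller's list is not modified
    for _ in range(n_steps):
        first, second = rng.sample(pool, 2)  # two distinct pool members
        product = react(first, second)
        if product not in pool:
            pool.append(product)  # products become reactants in later steps
    return pool


if __name__ == "__main__":
    print(random_explore(["A", "B", "C"], 5, toy_react))
```

In an actual run, the products would be scored by the objective function, and the optimizer would decide which candidates to keep; the sketch only covers the random growth of the pool.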
Getting started
Python 3 is recommended.
Python packages
First, set up an environment for RDKit and Dragonfly:
conda create -c rdkit -n chemist-env rdkit python=3.6
# optionally: export PATH="/opt/miniconda3/bin:$PATH"
conda activate chemist-env # or source activate chemist-env with older conda
Install basic requirements with pip:
pip install -r requirements.txt
Kernel-related packages
Certain functionality (some of the graph-based kernels) requires the graphkernels package, which can be installed additionally. First, you need to install eigen3 and pkg-config (see instructions here):
sudo apt-get install libeigen3-dev; sudo apt-get install pkg-config # on Linux
brew install eigen; brew install pkg-config # on MacOS
pip install graphkernels
If the above fails on MacOS (see stackoverflow), the simplest solution is
MACOSX_DEPLOYMENT_TARGET=10.9 pip install graphkernels
To use distance-based kernels, you need Cython and the POT package, which computes optimal transport (OT) distances:
pip install Cython
pip install cython POT # prepended with MACOSX_DEPLOYMENT_TARGET=10.9 if needed
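After the installation steps above, it can be useful to check which packages are importable in the active environment. The following is a minimal sketch using only the standard library; the import names listed (rdkit, dragonfly, graphkernels, ot) are assumptions about how these packages are exposed and are not taken from this repository.

```python
import importlib.util

# Assumed import names for the packages mentioned above.
# "ot" is the import name assumed for the POT package.
EXPECTED_MODULES = ["rdkit", "dragonfly", "graphkernels", "ot"]


def missing_modules(names):
    """Return the top-level module names that cannot be found.

    find_spec only locates a top-level module; it does not import it.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(EXPECTED_MODULES)
    if missing:
        print("Not found in this environment:", ", ".join(missing))
    else:
        print("All expected modules were found.")
```

A missing graphkernels or ot module only affects the graph-based and distance-based kernels, respectively, so the check is informational rather than a hard requirement.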
Synthesis Path Plotting Functionality
For plotting the synthesis path for an optimal molecule, install graphviz via:
pip install graphviz
However, the above only works on Linux, as Homebrew removed the --with-pango option (see this).
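To make the idea of a synthesis path plot concrete, the sketch below writes a path as Graphviz DOT text using plain Python. It does not use the repository's plotting code; the function name and the (reactants, product) input format are hypothetical, and the example molecules are ordinary SMILES strings chosen for illustration.

```python
def synthesis_path_to_dot(steps):
    """Render synthesis steps as DOT source text.

    steps -- list of (reactants, product) pairs, where reactants is a
             list of molecule labels and product is a single label.
    Labels are quoted as-is; labels containing quotes or backslashes
    would need escaping, which this sketch does not handle.
    """
    lines = ["digraph synthesis_path {"]
    for reactants, product in steps:
        for reactant in reactants:
            # One edge per reactant, pointing at the product it forms.
            lines.append('    "{}" -> "{}";'.format(reactant, product))
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Ethanol and acetic acid combining into ethyl acetate.
    example = [(["CCO", "CC(=O)O"], "CC(=O)OCC")]
    print(synthesis_path_to_dot(example))
```

The resulting text can be saved to a file and rendered with the Graphviz dot command-line tool, if it is installed.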
Environment
Set PYTHONPATH for imports:
source setup.sh
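If sourcing the script is not convenient (for example, inside a notebook), a similar effect can be obtained from Python by extending sys.path. This is a sketch under the assumption that setup.sh adds the repository root to PYTHONPATH; the contents of setup.sh are not shown here, and the helper name is hypothetical.

```python
import os
import sys


def add_to_python_path(directory):
    """Prepend a directory to sys.path for the current process only.

    Returns the absolute path that was added (or was already present).
    """
    path = os.path.abspath(directory)
    if path not in sys.path:
        sys.path.insert(0, path)  # searched before other entries
    return path


if __name__ == "__main__":
    # Assumes the script is started from the repository root.
    print("Added to import path:", add_to_python_path("."))
```

Unlike PYTHONPATH, this change is not inherited by subprocesses and lasts only for the running interpreter.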
Getting data
ChEMBL data in txt format can be found in kevinid's repo or in the official downloads. The ZINC database can be downloaded from the official site. Run the following to automatically download the datasets and put them into the right directory:
bash download_data.sh
Running tests
TODO
Running experiments
See experiments/run_chemist.py for the Chemist usage example.