kirthevasank / dragonfly_chemist

DOE framework for joint molecular optimization and synthesis

Dragonfly Chemist

Authors: Ksenia Korovina (kkorovin@cs.cmu.edu), Celsius Xu

Dragonfly Chemist is a library for joint molecular optimization and synthesis. It is based on Dragonfly, a framework for scalable Bayesian optimization.

Structure of the repo

  • experiments package contains experiment scripts. In particular, the run_chemist.py script illustrates usage of the classes.
  • chemist_opt package isolates the Chemist class, which performs joint optimization and synthesis. It contains harnesses for calling molecular functions (MolFunctionCaller) and for handling optimization over molecular domains (MolDomain). Depends on mols and explorer.
  • explorer implements exploration of the molecular domain. Currently, a RandomExplorer is implemented, which explores reactions randomly, starting from a given pool. Depends on synth.
  • mols contains the Molecule class, the Reaction class, a few examples of objective function definitions, and implementations of the molecular versions of all components needed for BO to work: the MolCPGP and MolCPGPFitter classes and molecular kernels.
  • synth is responsible for performing forward synthesis.
  • rdkit_contrib is an extension to rdkit that provides computation of a few molecular scores (for older versions of rdkit).
  • baselines contains wrappers for models we compare against.
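The idea behind random exploration from a starting pool can be illustrated with a toy sketch. This is not the repository's RandomExplorer: molecules are plain strings, and toy_react, random_explore, and synthesis_path are hypothetical names introduced here only for illustration, with toy_react standing in for forward synthesis.

```python
import random


def toy_react(a, b):
    """Hypothetical stand-in for forward synthesis: joins two labels."""
    return "(" + a + "+" + b + ")"


def random_explore(pool, steps, react=toy_react, seed=0):
    """Grow a pool by repeatedly reacting two randomly chosen members.

    Returns the final pool and a dict mapping each new product to the
    pair of reactants that produced it.
    """
    rng = random.Random(seed)
    pool = list(pool)
    parents = {}
    for _ in range(steps):
        a, b = rng.sample(pool, 2)  # pick two distinct pool members
        product = react(a, b)
        if product not in pool:  # skip products we have already made
            parents[product] = (a, b)
            pool.append(product)
    return pool, parents


def synthesis_path(mol, parents):
    """List the reaction steps needed to make mol, earliest steps first."""
    if mol not in parents:  # starting material: nothing to synthesize
        return []
    a, b = parents[mol]
    return synthesis_path(a, parents) + synthesis_path(b, parents) + [(a, b, mol)]
```

Recording the reactants of every product is what makes it possible to recover a synthesis path for whichever molecule the optimizer ends up preferring.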

Getting started

Python 3 is recommended.

Python packages

First, set up a conda environment for RDKit and Dragonfly:

conda create -c rdkit -n chemist-env rdkit python=3.6
# optionally: export PATH="/opt/miniconda3/bin:$PATH"
conda activate chemist-env  # or source activate chemist-env with older conda

Install basic requirements with pip:

pip install -r requirements.txt

Kernel-related packages

Certain functionality (some of the graph-based kernels) requires the graphkernels package, which can be installed separately. First install eigen3 and pkg-config, then install graphkernels itself:

sudo apt-get install libeigen3-dev; sudo apt-get install pkg-config  # on Linux
brew install eigen; brew install pkg-config  # on MacOS
pip install graphkernels

If the above fails on macOS (see the related Stack Overflow discussion), the simplest solution is

MACOSX_DEPLOYMENT_TARGET=10.9 pip install graphkernels

To use distance-based kernels, you need Cython and the POT package, which computes optimal transport (OT) distances:

pip install Cython
pip install cython POT  # prepended with MACOSX_DEPLOYMENT_TARGET=10.9 if needed

Synthesis path plotting functionality

To plot the synthesis path for an optimal molecule, install graphviz via:

pip install graphviz

However, the above only works on Linux, because Homebrew removed the --with-pango option on macOS.
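Since the packages in this section are optional, it can help to check which ones are importable before running experiments. The following is a minimal sketch using only the standard library; OPTIONAL_MODULES and missing_optional are hypothetical names, not part of the repository. It relies on the fact that the POT package is imported as ot.

```python
import importlib.util

# Import name -> feature that needs it (POT is imported as "ot").
OPTIONAL_MODULES = {
    "graphkernels": "graph-based kernels",
    "ot": "distance-based kernels (installed by the POT package)",
    "graphviz": "synthesis path plotting",
}


def missing_optional(modules=OPTIONAL_MODULES):
    """Return {module_name: feature} for modules that cannot be found."""
    return {
        name: feature
        for name, feature in modules.items()
        if importlib.util.find_spec(name) is None
    }


if __name__ == "__main__":
    for name, feature in missing_optional().items():
        print(f"{name} not found: {feature} will be unavailable")
```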

Environment

Set PYTHONPATH for imports:

source setup.sh 
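In contexts where sourcing a shell script is inconvenient (for example, a notebook), a similar effect can be had from inside Python. This sketch assumes that setup.sh does nothing more than add the repository root to PYTHONPATH, which is not confirmed here; add_to_path is a hypothetical helper.

```python
import os
import sys


def add_to_path(repo_root):
    """Put repo_root at the front of sys.path if it is not already there."""
    repo_root = os.path.abspath(repo_root)
    if repo_root not in sys.path:
        sys.path.insert(0, repo_root)
    return repo_root
```

Calling add_to_path with the path to the cloned repository before importing its packages should then let the imports resolve in the current interpreter session only.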

Getting data

ChEMBL data in txt format can be found in kevinid's repo or in the official downloads. The ZINC database can be downloaded from the official site. Run the following to download the datasets automatically and put them into the right directory:

bash download_data.sh

Running tests

TODO

Running experiments

See experiments/run_chemist.py for a usage example of the Chemist class.

License: MIT License

