Active Learning Systems for Screening Drugs

City University of Hong Kong

Final Year Project 2021-22

Supervisor: Dr. WEI, Ying

Description

This project is an active learning framework for drug discovery. It is designed to be extensible and supports plugging in different regression models for predicting the potency of chemical compounds.
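
To make the idea concrete, the following is a minimal sketch of pool-based active learning with uncertainty sampling. It is not the project's actual implementation: the function name `active_learning_loop`, its parameters, the kernel choice, and the synthetic data are all assumptions made for illustration, with a scikit-learn Gaussian process standing in for whichever regressor is plugged in.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel


def active_learning_loop(X_pool, y_pool, n_initial=5, n_rounds=10, seed=0):
    """Illustrative pool-based active learning with uncertainty sampling."""
    rng = np.random.default_rng(seed)
    # Start from a small random set of already "assayed" compounds.
    labeled = [int(i) for i in rng.choice(len(X_pool), size=n_initial, replace=False)]
    unlabeled = [i for i in range(len(X_pool)) if i not in labeled]

    # Assumed kernel: smooth RBF plus a noise term (hypothetical choice).
    model = GaussianProcessRegressor(
        kernel=RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1),
        normalize_y=True,
        random_state=seed,
    )
    for _ in range(n_rounds):
        model.fit(X_pool[labeled], y_pool[labeled])
        # The predictive standard deviation serves as the uncertainty score.
        _, std = model.predict(X_pool[unlabeled], return_std=True)
        # Query the compound the model is least certain about.
        pick = unlabeled.pop(int(np.argmax(std)))
        labeled.append(pick)  # "run the assay" by revealing y_pool[pick]

    # Final fit on everything labeled so far.
    model.fit(X_pool[labeled], y_pool[labeled])
    return model, labeled


# Synthetic stand-in for molecular descriptors and potency values.
demo_rng = np.random.default_rng(42)
X_demo = demo_rng.uniform(-3, 3, size=(200, 2))
y_demo = np.sin(X_demo[:, 0]) + 0.5 * X_demo[:, 1] + demo_rng.normal(0, 0.1, size=200)
demo_model, demo_labeled = active_learning_loop(X_demo, y_demo)
print(f"Labeled {len(demo_labeled)} of {len(X_demo)} compounds")
```

Each round labels only the single most uncertain compound; a real screening campaign would more likely query a batch per round, which this sketch leaves out for brevity.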

Prerequisites

A Python environment manager such as conda or miniconda
A code editor (Visual Studio Code or JupyterLab preferred)

Usage

Clone the repo into a folder of your choice via

git clone https://gitlab.com/Baldur10/drug-discovery-al

Enter the root folder of the repo via
```
cd drug-discovery-al
```

Set up the conda environment via

conda env create --name dd-al --file environ_al.yml

Activate the conda environment via
```
conda activate dd-al
```

ML Models Available

Gaussian Processes Regressor (Scikit-Learn GPR)
Random Forest Regressor (Intel(R) Extension for Scikit-Learn and Scikit-Learn RFR)
Neural Network Regressor (SKORCH)
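
Active learning needs an uncertainty estimate as well as a prediction, and the model families provide this differently. The sketch below shows one possible way to obtain both from the two scikit-learn models; the helper name `predict_with_uncertainty` is hypothetical and not taken from the repository, and using the spread across trees for the random forest is an assumed strategy. A neural network regressor would need its own approach, which is not shown here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor


def predict_with_uncertainty(model, X):
    """Return (mean, std) for a fitted regressor; the std is model-specific."""
    if isinstance(model, GaussianProcessRegressor):
        # GPR exposes a predictive standard deviation directly.
        return model.predict(X, return_std=True)
    if isinstance(model, RandomForestRegressor):
        # For a forest, use the spread of the individual trees' predictions.
        per_tree = np.stack([tree.predict(X) for tree in model.estimators_])
        return per_tree.mean(axis=0), per_tree.std(axis=0)
    raise TypeError(f"No uncertainty estimate defined for {type(model).__name__}")


# Toy data standing in for compound features and potency values.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X[:, 0] - X[:, 1]

forest = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)
mean_rf, std_rf = predict_with_uncertainty(forest, X[:5])
print(mean_rf.shape, std_rf.shape)
```

Keeping the model-specific logic behind one function means the query step of an active learning loop does not need to know which regressor it is working with.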

Pretrained Models for the default assays are available at:

Storage	Link
Onedrive	FYP Models

ML Loops

Training Loop

Open the relevant model training script inside /scripts.
Taking the Gaussian Processes Regressor model as an example, the appropriate file is /scripts/test_gpr.ipynb.
Open the file in the code editor and run all cells. If given the option, select dd-al as the Python interpreter.
Set the variable assay_limit to any integer n to train models for only the first n assays.
After the training loop completes, the trained models can be found under /models and the result data under /data/data_results.
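
The effect of assay_limit can be pictured with the following sketch, which trains one model per assay for only the first n assays and writes each result to a model folder. Everything except the name assay_limit is an assumption for illustration: the `MeanRegressor` stand-in, the `train_first_n_assays` helper, the assay IDs, the toy data, and the JSON file format are hypothetical and do not reflect how the notebooks actually store models.

```python
import json
import tempfile
from pathlib import Path
from statistics import mean


class MeanRegressor:
    """Trivial stand-in for a real model: always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def train_first_n_assays(assays, assay_limit, model_dir):
    """Train one model per assay, for the first `assay_limit` assays only."""
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    # Dicts preserve insertion order, so slicing keeps the first n assays.
    for assay_id, (X, y) in list(assays.items())[:assay_limit]:
        model = MeanRegressor().fit(X, y)
        path = model_dir / f"{assay_id}.json"
        # Persist the fitted parameter; a real model would be serialized differently.
        path.write_text(json.dumps({"assay": assay_id, "mean": model.mean_}))
        saved.append(path)
    return saved


# Hypothetical assay IDs with toy (features, potency) data.
assays = {
    "assay_A": ([[0.1], [0.4]], [5.0, 7.0]),
    "assay_B": ([[0.2], [0.9]], [6.0, 6.5]),
    "assay_C": ([[0.3], [0.5]], [4.0, 8.0]),
}
with tempfile.TemporaryDirectory() as tmp:
    paths = train_first_n_assays(assays, assay_limit=2, model_dir=tmp)
    print([p.name for p in paths])  # ['assay_A.json', 'assay_B.json']
```

With assay_limit=2, the third assay is skipped entirely, which is useful for a quick trial run before training on every assay.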

Testing Loop

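Before running Flask, set the FLASK_APP environment variable. The command below is for the Windows Command Prompt; on macOS or Linux, use export FLASK_APP=app.py instead.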
```
set FLASK_APP=app.py
```
Run the Flask app via
```
flask run
```

Support

Contact me at rmohan2-c@my.cityu.edu.hk
