drug-discovery healthcare deep-learning medical

Drug-Discovery

Intro

When developing new medicines, it is important to identify molecules that are highly active toward their intended targets but not toward other targets that might cause side effects. The objective here is to identify the statistical techniques that best predict the biological activities of different molecules, both on- and off-target, given numerical descriptors generated from their chemical structures.

Data Description

The data is based on 14 target molecules and over 10,000 compounds for each target. For each target molecule, each row of the data corresponds to a compound and contains descriptors derived from that compound’s chemical structure. Activity between the target molecule and each compound is provided in the training data and is the target for prediction in the test data.
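
As a rough illustration of the row layout described above, the sketch below splits each row into a compound ID, an activity value, and a descriptor vector. The column names MOLECULE and Act, the D_ descriptor prefix, and the sample values are assumptions made for illustration only; check the header of the downloaded files before relying on them.

```python
import csv
import io

# Tiny made-up sample in the ASSUMED layout: ID, activity, then descriptors.
# The real files have many more descriptor columns and rows.
SAMPLE = """MOLECULE,Act,D_1,D_2,D_3
M_1,4.30,0,2,1
M_2,5.12,1,0,0
M_3,4.30,0,0,3
"""

def load_compounds(handle, id_col="MOLECULE", target_col="Act"):
    """Return (ids, activities, descriptor rows, descriptor column names)."""
    reader = csv.DictReader(handle)
    # Every column that is not the ID or the activity is treated as a descriptor.
    feature_cols = [c for c in reader.fieldnames if c not in (id_col, target_col)]
    ids, y, X = [], [], []
    for row in reader:
        ids.append(row[id_col])
        y.append(float(row[target_col]))
        X.append([float(row[c]) for c in feature_cols])
    return ids, y, X, feature_cols

ids, y, X, feature_cols = load_compounds(io.StringIO(SAMPLE))
print(len(ids), "compounds,", len(feature_cols), "descriptors")
```

For a real file, pass an open file handle instead of the in-memory sample. In the test data the activity is what has to be predicted, so a loader for those files would need to tolerate a missing or placeholder activity column.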

Dataset

http://pubs.acs.org/doi/suppl/10.1021/ci500747n/suppl_file/ci500747n_si_002.zip

Instructions

Set the data_root and save_root variables in data_preprocessing.py, then run it.
Point data_root in main.py to the directory that contains the pre-processed training and test files.
Run main.py.
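
The steps above imply that the directory written by data_preprocessing.py is the one main.py reads from. The sketch below spells out that relationship. The paths are hypothetical placeholders, the helper missing_dirs is not part of this repository, and the assumption that save_root is where the pre-processed files end up is inferred from the instructions rather than confirmed by the code.

```python
from pathlib import Path

# Hypothetical locations; substitute your own.
data_root = Path("data/raw")            # data_preprocessing.py: unzipped download
save_root = Path("data/preprocessed")   # data_preprocessing.py: output directory

# Assumption: main.py's data_root should be the preprocessing output directory.
main_data_root = save_root

def missing_dirs(*dirs):
    """Return the given directories that do not exist yet (hypothetical helper)."""
    return [Path(d) for d in dirs if not Path(d).is_dir()]

# Quick sanity check before running either script.
for d in missing_dirs(data_root, save_root):
    print("missing directory:", d)
```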

Further Reference

http://www.cs.toronto.edu/~gdahl/papers/deepQSARJChemInfModel2015.pdf
