i-shah / ml-organ-tox

Data-driven chemical-induced toxicity prediction by machine learning using chemical and bioactivity data

ml-organ-tox

Can we use a data-driven approach to predict the target-organ toxicity of chemicals in repeat-dose animal testing studies? This project frames chemical toxicity prediction as a supervised machine learning (ML) problem. Toxicity is assumed to be a categorical outcome defined by histopathology data from legacy repeat-dose animal testing experiments. Chemicals that do or do not cause histopathological effects in an organ are treated as positive or negative examples, respectively, of that organ's class.
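As a rough illustration of this framing (not code from the repository), the sketch below turns per-chemical histopathology findings into binary labels for one organ. The function name `organ_labels` and the toy records are made up for the example, and it assumes every listed chemical was actually examined for the organ in question.

```python
def organ_labels(effects_by_chemical, organ):
    """Map each chemical to 1 if it caused an effect in `organ`, else 0."""
    return {chem: int(organ in organs)
            for chem, organs in effects_by_chemical.items()}


# Toy, made-up records: chemical -> organs with histopathological effects.
toy_effects = {
    "chem_A": {"liver", "kidney"},
    "chem_B": set(),
    "chem_C": {"liver"},
}

liver_labels = organ_labels(toy_effects, "liver")
print(sorted(liver_labels.items()))
```

One such label vector per target organ gives one binary classification problem per organ.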

We use multiple machine learning algorithms to build classifiers for each target organ class using different types of chemical descriptors: (i) chemical descriptors, which are standardised representations derived from the molecular graph, (ii) bioactivity descriptors, which are derived from high-throughput screening experimental data, and (iii) hybrid descriptors, which combine chemical and bioactivity descriptors. Using cross-validation testing to evaluate performance, we systematically evaluated the impact of descriptor type, number of descriptors, number of positive and negative examples, and machine learning algorithm on predicting target organ outcomes. The algorithm for the workflow is given below.

Installation

The entire analysis is implemented using open source tools. Either clone or download this repository to get started.

Operating system

This system has only been tested under Linux: Red Hat Enterprise Linux v6/v7 and Ubuntu Server 16.04 LTS. It should be possible to run it under macOS and possibly Windows (if you can install the following requirements).

MongoDB

Unless it is already available, install the latest version of MongoDB. MongoDB is a document-oriented database that is quite handy for storing complex data structures; see an introduction to MongoDB if you are new to it. Read the tutorial on MongoDB security and confirm that authentication is switched on (edit /etc/mongod.conf, add "authorization: enabled" under the "security" section, then restart MongoDB).

Create the mongo database

Log in to MongoDB as the administrative user and create a database named "organtox_v1". Create an account that the code will use to access the organtox_v1 database. Currently, the username/password are set to devel/devel, but these can be changed (make sure the database connection code in the Jupyter notebooks is changed to match, see below).

python> DB = openMongo(host='pb.epa.gov',user='devel',passwd='devel',db='organtox_v1')
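The body of `openMongo` is not shown here. As a hedged sketch of what such a helper might do, assuming it wraps pymongo's `MongoClient`, the code below builds a connection URI and returns a database handle. The names `build_mongo_uri` and `open_mongo_sketch` are hypothetical and are not part of the repository.

```python
try:
    from urllib.parse import quote_plus  # Python 3
except ImportError:
    from urllib import quote_plus  # Python 2.7


def build_mongo_uri(host, user, passwd, db, port=27017):
    """Build a MongoDB connection URI, percent-escaping the credentials."""
    return "mongodb://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(passwd), host, port, db)


def open_mongo_sketch(host, user, passwd, db, port=27017):
    """Hypothetical stand-in for openMongo: return a pymongo Database."""
    from pymongo import MongoClient  # imported lazily; requires pymongo
    client = MongoClient(build_mongo_uri(host, user, passwd, db, port))
    return client[db]


# Building the URI needs no running server.
print(build_mongo_uri("localhost", "devel", "devel", "organtox_v1"))
```

If you change the username, password, or host, this connection call is the one place in each notebook that would need to be updated.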

Download the mongodb files

The data required for this analysis are available as a mongodump file, which must be downloaded via ftp. Please note: this is a large file and it will take a long time to download. Untar the file using the following command (requires bunzip2 for decompression):

unix> tar jxvf mongodb_organtox_v1.tbz
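If the tar command is not available, the same extraction can be sketched with Python's standard `tarfile` module. The helper name `extract_dump` is made up for this example.

```python
import tarfile


def extract_dump(archive_path, dest_dir="."):
    """Extract a bzip2-compressed tar archive and return its member names."""
    with tarfile.open(archive_path, "r:bz2") as archive:
        names = archive.getnames()
        archive.extractall(dest_dir)
    return names


# Example call (assumes the archive is in the current directory):
# extract_dump("mongodb_organtox_v1.tbz")
```

Only extract archives from a source you trust, since tar members can carry arbitrary paths.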

This will create the directory: organtox_v1 with the following contents:

organtox_v1/tox_fp.bson
organtox_v1/bio_fp.bson
organtox_v1/tox_fp.metadata.json
organtox_v1/ml_run_v1.bson
organtox_v1/ml_run_v1.metadata.json
organtox_v1/bio_fp.metadata.json
organtox_v1/ml_summary_v1.metadata.json
organtox_v1/ml_summary_v1.bson
organtox_v1/ml_lr_v1.metadata.json
organtox_v1/chm_fp.metadata.json
organtox_v1/chm_fp.bson
organtox_v1/ml_lr_v1.bson
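Because the download is large, it may be worth confirming that the extraction is complete before restoring. The sketch below checks that each collection in the listing above has both its .bson and .metadata.json file. The function name `missing_dump_files` is an assumption made for this example.

```python
import os

# Collection names taken from the directory listing above.
EXPECTED_COLLECTIONS = [
    "chm_fp", "bio_fp", "tox_fp",
    "ml_run_v1", "ml_lr_v1", "ml_summary_v1",
]


def missing_dump_files(dump_dir, collections=EXPECTED_COLLECTIONS):
    """Return the expected dump files that are absent from dump_dir."""
    missing = []
    for name in collections:
        for suffix in (".bson", ".metadata.json"):
            filename = name + suffix
            if not os.path.isfile(os.path.join(dump_dir, filename)):
                missing.append(filename)
    return missing


# Example call: an empty list means all twelve files are present.
# print(missing_dump_files("organtox_v1"))
```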

Restore the organtox_v1 database from the downloaded files

The following unix command (see the documentation on mongorestore) loads the contents of the organtox_v1 directory into the organtox_v1 database (as user = devel with password = devel):

unix> mongorestore -u devel -p devel -d organtox_v1 organtox_v1

Test the MongoDB installation by connecting to the database with the mongo command-line client:

unix> mongo -u devel -p devel localhost/organtox_v1

Python

This code has been tested on Python 2.7. The easiest way to install Python 2.7 is via Anaconda. Install the python packages given in lib/requirements.txt as follows:

unix> pip install -r lib/requirements.txt

Jupyter

Jupyter notebook is a freely available, web-based interactive computing environment. It can be installed using Anaconda. After Jupyter is installed, read the quickstart instructions to create a configuration file. Set the following variable in your jupyter_notebook_config.py:

c.NotebookApp.port = 7777

Enter the notebooks directory and start the notebook server from the command line (I always run this in a screen session):

unix> jupyter notebook

Testing the system

After you have completed the above steps, open the Jupyter notebook page in your browser (on port 7777 if Jupyter is configured as above) to run the different steps of the analysis. The machine learning analysis is given in notebooks/organ-tox-ml.ipynb, the analysis of how different factors affect F1 performance is given in notebooks/organ-tox-stats.ipynb, and the code for reproducing all figures and tables is given in notebooks/organ-tox-figs.ipynb.
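For reference, the F1 score used as the performance measure is the harmonic mean of precision and recall. The following is a minimal standard-library sketch of that definition for binary labels. It is not taken from the notebooks, which may compute it differently (for example with a library routine).

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0  # convention: no true positives gives F1 = 0
    precision = tp / float(tp + fp)
    recall = tp / float(tp + fn)
    return 2 * precision * recall / (precision + recall)


# One true positive, one false positive, one false negative.
print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

In a cross-validation setting, this score would be computed on each held-out fold and then summarised across folds.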

License: GNU General Public License v3.0

