
Flexible Integration of Data with Deep LEarning

FIDDLE

An integrative deep learning framework for functional genomic data inference.

A project from the Churchman Lab, Harvard Medical School Department of Genetics.

Based on: http://biorxiv.org/content/early/2016/10/17/081380.full.pdf

Ongoing:

  1. Generalized data preparation pipeline
  2. GUI interface

On this page:

  1. Installation and Quick Start
  2. Input File Details
  3. HMS Orchestra HPC Instructions


Installation and Quick Start

The quick start can be run on a local machine, though an HPC environment is preferable.

1. Set up FIDDLE environment:

NOTE: Requires Python 2.7 and pip. Anaconda can be a nuisance: comment out any Anaconda "export PATH" lines in your ~/.bash_profile or ~/.bashrc, then re-source the file (or restart your terminal session):
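One way to check for lingering Anaconda PATH entries before proceeding (a sketch; the grep pattern is only a heuristic, adjust for your setup):

```shell
# look for Anaconda "export PATH" lines in common shell startup files
grep -n 'export PATH=.*conda' ~/.bash_profile ~/.bashrc 2>/dev/null \
  || echo "no Anaconda PATH entries found"
# python should resolve to the system interpreter, not an anaconda/ directory
command -v python || echo "python not found on PATH"
```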

a) Install Python package manager pip:
$ sudo easy_install pip 
b) Install isolated Python environments:
$ sudo pip install virtualenv
c) Clone this repository to an appropriate location (for instance ~/Desktop):
$ git clone https://github.com/ueser/FIDDLE.git 
d) Instantiate the FIDDLE virtual environment, then source it:
$ sudo virtualenv venvFIDDLE
$ source venvFIDDLE/bin/activate
e) Install necessary Python packages to FIDDLE virtual environment:
$ pip install -r requirements.txt
2. Download training/validation/test datasets:
a) Create data directory:
$ cd FIDDLE/
$ mkdir -p data/hdf5datasets/
b) Download quickstart datasets:

Place the following datasets in FIDDLE/data/hdf5datasets/

WARNING: several GB of data

training.h5

validation.h5

test.h5
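After downloading, a quick sanity check that all three files are in place (a sketch; run from the FIDDLE/ root):

```shell
# report any quick-start dataset missing from the given directory
check_datasets() {
    dir="$1"
    status=0
    for f in training.h5 validation.h5 test.h5; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done
    return $status
}

check_datasets data/hdf5datasets || echo "download the files above before running FIDDLE"
```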

3) Run FIDDLE
$ cd fiddle

Documentation Interlude

There are two (of many) methods to examine FIDDLE's internal documentation and docstrings:

a) Instantiating a Python session and using the help() function:
$ python
>>> import main # or any other FIDDLE Python script
>>> help(main)
b) Employing the --help (or -h) flag (only shows information about flags):
$ python main.py --help

$ python main.py
4) Create visualization of training:
$ python visualization.py
5) Create representations and predictions datasets:
$ python analysis.py
6) Examine training trajectory:

Change directories to FIDDLE/results/<runName>/ (default runName: experiment). The training trajectory visualization files (.png and .gif) are found in this directory. The representations and predictions created in step 5 are found in the HDF5 files "representations.h5" and "predictions.h5".
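To peek inside those HDF5 outputs, h5py (assumed to be installed via requirements.txt) can list every group and dataset. A sketch, demonstrated on a throwaway file since the dataset names inside vary by run:

```shell
# list the contents of an HDF5 file; swap the demo path for
# results/<runName>/representations.h5 or predictions.h5
python - <<'EOF'
import h5py

def show(name):
    print(name)

# demo file standing in for a real FIDDLE output
path = '/tmp/fiddle_demo.h5'
with h5py.File(path, 'w') as f:
    f.create_dataset('demo/representation', data=[1, 2, 3])
with h5py.File(path, 'r') as f:
    f.visit(show)
EOF
```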

7) Plot results:

Change directories to FIDDLE/fiddle and start a Jupyter notebook session, open 'predictions_visualization.ipynb', and follow the instructions outlined in the Markdown cells.

To install Jupyter Notebook, start here: http://jupyter.readthedocs.io/en/latest/install.html.

$ jupyter notebook

Input File Details:

For more complete instructions on file types and FIDDLE's workflow, open the 'guide.ipynb' Jupyter notebook.

$ cd FIDDLE/fiddle
$ jupyter notebook

HMS Orchestra HPC Instructions:

1) Start interactive session, enter FIDDLE directory:
$ bsub -Is -q interactive bash
$ cd FIDDLE/
2) Load the correct TensorFlow module
$ module load dev/tensorflow/1.0-GPU
3) Set up virtual environment

Orchestra's TensorFlow module does not play well with virtual environments; the module above must be loaded before instantiating and then sourcing a virtual environment. More here: https://wiki.med.harvard.edu/Orchestra/PersonalPythonPackages

a) Instantiate, then source the virtual environment:
$ virtualenv venvFIDDLE --system-site-packages
$ source venvFIDDLE/bin/activate
b) Comment out the 'tensorflow==1.0.1' line in the requirements.txt file:
$ vim requirements.txt
tensorflow==1.0.1 --> #tensorflow==1.0.1
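A non-interactive alternative to the vim edit, sketched here on a demo copy (the demo file's contents are made up; apply the sed line to FIDDLE's actual requirements.txt):

```shell
# demo requirements file (contents illustrative)
printf 'numpy\ntensorflow==1.0.1\n' > /tmp/requirements.demo.txt
# comment out the tensorflow pin so pip leaves the module's TensorFlow alone
sed -i.bak 's/^tensorflow==/#tensorflow==/' /tmp/requirements.demo.txt
cat /tmp/requirements.demo.txt
```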
c) pip install remaining requirements:
$ pip install -r requirements.txt
4) Put those dwindling GPUs on blast:

A submission template lies in FIDDLE/fiddle/; modify it accordingly. More on GPU usage here: https://wiki.med.harvard.edu/Orchestra/OrchestraNvidiaGPUs.

$ vim orchestra_job_submit.sh
$ bash orchestra_job_submit.sh
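The shape of such a submission script, sketched below; the queue name and bsub flags are illustrative, so use the repo's orchestra_job_submit.sh and the Orchestra wiki for the site-specific values:

```shell
# write a minimal LSF-style submission script (flags illustrative)
cat > /tmp/fiddle_submit.demo.sh <<'EOF'
#!/bin/bash
# load the GPU TensorFlow module before touching the virtualenv (see step 2)
module load dev/tensorflow/1.0-GPU
source venvFIDDLE/bin/activate
# submit the training run; queue name and flags depend on your cluster
bsub -q gpu -J fiddle_train -o fiddle_%J.out python main.py
EOF
chmod +x /tmp/fiddle_submit.demo.sh
```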

License: GNU General Public License v3.0