
Flexible Integration of Data with Deep LEarning

FIDDLE

An integrative deep learning framework for functional genomic data inference.

A project from the Churchman Lab, Harvard Medical School Department of Genetics.

Based on: http://biorxiv.org/content/early/2016/10/17/081380.full.pdf

Ongoing:

  1. Generalized data preparation pipeline
  2. GUI interface

On this page:

  1. Installation and Quick Start
  2. Input File Details
  3. HMS Orchestra HPC Instructions


Installation and Quick Start

The quick start can be run on a local machine, though an HPC environment is preferable.

1. Set up FIDDLE environment:

NOTE: Requires Python 2.7 and pip. Anaconda can be a nuisance: comment out any Anaconda "export PATH" lines in your ~/.bash_profile or ~/.bashrc, then re-source the file (or restart your terminal session):
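One way to check for lingering Anaconda PATH entries before proceeding (a sketch; the grep pattern is only a heuristic, adjust for your setup):

```shell
# look for Anaconda "export PATH" lines in common shell startup files
grep -n 'export PATH=.*conda' ~/.bash_profile ~/.bashrc 2>/dev/null \
  || echo "no Anaconda PATH entries found"
# python should resolve to the system interpreter, not an anaconda/ directory
command -v python || echo "python not found on PATH"
```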

a) Install Python package manager pip:
$ sudo easy_install pip 
b) Install isolated Python environments:
$ sudo pip install virtualenv
c) Clone this repository to an appropriate location (for instance ~/Desktop):
$ git clone https://github.com/ueser/FIDDLE.git 
d) Instantiate the FIDDLE virtual environment, then source it:
$ sudo virtualenv venvFIDDLE
$ source venvFIDDLE/bin/activate
e) Install necessary Python packages to FIDDLE virtual environment:
$ pip install -r requirements.txt
2. Download training/validation/test datasets:
a) Create data directory:
$ cd FIDDLE/
$ mkdir -p data/hdf5datasets/
b) Download quickstart datasets:

Place the following datasets in FIDDLE/data/hdf5datasets/

WARNING: several GB of data

training.h5

validation.h5

test.h5
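After downloading, a quick sanity check that all three files are in place (a sketch; run from the FIDDLE/ root):

```shell
# report any quick-start dataset missing from the given directory
check_datasets() {
    dir="$1"
    status=0
    for f in training.h5 validation.h5 test.h5; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done
    return $status
}

check_datasets data/hdf5datasets || echo "download the files above before running FIDDLE"
```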

3) Run FIDDLE
$ cd fiddle

Documentation Interlude

There are two (of many) methods to examine FIDDLE's internal documentation and docstrings:

a) Instantiating a Python session and using the help() function:
$ python
>>> import main # or any other FIDDLE Python script
>>> help(main)
b) Employing the --help (or -h) flag (only shows information about flags):
$ python main.py --help

$ python main.py
4) Create visualization of training:
$ python visualization.py
5) Create representations and predictions datasets:
$ python analysis.py
6) Examine training trajectory:

Change directories to FIDDLE/results/<runName>/ (default runName: experiment). The training trajectory visualization files (.png and .gif) are found in this directory. The representations and predictions created in step 5 are found in the HDF5 files "representations.h5" and "predictions.h5".
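To peek inside those HDF5 outputs, h5py (assumed to be installed via requirements.txt) can list every group and dataset. A sketch, demonstrated on a throwaway file since the dataset names inside vary by run:

```shell
# list the contents of an HDF5 file; swap the demo path for
# results/<runName>/representations.h5 or predictions.h5
python - <<'EOF'
import h5py

def show(name):
    print(name)

# demo file standing in for a real FIDDLE output
path = '/tmp/fiddle_demo.h5'
with h5py.File(path, 'w') as f:
    f.create_dataset('demo/representation', data=[1, 2, 3])
with h5py.File(path, 'r') as f:
    f.visit(show)
EOF
```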

7) Plot results:

Change directories to FIDDLE/fiddle and start a Jupyter notebook session, open 'predictions_visualization.ipynb', and follow the instructions outlined in the Markdown cells.

To install Jupyter Notebook, start here: http://jupyter.readthedocs.io/en/latest/install.html.

$ jupyter notebook

Input File Details:

For more complete instructions on file types and FIDDLE's workflow, open the 'guide.ipynb' Jupyter notebook.

$ cd FIDDLE/fiddle
$ jupyter notebook

HMS Orchestra HPC Instructions:

1) Start interactive session, enter FIDDLE directory:
$ bsub -Is -q interactive bash
$ cd FIDDLE/
2) Load the correct TensorFlow module
$ module load dev/tensorflow/1.0-GPU
3) Set up virtual environment

Orchestra's TensorFlow module does not play well with virtual environments; the module above must be loaded before instantiating and then sourcing a virtual environment. More here: https://wiki.med.harvard.edu/Orchestra/PersonalPythonPackages

a) Instantiate, then source the virtual environment:
$ virtualenv venvFIDDLE --system-site-packages
$ source venvFIDDLE/bin/activate
b) Comment out the 'tensorflow==1.0.1' line in the requirements.txt file:
$ vim requirements.txt
tensorflow==1.0.1 --> #tensorflow==1.0.1
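A non-interactive alternative to the vim edit, sketched here on a demo copy (the demo file's contents are made up; apply the sed line to FIDDLE's actual requirements.txt):

```shell
# demo requirements file (contents illustrative)
printf 'numpy\ntensorflow==1.0.1\n' > /tmp/requirements.demo.txt
# comment out the tensorflow pin so pip leaves the module's TensorFlow alone
sed -i.bak 's/^tensorflow==/#tensorflow==/' /tmp/requirements.demo.txt
cat /tmp/requirements.demo.txt
```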
c) pip install remaining requirements:
$ pip install -r requirements.txt
4) Put those dwindling GPUs on blast:

A submission template lies in FIDDLE/fiddle/; modify it accordingly. More on GPU usage here: https://wiki.med.harvard.edu/Orchestra/OrchestraNvidiaGPUs.

$ vim orchestra_job_submit.sh
$ bash orchestra_job_submit.sh
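The shape of such a submission script, sketched below; the queue name and bsub flags are illustrative, so use the repo's orchestra_job_submit.sh and the Orchestra wiki for the site-specific values:

```shell
# write a minimal LSF-style submission script (flags illustrative)
cat > /tmp/fiddle_submit.demo.sh <<'EOF'
#!/bin/bash
# load the GPU TensorFlow module before touching the virtualenv (see step 2)
module load dev/tensorflow/1.0-GPU
source venvFIDDLE/bin/activate
# submit the training run; queue name and flags depend on your cluster
bsub -q gpu -J fiddle_train -o fiddle_%J.out python main.py
EOF
chmod +x /tmp/fiddle_submit.demo.sh
```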

License: GNU General Public License v3.0