Seq2Seq_PlouffeRainbows

A TensorFlow implementation of a novel open-source Seq2SeqRegression API for performing a wide range of automatic feature extraction tasks outside of NLP. This general-purpose sequence-to-sequence regression model predicts a sequence of multidimensional vectors from previous observations. The system studied here is the Plouffe Graph, a graph devised by Canadian mathematician Simon Plouffe between 1974 and 1979. More information about the Plouffe Graph can be found in "Times Tables, Mandelbrot and the Heart of Mathematics".
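To make the prediction task concrete, the following minimal sketch shows one way to cut a sequence of vectors into pairs of previous observations and the vectors to predict. It is an illustration only: the function name `make_windows` and the parameters `n_in` and `n_out` are hypothetical and are not taken from this repository, whose actual data pipeline may differ.

```python
from __future__ import print_function


def make_windows(sequence, n_in, n_out):
    """Split a sequence of vectors into (input, target) pairs.

    Hypothetical helper: each input holds n_in consecutive vectors and
    each target holds the n_out vectors that immediately follow them.
    """
    pairs = []
    last_start = len(sequence) - n_in - n_out
    for start in range(last_start + 1):
        inputs = sequence[start:start + n_in]
        targets = sequence[start + n_in:start + n_in + n_out]
        pairs.append((inputs, targets))
    return pairs


if __name__ == "__main__":
    # Toy sequence of five 2-dimensional vectors.
    toy = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0], [3.0, 1.5], [4.0, 2.0]]
    for inputs, targets in make_windows(toy, n_in=3, n_out=1):
        print(inputs, "->", targets)
```

A sequence shorter than `n_in + n_out` simply yields no pairs, so the helper never returns a truncated input or target.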

Dataset

The Plouffe dataset is already included: a dataset of multidimensional vectors representing the Plouffe Graph is constructed during training. The dataset can be configured in the plouffe.yml file inside the configs folder.
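For readers unfamiliar with the construction, here is a minimal sketch of a times-table graph of the kind described in "Times Tables, Mandelbrot and the Heart of Mathematics": N nodes are spaced equally on a circle and node i is joined to node (k * i) mod N. The function names, and the choice to flatten each edge into a four-component vector, are assumptions made for illustration; the vectors this repository actually builds during training may be encoded differently.

```python
from __future__ import division, print_function

import math


def plouffe_edges(n_nodes, multiplier):
    """Return the edges (i, (multiplier * i) mod n_nodes) of a times-table graph."""
    return [(i, (multiplier * i) % n_nodes) for i in range(n_nodes)]


def node_position(i, n_nodes):
    """Place node i on the unit circle, with nodes equally spaced."""
    angle = 2.0 * math.pi * i / n_nodes
    return (math.cos(angle), math.sin(angle))


def edges_as_vectors(n_nodes, multiplier):
    """Flatten every edge into one vector (x_start, y_start, x_end, y_end).

    This encoding is an assumption, not the repository's own format.
    """
    vectors = []
    for start, end in plouffe_edges(n_nodes, multiplier):
        vectors.append(node_position(start, n_nodes) + node_position(end, n_nodes))
    return vectors


if __name__ == "__main__":
    print(plouffe_edges(10, 2))
    print(len(edges_as_vectors(10, 2)))
```

Varying `multiplier` while holding `n_nodes` fixed gives a sequence of graphs, which is one natural way such a construction could feed a sequence model.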

IPython Notebook

An IPython Notebook of the Seq2Seq Regression model can be found inside the notebooks folder. The notebook complements the paper and walks you through the computational graph. It also provides background on the Plouffe Graph dataset.

To see the interactive graphics of the Seq2Seq Regression model's predictions, download the pre-trained model from this Google Drive link:

https://drive.google.com/open?id=0B86gEeQqfnjtMERTV2tjLWMwNnc

Create a logs directory in the root of the Seq2Seq_PlouffeRainbows folder.

After downloading, move or copy the lr0002 folder from the Google Drive link into the logs folder.
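The two steps above can also be scripted. The sketch below creates the logs directory and moves the downloaded folder into it; the helper name `install_pretrained` and the download location `~/Downloads/lr0002` are assumptions, so adjust the paths to your machine.

```python
from __future__ import print_function

import os
import shutil


def install_pretrained(downloaded_dir, repo_root):
    """Create <repo_root>/logs and move the downloaded folder into it.

    Hypothetical helper. Returns the new location of the folder.
    """
    logs_dir = os.path.join(repo_root, "logs")
    if not os.path.isdir(logs_dir):
        os.makedirs(logs_dir)
    folder_name = os.path.basename(os.path.normpath(downloaded_dir))
    target = os.path.join(logs_dir, folder_name)
    if os.path.exists(target):
        # Refuse to overwrite an existing copy of the model.
        raise OSError("target already exists: " + target)
    shutil.move(downloaded_dir, target)
    return target


if __name__ == "__main__":
    # Both paths are assumptions; run this from the repository root.
    downloaded = os.path.expanduser("~/Downloads/lr0002")
    if os.path.isdir(downloaded):
        print(install_pretrained(downloaded, "."))
    else:
        print("Nothing to move: " + downloaded + " not found")
```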

Launch IPython Notebook

cd notebooks
jupyter notebook

Note: The default iopub rate limits are too low for this visualization-heavy project. To fix this, launch the IPython notebook as follows:

jupyter notebook --NotebookApp.iopub_data_rate_limit=10000000000

Installation

The program requires the following dependencies (easy to install using pip, Anaconda or Docker):

python 2.7
tensorflow API (tested with r1.0.0)
numpy
scipy
pandas
matplotlib
jupyter
networkx
tqdm
pyyaml
jupyterthemes
seaborn
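Before training, it can help to confirm that the dependencies listed above are importable in the active environment. The following sketch is one way to do that; the helper `missing_modules` is hypothetical, and the import names are assumptions based on how these packages are usually imported (for example, pyyaml is imported as `yaml`).

```python
from __future__ import print_function

import importlib

# Assumed import names for the dependencies listed above.
# Note that the pyyaml package is imported as "yaml".
REQUIRED_MODULES = [
    "tensorflow", "numpy", "scipy", "pandas", "matplotlib", "jupyter",
    "networkx", "tqdm", "yaml", "jupyterthemes", "seaborn",
]


def missing_modules(names):
    """Return the names that fail to import in the current environment."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except Exception:
            # Any failure during import means the module is not usable.
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

The check only confirms that each module imports; it does not verify versions, so a TensorFlow release other than the tested r1.0.0 would still pass.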

Anaconda

Anaconda: Installation

To install DLFractalSequences in an Anaconda environment:

conda env create -f environment.yml

To activate the Anaconda environment:

source activate dlfractals-env

Anaconda: Train

Train the Seq2Seq Regression model on the local machine using the Plouffe dataset:

python train.py -c configs/plouffe.yml

Note: The training inputs (e.g. dataset parameters, hyperparameters) for training on a local machine can be modified in the plouffe.yml file inside the configs folder.

Docker

Docker: Installation

Prerequisites: Docker installed on your machine. If you don't have Docker installed already, follow the Docker Setup guide.

To build the Docker image:

docker build -t dlfractals:latest .

Docker: Train

To deploy and train in a Docker container:

docker run -it dlfractals:latest python train.py -c configs/plouffe.yml

Sharcnet

The Shared Hierarchical Academic Research Computing Network (SHARCNET) is used when you want to run multiple jobs.

Activate the TensorFlow Python 2.7 environment:

source /opt/sharcnet/testing/tensorflow/tensorflow-cp27-active

Note: If a package is missing, install it with:

pip install <missing_pkg> --user

Example:

pip install /opt/sharcnet/testing/tensorflow/tensorflow-1.0.0-cp27-cp27m-linux_x86_64.whl --user

Train multiple jobs using the Seq2Seq Regression model on the Plouffe dataset:

python train_manyjobs.py -c configs/plouffe_sharcnet.yml

Note: The training inputs (e.g. dataset parameters, hyperparameters) for training on a SHARCNET machine can be modified in the plouffe_sharcnet.yml file inside the configs folder. When training on SHARCNET, you must set the train option inside the YAML config file to either copper or local.

Future Work

Perform further analysis on the Plouffe Graph. We particularly want to analyze how arithmetic in embedding space corresponds to the group arithmetic in input space, and to establish strong baselines in relation to that.
Add libraries that allow more experimentation with attention and external memory.
Explore more datasets (e.g. video sequences) that would leverage the automatic feature extraction functionality of the Seq2Seq Regression model.

About

This repository contains sequence-to-sequence (seq2seq) code in TensorFlow that can be used as a tool for non-linear regression analysis or, more specifically, time-series analysis where there are no class labels.