TensorFlow with MPI

This repository contains a patched version of TensorFlow 0.12.1 which includes the tensorflow.contrib.mpi namespace with MPI operations, including a potentially CUDA-aware ring allreduce.
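
For readers new to the term, the sketch below simulates a sum ring allreduce in plain Python, with each "worker" represented by a list. It only illustrates the general algorithm (a scatter-reduce phase followed by an allgather phase). It is not the implementation in this patch, and the function name `ring_allreduce` is made up for this example.

```python
def ring_allreduce(buffers):
    """Simulate a sum ring allreduce over equal-length lists, one per worker."""
    n = len(buffers)
    length = len(buffers[0])
    # Work on copies so the caller's inputs are left untouched.
    data = [list(b) for b in buffers]
    # Chunk c covers indices bounds[c] up to (not including) bounds[c + 1].
    bounds = [c * length // n for c in range(n + 1)]

    def chunk(c):
        c %= n
        return range(bounds[c], bounds[c + 1])

    def exchange(first_chunk, combine):
        # Every worker r sends one chunk to its right-hand neighbour.
        # Snapshot all outgoing chunks first so the sends are simultaneous.
        sends = []
        for r in range(n):
            idx = chunk(first_chunk(r))
            sends.append((idx, [data[r][i] for i in idx]))
        for r, (idx, values) in enumerate(sends):
            dst = (r + 1) % n
            for i, v in zip(idx, values):
                data[dst][i] = combine(data[dst][i], v)

    # Phase 1 (scatter-reduce): after n - 1 steps, worker r holds the
    # complete sum for chunk (r + 1) % n.
    for step in range(n - 1):
        exchange(lambda r: r - step, lambda old, new: old + new)

    # Phase 2 (allgather): pass the fully reduced chunks around the ring.
    for step in range(n - 1):
        exchange(lambda r: r + 1 - step, lambda old, new: new)

    return data


if __name__ == "__main__":
    workers = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
    for rank, result in enumerate(ring_allreduce(workers)):
        print(rank, result)
```

Each worker sends and receives only one chunk per step, so the data sent per worker stays roughly constant as the number of workers grows.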

Installation

This patch requires building TensorFlow from source against a CUDA-aware MPI implementation of your choice. It has been tested with OpenMPI integrated with SLURM.

Install by following the TensorFlow source installation instructions. When you run configure, you will be asked whether you would like to build TensorFlow with MPI and, if so, for the path to your MPI installation.

Although it has only been tested with SLURM-integrated OpenMPI, it should also work with any other CUDA-aware MPI implementation.

Usage

The auto-generated documentation for TensorFlow includes usage examples. In addition, we include a TensorFlow language model that we use to benchmark the allreduce in a real-world setting. Before training the language model, run pip install -r allreduce-requirements.txt to install its Python dependencies.

After that, you should be able to run allreduce-test.py with the appropriate training and validation datasets and vocabulary. We train on the Billion Words dataset, which is a text file with one sentence per line, as follows:

...
To Mo concerning the food log you kept -- Dr. Buchholz recommends the same thing .
The CBO estimates that only 23 percent of that would be spent in 2009 and 2010 .
Even so , Democrats slammed Bush as out of touch .
An information campaign will be launched later to raise awareness of employment rights and how to enforce them .
...
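
As a rough illustration of this format, the sketch below reads such a file one sentence at a time. It assumes tokens are separated by whitespace, as the sample suggests. How allreduce-test.py actually parses the file is not described here, and `read_sentences` is a hypothetical helper.

```python
import io


def read_sentences(lines):
    """Yield each non-empty line as a list of whitespace-separated tokens."""
    for line in lines:
        tokens = line.split()
        if tokens:
            yield tokens


if __name__ == "__main__":
    # In practice this would be open("train.txt"); a small in-memory sample
    # keeps the example self-contained.
    sample = io.StringIO(
        "Even so , Democrats slammed Bush as out of touch .\n"
        "The CBO estimates that only 23 percent of that would be spent "
        "in 2009 and 2010 .\n"
    )
    for tokens in read_sentences(sample):
        print(len(tokens), tokens[:4])
```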

The vocabulary file lists the most common words in the dataset, one per line:

<unk>
the
,
.
to
of
and
a
in
"
's
that
for
on
is
The
was
with
said
as
at
...
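
If you need to produce a vocabulary file yourself, one possible approach is sketched below. It assumes whitespace tokenization, that `<unk>` comes first as in the sample above, and that `<unk>` counts toward the vocabulary size. None of these assumptions is confirmed by allreduce-test.py, and `build_vocab` is a hypothetical helper.

```python
from collections import Counter


def build_vocab(sentences, vocab_size, unk="<unk>"):
    """Return the unknown-word token followed by the most frequent tokens."""
    counts = Counter(tok for sentence in sentences for tok in sentence.split())
    # Reserve one slot for the unknown-word token (an assumption, see above).
    words = [w for w, _ in counts.most_common() if w != unk]
    return [unk] + words[: vocab_size - 1]


if __name__ == "__main__":
    corpus = [
        "Even so , Democrats slammed Bush as out of touch .",
        "An information campaign will be launched later .",
    ]
    # Writing "\n".join(vocab) to vocab.txt would give one word per line.
    print("\n".join(build_vocab(corpus, vocab_size=5)))
```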

You should be able to run training with a command like the following:

# If you have SLURM with a CUDA-aware MPI integrated, you can use `srun` to
# launch your job. Otherwise, you will need to use `mpirun` and appropriately
# set `CUDA_VISIBLE_DEVICES` to choose which GPUs to use.
srun --partition=K40x4 --ntasks=4 --gres=gpu:4 \
    python allreduce-test.py \
        --train-data train.txt \
        --validation-data train.txt \
        --vocab vocab.txt \
        --vocab-size 10000 \
        --batch-size 32 \
        --max-iterations 10000
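
For the `mpirun` case mentioned in the comment above, one way to give each process its own GPU is to derive `CUDA_VISIBLE_DEVICES` from the process's rank on its node. The sketch below assumes Open MPI, which exports `OMPI_COMM_WORLD_LOCAL_RANK` to each launched process; other MPI implementations use different variable names. `pick_gpu` is a hypothetical helper and is not part of this repository.

```python
import os


def pick_gpu(environ, gpus_per_node):
    """Return the GPU index (as a string) for this process's local rank."""
    # OMPI_COMM_WORLD_LOCAL_RANK is specific to Open MPI. Default to rank 0
    # when the script is not started by an MPI launcher.
    local_rank = int(environ.get("OMPI_COMM_WORLD_LOCAL_RANK", "0"))
    return str(local_rank % gpus_per_node)


if __name__ == "__main__":
    # This has to happen before TensorFlow initializes CUDA, so it should run
    # before `import tensorflow`.
    os.environ["CUDA_VISIBLE_DEVICES"] = pick_gpu(os.environ, gpus_per_node=4)
    print("CUDA_VISIBLE_DEVICES =", os.environ["CUDA_VISIBLE_DEVICES"])
```

Taking the local rank modulo the GPU count keeps the index valid even if more processes than GPUs are started on a node, at the cost of sharing devices.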

Support

We do not offer official support or maintenance for this patch. However, if you would like to use it and run into trouble, feel free to file a GitHub issue and we may be able to help.

About

Apache License 2.0
