theanoDemo

Demo of theano on tinis

Enable python in every session with one of the following two lines. You cannot load both. You might want to add your choice to your .bashrc file.

module load intel impi Python CUDA # for Python 2.7
module load intel impi Python/3.5.1 CUDA # for Python 3.5

Install Theano with the following command, which should also give you numpy and scipy. You might prefer to call the command as pip2.7 or pip3.5 to be sure you get the right version. You don't want to use python2.6 or pip2.6, which are what python and pip point to before any other python module has been loaded.

pip install --user Theano
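To confirm that the interpreter you are running is the one you loaded, and that the user-site install is visible to it, something like the following may help. It is only a sketch using the standard library; the name check_environment is made up for illustration, and as written it needs Python 3.4 or later (importlib.util.find_spec is not available on 2.7).

```python
import importlib.util
import sys


def check_environment(modules=("numpy", "scipy", "theano")):
    """Report the running Python version and which modules can be found."""
    report = {"python": "%d.%d" % sys.version_info[:2]}
    for name in modules:
        # find_spec locates a top-level module without importing it.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(key, value)
```

If python reports 2.6, or theano shows as False, the module load line has probably not been run in this session.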

I recommend making a modification inside theano's source to stop it trying to figure out if it can use MPI. This check process behaves strangely (at least for me) on the login node and main calculation nodes of tinis. It doesn't really matter if you only ever try to use the GPU nodes. In the file ~/.local/lib/python3.5/site-packages/theano/tensor/io.py (replace 3.5 with 2.7 if you need to) you'll find a block like this about 100 lines down:

try:
    from mpi4py import MPI
except ImportError:

Modify this block so the import always fails, e.g. by replacing the import statement with

    from mpi4py import MPI_WELOVETINIS
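If you would rather not edit the file by hand, the same change can be scripted. This is a minimal sketch, not part of theano: the helper names disable_mpi_import and patch_file are hypothetical, and it assumes the import line appears in io.py exactly as shown above.

```python
OLD_IMPORT = "from mpi4py import MPI"
NEW_IMPORT = "from mpi4py import MPI_WELOVETINIS"


def disable_mpi_import(source):
    """Return source with the mpi4py import replaced by one that always fails."""
    if NEW_IMPORT in source:
        return source  # already patched, leave it alone
    return source.replace(OLD_IMPORT, NEW_IMPORT, 1)


def patch_file(path):
    """Patch the file in place, keeping a .orig backup. Return True if changed."""
    with open(path) as handle:
        original = handle.read()
    patched = disable_mpi_import(original)
    if patched == original:
        return False
    with open(path + ".orig", "w") as handle:
        handle.write(original)
    with open(path, "w") as handle:
        handle.write(patched)
    return True
```

You would call patch_file with the expanded path to io.py given above. Reinstalling or upgrading Theano will undo the change, so it may need to be run again afterwards.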

For testing CUDA, it may be useful to install pycuda as follows. First try to install it on the login node:

pip install --user pycuda

This will fail (it won't find -libcuda, and maybe won't find <cuda.h> etc. either) but it will succeed in installing the dependencies. Then download pycuda (you can't download on the GPU node):

mkdir foo
cd foo
pip download pycuda

Then on a GPU node try the following, where <tab> means pressing Tab to complete the name of the downloaded file:

module load intel impi Python/3.5.1 CUDA
export CPATH=$CPATH:/csc/tinis/software/Core/CUDA/7.5.18/include/ #not sure if needed
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/csc/tinis/software/Core/CUDA/7.5.18/lib64 #not sure if needed
cd foo
pip install --user pycuda<tab>
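Once the install succeeds, a quick check that pycuda can see the GPUs might look like this. It is a sketch only; the function name count_cuda_devices is made up here, and it returns None rather than failing when pycuda cannot be imported (as on the login node).

```python
def count_cuda_devices():
    """Return how many CUDA devices pycuda can see, or None if pycuda is missing."""
    try:
        import pycuda.driver as cuda
    except ImportError:
        return None
    cuda.init()  # must be called before any other driver function
    return cuda.Device.count()


if __name__ == "__main__":
    print("CUDA devices visible:", count_cuda_devices())
```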

#Using icpc

icpc is the recommended C++ compiler on tinis. It seems to work OK with theano.

There are warnings about cxxflags, meaning that optimizations are not properly turned on. Theano doesn't rely on -march=native because it caches binaries on the filesystem between uses, keyed by (among other things) the compiler flags, and you might use more than one system setup with the same shared working directory. Instead it tries to detect the native flags itself, but it can't do this correctly with icpc. You can add gcc.cxxflags=-march=native to your theano flags to turn off this behaviour without losing the goodness of -march=native (I haven't tested, but in practice this goodness may not add up to much on our 64 bit architecture anyway). You might want to be careful about the cache if you are, e.g., doing calculations on both the cnode and gpu queues. Doing this comes with some warnings, which you might want to disable.

To do that, you'd proceed as follows. Edit the file ~/.local/lib/python3.5/site-packages/theano/gof/cmodule.py (replace 3.5 with 2.7 if you need to). Add the words "detect_march and" to the if condition somewhere around line 1820, so it goes from

        if ('g++' not in theano.config.cxx and
                'clang++' not in theano.config.cxx and
                'clang-omp++' not in theano.config.cxx):

to

        if (detect_march and 'g++' not in theano.config.cxx and
                'clang++' not in theano.config.cxx and
                'clang-omp++' not in theano.config.cxx):

(On current dev theano, but not any release version, this will look slightly different: there's an extra line about icpc in this if condition, which looks spurious to me. As of 18 August 2016, dev theano doesn't detect icpc architecture flags correctly.)

About 9 lines up from this is the main place where the warning telling us not to do what we're doing appears. Change

                    _logger.warn(

to

                    _logger.info(

to make the message, which ends "but we don't do it now", quieter.

I might set something like

os.environ["THEANO_FLAGS"]="floatX=float32,dnn.enabled=False,device=gpu0,cxx=icpc,gcc.cxxflags=-march=native"
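The flag string can also be assembled from a dictionary, which makes it easier to vary per queue. The sketch below additionally sets base_compiledir so that the cnode and gpu queues do not share a compile cache, which is one way to address the caution above about -march=native. The helper names and the ~/.theano-<queue> directory naming are my own assumptions, not something theano prescribes.

```python
import os


def build_theano_flags(options):
    """Join a dict of options into the comma-separated THEANO_FLAGS format."""
    return ",".join("%s=%s" % (key, value) for key, value in sorted(options.items()))


def flags_for_queue(queue, device):
    """Flags for one queue, with a compile cache not shared with other queues."""
    options = {
        "floatX": "float32",
        "dnn.enabled": "False",
        "device": device,
        "cxx": "icpc",
        "gcc.cxxflags": "-march=native",
        # Assumption: one cache directory per queue, e.g. ~/.theano-gpu
        "base_compiledir": os.path.expanduser("~/.theano-" + queue),
    }
    return build_theano_flags(options)


if __name__ == "__main__":
    # THEANO_FLAGS is read when theano is first imported, so set it before that.
    os.environ["THEANO_FLAGS"] = flags_for_queue("gpu", "gpu0")
    print(os.environ["THEANO_FLAGS"])
```

On the cnode queue the equivalent call would be flags_for_queue("cnode", "cpu").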

cpuIntel.py demonstrates using icpc without GPUs.

There is a choice between using a full node and msub, where you control which GPUs you access with CUDA_VISIBLE_DEVICES, and a method I don't understand yet with srun and sbatch where it's automatic.

The following should work with GPUs:

module load intel impi Python/3.5.1 CUDA
export CUDA_VISIBLE_DEVICES=0,1,2,3
python gpu.py

#Using GCC

Important: The following, and most of the rest of this repository, document some attempts to make theano work on tinis with the g++ compiler. The main difficulty is the need to wait after g++ returns before using its binary output, which is an unusual problem.

You might need to load a module for binutils as well as for GCC.

We then hackishly edit Theano to make it wait every time after calling the compiler. Add the following line to the beginning (after the docstring) of the function dlimport in the file ~/.local/lib/python2.7/site-packages/theano/gof/cmodule.py

   time.sleep(5)

In the current theano (0.8) this will be line 266. Calling compile() on a keras model seems to call this function about 160 times, so the delay is about 13 minutes.
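The delay estimate is just the number of calls times the sleep length. A tiny sketch for trying other sleep values (the function name is made up for illustration):

```python
def compile_delay_minutes(calls, sleep_seconds=5):
    """Total time spent sleeping, in minutes, for a given number of dlimport calls."""
    return calls * sleep_seconds / 60.0


if __name__ == "__main__":
    # 160 calls at 5 seconds each is 800 seconds, a little over 13 minutes.
    print(compile_delay_minutes(160))
```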

Suggested alternative

When g++ writes to standard output, it doesn't seem to have the timing problem. For example the following works:

rm -f a.out && g++ a.cpp -o- > a.out && chmod +x a.out && ./a.out

I would think it possible to change cmodule.py so that it always collected its g++ output from standard output.

Everything up to this point is just setup, and I think it will work either on the head node or on the gpu node. Communicating with github etc. is easier on the head node.

For the next step you want to be in a gpu node session, so type

msub -I -q gpu -l walltime=08:00:00

(I suggest creating an alias or script for this, as I do it often.)

To run the demonstration of theano type

python gpu.py

This does almost no work but demonstrates theano working with a gpu. Note that two settings together choose the gpu to use: the gpu0 in THEANO_FLAGS tells theano to use the first GPU, which means the gpu listed first in CUDA_VISIBLE_DEVICES.
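To make the interaction of the two settings concrete, here is a small sketch that works out which physical GPU a theano device name ends up on. The function physical_gpu is hypothetical and only does the bookkeeping; it does not talk to CUDA or theano.

```python
def physical_gpu(theano_device, cuda_visible_devices=None):
    """Map a theano device name such as "gpu1" to the physical GPU id it refers to."""
    if not theano_device.startswith("gpu"):
        raise ValueError("expected a device name like gpu0, got %r" % theano_device)
    index = int(theano_device[3:])  # raises ValueError if no number is given
    if cuda_visible_devices is None:
        # Variable not set: all GPUs are visible in their default order.
        return str(index)
    visible = [d.strip() for d in cuda_visible_devices.split(",") if d.strip()]
    return visible[index]


if __name__ == "__main__":
    # With CUDA_VISIBLE_DEVICES=2,3, theano's gpu0 is physical GPU 2.
    print(physical_gpu("gpu0", "2,3"))
```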
