- About
- Prerequisites
- CNN classifier
- DCGAN with Keras and Tensorflow
- Jupyter Notebooks on FloydHub
- Summary
Training neural networks can take a very long time on a standard CPU, so GPU-powered machines are recommended. You can get one by running your code on virtual machines in AWS or by using one of the newer services such as FloydHub or Valohai.
For development, I'd like to use my local ecosystem and only use the GPU-powered (and chargeable) services for the final training. The code should nevertheless be more or less the same locally and remotely, and using the external service has to be very easy.
Based on a CNN classifier from Aymeric Damien and the more complex DCGAN implementation with Tensorflow and Keras from Rowel Atienza, I try to give some ideas on how to run the same code on the local machine and on FloydHub, which offers GPU-powered Docker containers.
All the code here was only tested on a Mac running Mac OS X. A local version of Python needs to be installed; Anaconda is a good choice. As the many machine learning and deep learning examples have different requirements (like Python 2 or 3), I use conda environments to isolate the projects from each other.
There is an environment.yml file in this repository. To install the packages, run conda env create -f environment.yml. This should install the following packages with their dependencies:
python 3.5
numpy
matplotlib
keras 2.0.2
tensorflow 1.0.1
After all dependencies are installed, the environment can be activated with source activate dl_playground
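A quick way to confirm that the activated environment matches the pinned versions is to compare each module's `__version__` attribute against the list above. The following is only a minimal sketch: the helper names `installed_versions`, `find_mismatches` and `report` are made up for this illustration, and it assumes that each package exposes `__version__` (which keras and tensorflow do).

```python
import importlib

# Pins taken from the package list above; numpy and matplotlib are unpinned there.
EXPECTED = {"keras": "2.0.2", "tensorflow": "1.0.1"}


def installed_versions(names):
    """Return {module name: its __version__, or None if it cannot be imported}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", None)
    return versions


def find_mismatches(expected, installed):
    """Return {name: (wanted, found)} for every package that deviates."""
    return {
        name: (wanted, installed.get(name))
        for name, wanted in expected.items()
        if installed.get(name) != wanted
    }


def report(expected=None):
    """Build human-readable lines describing the state of the environment."""
    expected = EXPECTED if expected is None else expected
    problems = find_mismatches(expected, installed_versions(expected))
    if not problems:
        return ["environment matches the pinned versions"]
    return [
        "{}: expected {}, found {}".format(name, wanted, found)
        for name, (wanted, found) in sorted(problems.items())
    ]
```

Calling `print("\n".join(report()))` inside the activated environment lists any package that is missing or has a different version.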
For running the code on a GPU-powered system without messing with virtual machines on AWS, I recently found FloydHub to be a nice alternative.
- Get an account on FloydHub. Currently it comes with 100 hours of free GPU usage.
- Install the command line client
pip install -U floyd-cli
- Log in with
floyd login
Currently there is only a tensorflow-1.0 Docker container, which ships with Keras 1.2.2, while the code examples used here are written for Keras 2.0.
The additionally required package keras==2.0.2 can be added to the floyd_requirements.txt file. All packages listed there are installed in the Docker container when the job is executed.
The Tensorflow-based CNN classifier on MNIST from Aymeric Damien is a good place to start, as it is a clean and straightforward implementation. It also takes only about 5 minutes when executed locally on a standard CPU.
Get convolutional_network.py by cloning the repo, downloading the raw file, or running
curl -O https://raw.githubusercontent.com/aymericdamien/TensorFlow-Examples/master/examples/3_NeuralNetworks/convolutional_network.py
Run the code:
python convolutional_network.py
Depending on the machine it takes a few minutes.
The idea is now to run the exact same code on a GPU-powered system on FloydHub. The interaction is done via the floyd command line interface and/or via the web interface.
- First initialize a new project on FloydHub by running
floyd init dl_playground
- Then submit the code to floyd with
floyd run --gpu --env tensorflow-1.0 "python convolutional_network.py"
The code is then uploaded to FloydHub and a Docker container is started, in which the code is executed on a GPU. The progress can be followed online or via the command line tool. See the documentation for more information.
The first run takes a bit longer (about 3 minutes) as it initializes the Docker image (I think). The execution time for the CNN code is about 37 seconds.
Now to something more complex with a longer execution time. Rowel Atienza wrote a nice blog post explaining Generative Adversarial Networks (GAN), accompanied by a Deep Convolutional GAN (DCGAN) implementation with Tensorflow and Keras.
The original code does not run "as is" on FloydHub for two reasons:
There is no display
Generating plots with matplotlib and its default interactive backend requires a $DISPLAY. To be used in a pure server environment, the matplotlib backend has to be changed before matplotlib.pyplot is imported:
import matplotlib
matplotlib.use('Agg')
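If the same script should still open plot windows locally, the backend switch can be made conditional. The following is a minimal sketch, not part of the original code: the helper names `needs_headless_backend` and `configure_matplotlib` are made up for this illustration, and it assumes that a missing $DISPLAY on Linux is a good enough signal for a headless container.

```python
import os
import sys


def needs_headless_backend(environ=None, platform=None):
    """True on Linux when no X display is configured (e.g. inside a container).

    Assumption: a missing or empty DISPLAY variable on Linux means headless.
    """
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    return platform.startswith("linux") and not environ.get("DISPLAY")


def configure_matplotlib():
    """Select the Agg backend if needed; call this before importing pyplot."""
    import matplotlib  # imported lazily so the check above has no dependencies

    if needs_headless_backend():
        matplotlib.use("Agg")
    return matplotlib.get_backend()
```

On a Mac the check returns False, so the local default backend stays untouched, while a Linux container without a display falls back to Agg and can still write image files.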
FloydHub has a predefined output directory
All output on FloydHub has to be written into the predefined directory /output. Therefore a command line parameter --out-path is added, with . (the current directory) as its default value.
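Such a parameter could be wired in with argparse roughly as follows. This is a sketch under assumptions rather than the actual change made to dcgan_mnist.py: the function names and the file name mnist.png are placeholders chosen for this illustration.

```python
import argparse
import os


def build_parser():
    """Command line interface with a configurable output directory."""
    parser = argparse.ArgumentParser(description="DCGAN on MNIST (sketch)")
    parser.add_argument(
        "--out-path",
        default=".",  # current directory, which suits local runs
        help="directory for generated files; use /output on FloydHub",
    )
    return parser


def output_file(out_path, filename):
    """Join the output directory and a file name."""
    return os.path.join(out_path, filename)


def main(argv=None):
    """Parse the arguments and return the path a plot would be saved to."""
    args = build_parser().parse_args(argv)
    # "mnist.png" is a placeholder file name for this sketch.
    return output_file(args.out_path, "mnist.png")
```

Every place in the script that saves a file would then build its target path with `output_file(args.out_path, ...)` instead of using a bare file name.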
Get the implementation code.
wget https://raw.githubusercontent.com/roatienza/Deep-Learning-Experiments/master/Experiments/Tensorflow/GAN/dcgan_mnist.py
Run the code:
python dcgan_mnist.py
I ran the code locally with 100 training steps instead of 10000 and it took 32 minutes. Extrapolating linearly, it would take about 53 hours with 10000 steps.
Initialize a new project if not already done and submit the code to floyd:
floyd init dl_playground
floyd run --env tensorflow-1.0 --gpu "python dcgan_mnist.py --out-path=/output"
Runtime on FloydHub with 10000 training steps: about 2 hours.
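The local estimate and the resulting speedup follow from simple arithmetic on the timings reported above. The sketch below assumes that every training step costs the same time, so the local runtime scales linearly with the number of steps; the function name `extrapolate_hours` is made up for this illustration.

```python
def extrapolate_hours(measured_minutes, measured_steps, target_steps):
    """Linear extrapolation: assumes every training step takes the same time."""
    minutes_per_step = measured_minutes / measured_steps
    return minutes_per_step * target_steps / 60.0


# Timings reported in the text: 100 steps took 32 minutes on the local CPU.
local_hours = extrapolate_hours(32, 100, 10000)  # roughly 53.3 hours
floyd_hours = 2.0                                # measured on FloydHub
speedup = local_hours / floyd_hours              # roughly 26.7x

print("local estimate: {:.1f} h, speedup: {:.1f}x".format(local_hours, speedup))
```

The speedup is only a rough figure, since the local number is an extrapolation and the FloydHub runtime is itself approximate.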
Also worth looking into is the ability to run Jupyter notebooks on FloydHub. For example, you can run the dl_course notebooks on a GPU-powered system:
git clone https://github.com/tensorchiefs/dl_course.git
cd dl_course
echo "keras==2.0.2" > floyd_requirements.txt
floyd init dl_course
floyd run --env tensorflow-1.0 --gpu --mode jupyter
Be careful: the notebook server has to be stopped manually, otherwise it can become expensive. For small models the notebook server can also be started on a standard CPU.
With small changes, existing code examples can be executed on GPU-powered systems. When developing something new, only small design adjustments are needed to be able to run on FloydHub, and the performance improvements are definitely worth it.