Fixed some issues with GPU Dockerfile

Question

Fixed some issues with GPU Dockerfile

pbamotra opened this issue 7 years ago · comments

Fixes: cuDNN6, libcudnn.so.6 issue, #59, latest version of deep learning libraries, pandas, sklearn upgrade, added some of my favorite python libraries as well

FROM nvidia/cuda:8.0-cudnn6-devel-ubuntu14.04

LABEL authors="Sai Soundararaj <saip@outlook.com>, Pankesh Bamotra <pankesh.bamotra@gmail.com>"

ARG THEANO_VERSION=rel-0.9.0
ARG TENSORFLOW_VERSION=1.3.0
ARG TENSORFLOW_ARCH=gpu
ARG KERAS_VERSION=2.0.8
ARG LASAGNE_VERSION=v0.1
ARG TORCH_VERSION=latest
ARG CAFFE_VERSION=master
ARG CUDNN_TAR_FILE=cudnn-8.0-linux-x64-v6.0.tgz

#RUN echo -e "\n**********************\nNVIDIA Driver Version\n**********************\n" && \
#	cat /proc/driver/nvidia/version && \
#	echo -e "\n**********************\nCUDA Version\n**********************\n" && \
#	nvcc -V && \
#	echo -e "\n\nBuilding your Deep Learning Docker Image...\n"

# Install some dependencies
RUN apt-get update && apt-get install -y \
		bc \
		build-essential \
		cmake \
		curl \
		g++ \
		gfortran \
		git \
		libffi-dev \
		libfreetype6-dev \
		libhdf5-dev \
		libjpeg-dev \
		liblcms2-dev \
		libopenblas-dev \
		liblapack-dev \
		libopenjpeg2 \
		libpng12-dev \
		libssl-dev \
		libtiff5-dev \
		libwebp-dev \
		libzmq3-dev \
		nano \
		pkg-config \
		python-dev \
		software-properties-common \
		unzip \
		vim \
		wget \
		zlib1g-dev \
		qt5-default \
		libvtk6-dev \
		zlib1g-dev \
		libjpeg-dev \
		libwebp-dev \
		libpng-dev \
		libtiff5-dev \
		libjasper-dev \
		libopenexr-dev \
		libgdal-dev \
		libdc1394-22-dev \
		libavcodec-dev \
		libavformat-dev \
		libswscale-dev \
		libtheora-dev \
		libvorbis-dev \
		libxvidcore-dev \
		libx264-dev \
		yasm \
		libopencore-amrnb-dev \
		libopencore-amrwb-dev \
		libv4l-dev \
		libxine2-dev \
		libtbb-dev \
		libeigen3-dev \
		python-dev \
		python-tk \
		python-numpy \
		python3-dev \
		python3-tk \
		python3-numpy \
		ant \
		default-jdk \
		doxygen \
		&& \
	apt-get clean && \
	apt-get autoremove && \
	rm -rf /var/lib/apt/lists/* && \
# Link BLAS library to use OpenBLAS using the alternatives mechanism (https://www.scipy.org/scipylib/building/linux.html#debian-ubuntu)
	update-alternatives --set libblas.so.3 /usr/lib/openblas-base/libblas.so.3

# Install cuDNN v6.0
RUN wget http://developer.download.nvidia.com/compute/redist/cudnn/v6.0/${CUDNN_TAR_FILE} -P /root/downloads && \
    cd /root/downloads && \
    tar -xzvf ${CUDNN_TAR_FILE}
    
ADD cuda/include/cudnn.h /usr/local/cuda-8.0/include
ADD cuda/lib64/libcudnn* /usr/local/cuda-8.0/lib64/
RUN chmod a+r /usr/local/cuda-8.0/lib64/libcudnn*

ENV CUDA_HOME=/usr/local/cuda
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64 

RUN cd /usr/local/cuda/lib64 && \
    rm libcudnn.so && \
    rm libcudnn.so.6 && \
    ln libcudnn.so.6.* libcudnn.so.6 && \
    ln libcudnn.so.6 libcudnn.so && \
    ldconfig
    
# Install pip
RUN curl -O https://bootstrap.pypa.io/get-pip.py && \
	python get-pip.py && \
	rm get-pip.py

# Add SNI support to Python
RUN pip --no-cache-dir install \
		pyopenssl \
		ndg-httpsclient \
		pyasn1

# Install useful Python packages using apt-get to avoid version incompatibilities with Tensorflow binary
# especially numpy, scipy, skimage and sklearn (see https://github.com/tensorflow/tensorflow/issues/2034)
RUN apt-get update && apt-get install -y \
		python-numpy \
		python-scipy \
		python-nose \
		python-h5py \
		python-skimage \
		python-matplotlib \
		python-pandas \
		python-sklearn \
		python-sympy \
		&& \
	apt-get clean && \
	apt-get autoremove && \
	rm -rf /var/lib/apt/lists/*

# Install other useful Python packages using pip
RUN pip --no-cache-dir install --upgrade ipython pandas sklearn && \
	pip --no-cache-dir install \
		Cython \
        click \
        grequests \
        h5py \
        python-dotenv \
        sqlalchemy-redshift \
        gevent \
        awscli \
		ipykernel \
		jupyter \
		path.py \
		Pillow \
		pygments \
		six \
		sphinx \
		wheel \
		zmq \
		&& \
	python -m ipykernel.kernelspec


# Install TensorFlow
RUN pip --no-cache-dir install \
	https://storage.googleapis.com/tensorflow/linux/${TENSORFLOW_ARCH}/tensorflow_${TENSORFLOW_ARCH}-${TENSORFLOW_VERSION}-cp27-none-linux_x86_64.whl


# Install dependencies for Caffe
RUN apt-get update && apt-get install -y \
		libboost-all-dev \
		libgflags-dev \
		libgoogle-glog-dev \
		libhdf5-serial-dev \
		libleveldb-dev \
		liblmdb-dev \
		libopencv-dev \
		libprotobuf-dev \
		libsnappy-dev \
		protobuf-compiler \
		&& \
	apt-get clean && \
	apt-get autoremove && \
	rm -rf /var/lib/apt/lists/*

# Install Caffe
RUN git clone -b ${CAFFE_VERSION} --depth 1 https://github.com/BVLC/caffe.git /root/caffe && \
	cd /root/caffe && \
	cat python/requirements.txt | xargs -n1 pip install && \
	mkdir build && cd build && \
	cmake -DUSE_CUDNN=1 -DBLAS=Open .. && \
	make -j"$(nproc)" all && \
	make install

# Set up Caffe environment variables
ENV CAFFE_ROOT=/root/caffe
ENV PYCAFFE_ROOT=$CAFFE_ROOT/python
ENV PYTHONPATH=$PYCAFFE_ROOT:$PYTHONPATH \
	PATH=$CAFFE_ROOT/build/tools:$PYCAFFE_ROOT:$PATH

RUN echo "$CAFFE_ROOT/build/lib" >> /etc/ld.so.conf.d/caffe.conf && ldconfig


# Install Theano and set up Theano config (.theanorc) for CUDA and OpenBLAS
RUN pip --no-cache-dir install git+git://github.com/Theano/Theano.git@${THEANO_VERSION} && \
	\
	echo "[global]\ndevice=gpu\nfloatX=float32\noptimizer_including=cudnn\nmode=FAST_RUN \
		\n[lib]\ncnmem=0.95 \
		\n[nvcc]\nfastmath=True \
		\n[blas]\nldflag = -L/usr/lib/openblas-base -lopenblas \
		\n[DebugMode]\ncheck_finite=1" \
	> /root/.theanorc


# Install Keras
RUN pip --no-cache-dir install git+git://github.com/fchollet/keras.git@${KERAS_VERSION}


# Install Lasagne
RUN pip --no-cache-dir install git+git://github.com/Lasagne/Lasagne.git@${LASAGNE_VERSION}


# Install Torch
RUN git clone https://github.com/torch/distro.git /root/torch --recursive && \
	cd /root/torch && \
	bash install-deps && \
	yes no | ./install.sh

# Export the LUA evironment variables manually
ENV LUA_PATH='/root/.luarocks/share/lua/5.1/?.lua;/root/.luarocks/share/lua/5.1/?/init.lua;/root/torch/install/share/lua/5.1/?.lua;/root/torch/install/share/lua/5.1/?/init.lua;./?.lua;/root/torch/install/share/luajit-2.1.0-beta1/?.lua;/usr/local/share/lua/5.1/?.lua;/usr/local/share/lua/5.1/?/init.lua' \
	LUA_CPATH='/root/.luarocks/lib/lua/5.1/?.so;/root/torch/install/lib/lua/5.1/?.so;./?.so;/usr/local/lib/lua/5.1/?.so;/usr/local/lib/lua/5.1/loadall.so' \
	PATH=/root/torch/install/bin:$PATH \
	LD_LIBRARY_PATH=/root/torch/install/lib:$LD_LIBRARY_PATH \
	DYLD_LIBRARY_PATH=/root/torch/install/lib:$DYLD_LIBRARY_PATH
ENV LUA_CPATH='/root/torch/install/lib/?.so;'$LUA_CPATH

# Install the latest versions of nn, cutorch, cunn, cuDNN bindings and iTorch
RUN luarocks install nn && \
	luarocks install cutorch && \
	luarocks install cunn && \
    luarocks install loadcaffe && \
	\
	cd /root && git clone https://github.com/soumith/cudnn.torch.git && cd cudnn.torch && \
	git checkout R4 && \
	luarocks make && \
	\
	cd /root && git clone https://github.com/facebook/iTorch.git && \
	cd iTorch && \
	luarocks make

# Install OpenCV
RUN git clone --depth 1 https://github.com/opencv/opencv.git /root/opencv && \
	cd /root/opencv && \
	mkdir build && \
	cd build && \
	cmake -DCMAKE_LIBRARY_PATH=/usr/local/cuda/lib64/stubs -DWITH_QT=ON -DWITH_OPENGL=ON -DFORCE_VTK=ON -DWITH_TBB=ON -DWITH_GDAL=ON -DWITH_XINE=ON -DBUILD_EXAMPLES=ON .. && \
	make -j"$(nproc)"  && \
	make install && \
	ldconfig && \
	echo 'ln /dev/null /dev/raw1394' >> ~/.bashrc

# Expose Ports for TensorBoard (6006), Ipython (8888), Flask service(8080)
EXPOSE 6006 8888 8080

WORKDIR "/root"
CMD ["/bin/bash"]

Anastasios Tsiolakidis · Answer 1 · Sun Oct 15 2017 08:40:29 GMT+0800 (China Standard Time)

Is this supposed to work in the repository I just cloned? In addition to the "build" instructions missing a parameter, I entered a dot "." for the current path, I could not get a GPU build, log attached
error.txt

John Ryan · Answer 2 · Sun Oct 22 2017 02:22:21 GMT+0800 (China Standard Time)

@reppolice No Pull Request was made for this code change.
The easy solution is to clone this repo:
git@github.com:Paperspace/dl-docker.git

Anastasios Tsiolakidis · Answer 3 · Sun Oct 22 2017 04:52:16 GMT+0800 (China Standard Time)

@jtryan I cloned that one, but didn't have much success,I don't think it would be a problem on my side, would it?

...
2017-10-21 20:47:49 (1.84 MB/s) - '/root/downloads/cudnn-8.0-linux-x64-v6.0.tgz' saved [201134139/201134139]

cuda/include/cudnn.h
cuda/lib64/libcudnn.so
cuda/lib64/libcudnn.so.6
cuda/lib64/libcudnn.so.6.0.21
cuda/lib64/libcudnn_static.a
---> fb47c42ccca4
Removing intermediate container 871dccc622e8
Step 13/40 : ADD cuda/include/cudnn.h /usr/local/cuda-8.0/include
ADD failed: stat /var/lib/docker/tmp/docker-builder494311789/cuda/include/cudnn.h: no such file or directory

Pankesh Bamotra · Answer 4 · Sun Oct 22 2017 06:54:09 GMT+0800 (China Standard Time)

Try this.
I'm just a beginner with Docker so excuse my hack-y way of doing things.

Dockerfile.gpu.txt

John Ryan · Answer 5 · Sun Oct 22 2017 23:39:08 GMT+0800 (China Standard Time)

@reppolice No I got the same error. @pbamotra change to the Dockerfile.gpu adding this section should work fine. If you replace Dockerfile.gpu from this repo with his file above, anIt is a long build though... 😄

adamjosephjensen · Answer 6 · Thu Nov 30 2017 02:30:51 GMT+0800 (China Standard Time)

Hi, I'm running into the cuDNN6, libcudnn.so.6 issue trying to import tensorflow tf-nightly-gpu==1.5.0-dev20171127.

Would you be willing to provide me with instructions on how to resolve this? I'm not sure what exactly to do with the Dockerfile but I will try to figure it out in the meantime.

Thanks in advance. Aside from this (which is blocking training) using floydhub has been great.

EDIT

I partially resolved the issue with the following setup.sh script:

#!/bin/bash

echo 'PATH is:'
echo $PATH
echo 'LD_LIBRARY_PATH is:'
echo $LD_LIBRARY_PATH
echo "lib64:"
ls /usr/local/cuda/lib64/ | grep "libcudn*"
echo "include: "
ls /usr/local/cuda/include/ | grep "libcudn*"

CUDNN_TAR_FILE=cudnn-8.0-linux-x64-v6.0.tgz

wget http://developer.download.nvidia.com/compute/redist/cudnn/v6.0/${CUDNN_TAR_FILE} -P /root/downloads && \
cd /root/downloads && \
tar -xzvf ${CUDNN_TAR_FILE} && \
cp cuda/include/cudnn.h /usr/local/cuda-8.0/include && \
cp cuda/lib64/libcudnn* /usr/local/cuda-8.0/lib64/ && \
chmod a+r /usr/local/cuda-8.0/lib64/libcudnn*

export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64

cd /usr/local/cuda/lib64 && \
rm libcudnn.so && \
rm libcudnn.so.6 && \
ln libcudnn.so.6.* libcudnn.so.6 && \
ln libcudnn.so.6 libcudnn.so && \
ldconfig

echo 'PATH is:'
echo $PATH
echo 'LD_LIBRARY_PATH is:'
echo $LD_LIBRARY_PATH
echo "lib64:"
ls /usr/local/cuda/lib64/ | grep "libcudn*"
echo "include: "
ls /usr/local/cuda/include/ | grep "libcudn*"

#this may not be necessary or useful
pip3 install cudnn-python-wrappers

pip3 install tf-nightly-gpu

However, that does mean I need to download the 65 MB Nvidia driver on every build. Obviously this is less than ideal. Would Floydhub be willing to fix this?