TF 1.12.0, CPU/GPU, CUDA 9.0, CuDNN 7.4, Python 3.5, Ubuntu 16.04, Skylake, -AVX, +SSE4
bzamecnik opened this issue
Recent builds with and without GPU, without AVX, for Ubuntu 16.04.
| CPU/GPU | AVX/AVX2/FMA | SSE4.1/SSE4.2 | link | md5 |
|---|---|---|---|---|
| GPU | no | yes | download | 5d9fb5aee87456d5c0f1915b16844769 |
| CPU | no | yes | download | df616627cfcbe47d77df7f8628611b9e |
| GPU | no | no | download | 61e1081971626bccc047ea773a0f2eed |
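To make sure a download is intact, you can check it against the md5 from the table; this is a minimal sketch, and the filename/hash pair shown is for the first (GPU, +SSE4) row, so adjust both to whichever build you grabbed:

```shell
# Check the downloaded wheel against the md5 from the table above.
# NOTE: WHEEL/EXPECTED are for the GPU +SSE4 row; swap in your download's row.
WHEEL=tensorflow-1.12.0-cp35-cp35m-linux_x86_64.whl
EXPECTED=5d9fb5aee87456d5c0f1915b16844769
[ "$(md5sum "$WHEEL" 2>/dev/null | cut -d' ' -f1)" = "$EXPECTED" ] \
  && echo "md5 OK" || echo "md5 MISMATCH"
```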
Compiled on Intel Pentium G4400 (Skylake) without AVX/FMA instructions.
- TensorFlow 1.12.0 GPU
- Ubuntu 16.04 (libc6-2.23)
- Python 3.5
- CUDA 9.0
- CuDNN 7.4
- NCCL 1.3
- Bazel 0.19.2
- compute capabilities 5.2 (Maxwell), 6.1 (Pascal)
Successfully tested with Keras 2.2.4 on GTX 980 Ti.
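Before picking a wheel, it helps to check which SIMD extensions your CPU actually has — this is the whole reason these -AVX builds exist, since the stock wheels crash with "Illegal instruction" on CPUs without AVX. A Linux-only sketch reading /proc/cpuinfo:

```shell
# Print the SIMD-related flags of the first CPU core; if avx is missing,
# the stock AVX-compiled TF wheels will not run on this machine.
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' \
  | grep -Ex 'avx2?|fma|sse4_[12]' | sort -u
```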
Instructions
Install the Bazel build tool:
* For TF 1.12 the official guide recommends Bazel 0.15.0; 0.19.1 is broken (see Troubleshooting below), so 0.19.2 was used here
open https://docs.bazel.build/versions/master/install-ubuntu.html
# use binary installer (recommended)
sudo apt-get install pkg-config zip g++ zlib1g-dev unzip python
# download from https://github.com/bazelbuild/bazel/releases (~ 160 MB)
BAZEL_VERSION=0.19.2
BAZEL_INSTALLER=bazel-$BAZEL_VERSION-installer-linux-x86_64.sh
wget https://github.com/bazelbuild/bazel/releases/download/${BAZEL_VERSION}/${BAZEL_INSTALLER}
wget https://github.com/bazelbuild/bazel/releases/download/${BAZEL_VERSION}/${BAZEL_INSTALLER}.sha256
shasum -a 256 -c -b ${BAZEL_INSTALLER}.sha256
# bazel-0.19.2-installer-linux-x86_64.sh: OK
chmod +x ${BAZEL_INSTALLER}
./${BAZEL_INSTALLER} --user
# set in ~/.bashrc
export PATH="$PATH:$HOME/bin"
source $HOME/.bazel/bin/bazel-complete.bash
# check it
bazel version
Build:
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow/
git checkout v1.12.0
# install deps
# Python 3.5 on Ubuntu 16.04
sudo apt install python3-dev python3-pip
# NVIDIA - given the ML deb repo
sudo apt remove libcudnn6 libcudnn6-dev
sudo apt install libcudnn7 libcudnn7-dev
# cuDNN libraries end up in /usr/lib/x86_64-linux-gnu/
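A quick way to confirm the swap to libcudnn7 actually took effect (a sketch, assuming the NVIDIA ML deb repo package names used above):

```shell
# Confirm libcudnn7 replaced libcudnn6 and the shared object is in place.
dpkg -l 2>/dev/null | grep -E '^ii +libcudnn' || echo "no libcudnn package found"
ls /usr/lib/x86_64-linux-gnu/libcudnn.so.7* 2>/dev/null || echo "libcudnn.so.7 not found"
```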
# python virtual env + dependencies
mkvirtualenv tf-build
pip install -U pip six numpy wheel mock
pip install -U keras_applications==1.0.6 --no-deps
pip install -U keras_preprocessing==1.0.5 --no-deps
# not in the official build guide, but required by the tests
pip install -U scipy scikit-learn portpicker
./configure
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/lib/x86_64-linux-gnu
# Do you wish to build TensorFlow with CUDA support? [y/N]: y
# Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 9.0]:
# Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]:
# Do you wish to build TensorFlow with TensorRT support? [y/N]: n
# Please specify the NCCL version you want to use. If NCCL 2.2 is not installed, then you can use version 1.3 that can be fetched automatically but it may have worse performance with multiple GPUs. [Default is 2.2]: 1.3
# Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
# Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 5.2]: 5.2,6.1
# without this export the build was failing
export TF_NEED_CUDA="1"
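The interactive prompts above can also be pre-answered via environment variables that ./configure reads, which makes the build scriptable. A sketch — the variable names below are as used by TF 1.12's configure.py, but double-check them against your checkout:

```shell
# Pre-answer ./configure (TF 1.12) so the build is reproducible.
export TF_NEED_CUDA=1
export TF_CUDA_VERSION=9.0
export TF_CUDNN_VERSION=7
export CUDNN_INSTALL_PATH=/usr/lib/x86_64-linux-gnu
export TF_NEED_TENSORRT=0
export TF_NCCL_VERSION=1.3
export TF_CUDA_COMPUTE_CAPABILITIES=5.2,6.1
```

Then run `./configure` as above; it picks these up instead of prompting.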
bazel build -c opt \
--config=cuda \
//tensorflow/tools/pip_package:build_pip_package
# INFO: Elapsed time: 16706.700s, Critical Path: 190.08s, Remote (0.00% of the time): # [queue: 0.00%, setup: 0.00%, process: 0.00%]
# INFO: 14075 processes: 14075 local.
# INFO: Build completed successfully, 16607 total actions
# real 278m28.498s (4:38 h)
# Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
mv /tmp/tensorflow_pkg/tensorflow-1.12.0-cp35-cp35m-linux_x86_64.whl ~
pip install --no-cache-dir ~/tensorflow-1.12.0-cp35-cp35m-linux_x86_64.whl
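If pip refuses the wheel with "not a supported wheel on this platform", the usual cause is a Python-version mismatch: the cp35 tag in the filename must match your interpreter. A quick check:

```shell
# The wheel's cp35 tag must match the interpreter's major/minor version.
python3 -c 'import sys; print("cp%d%d" % sys.version_info[:2])'
```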
Test it:
cd
git clone https://github.com/keras-team/keras.git
cd keras/examples
python mnist_cnn.py
Troubleshooting
- tensorflow/tensorflow#23613: "--config=opt" -> "-c opt"
- tensorflow/tensorflow#4841: export TF_NEED_CUDA="1"
- Building with --config=cuda but TensorFlow is not configured to build with GPU support.
- no such target '@local_config_cuda//crosstool:toolchain': target 'toolchain' not declared in package 'crosstool'
- tensorflow/tensorflow#22654: seems that Bazel 0.19.1 is broken -> upgrade to 0.19.2
- Auto-Configuration Warning: 'TMP' environment variable is not set, using 'C:\Windows\Temp' as default
- tensorflow/tensorflow#23719: seems that Bazel 0.19.1 is broken -> upgrade to 0.19.2
- ERROR: /home/bza/.cache/bazel/_bazel_bza/a1c658cb770a9cc79bfab3126f33f28b/external/local_config_cc/BUILD:57:1: in cc_toolchain rule @local_config_cc//:cc-compiler-k8: Error while selecting cc_toolchain: Toolchain identifier 'local' was not found, valid identifiers are [local_linux, local_darwin, local_windows]
- it was picking old libcudnn6-dev -> remove and install libcudnn7-dev
- Cannot find cuda library libcudnn.so.6
- NCCL: 2.2 not working with libnccl2 -> specify 1.3
- ERROR: Config value cuda is not defined in any .rc file
- tensorflow/tensorflow#23401
- change "import /home/bza/tensorflow/.tf_configure.bazelrc" to "import /home/bza/tensorflow/tools/bazel.rc"
sed -i -E 's#\.tf_configure\.bazelrc#tools/bazel.rc#' .bazelrc
- tensorflow/tensorflow#24385
- Downgrade from the latest Bazel (0.22.0) to 0.19.2 or 0.15.0
GCC defaults to -march=native, so on my CPU (which lacks AVX) AVX is disabled automatically. When cross-compiling on another CPU that does have AVX, it can be disabled explicitly:
bazel build --config=opt \
--config=cuda \
--copt=-mno-avx \
--copt=-mno-avx2 \
--copt=-mno-fma \
//tensorflow/tools/pip_package:build_pip_package
Another build (with explicitly enabled SSE4 and with -D_GLIBCXX_USE_CXX11_ABI=0):
time bazel build -c opt \
--config=cuda \
--copt=-mno-avx \
--copt=-mno-avx2 \
--copt=-mno-fma \
--copt=-msse4.1 \
--copt=-msse4.2 \
--cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" \
//tensorflow/tools/pip_package:build_pip_package
Another build was without CUDA:
unset TF_NEED_CUDA
bazel clean
./configure
# ... build as above
- 5.2 - Maxwell 9xx series (including GeForce GTX 980 Ti)
- 6.1 - Pascal 10xx series
TODO:
- Docker build
- enable SSE4 instructions (likely not enabled with -march=native)
- TensorRT
- with GCC 5 possibly use:
- --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0"
- but it is useful only for custom TF operations
- maybe can be useful to overcome different versions of glibc tensorflow/tensorflow#15376
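Related to the glibc point above: since these wheels were built against libc6-2.23, importing them on a distro with an older glibc fails, so it is worth comparing versions first. A minimal check:

```shell
# The build machine had glibc 2.23 (Ubuntu 16.04); the target needs >= that.
ldd --version 2>/dev/null | head -n1
```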
Dude many thanks. I have searched all the web for this wheel for my old computer. Thanks for taking your time and building one.
THANK YOU
Been trying for a year to get a newer version than tensorflow v1.5 working. This is the first thing that has worked. 🎉
Should I build also 1.13?
@bzamecnik As of right now, I'm too afraid to update CUDA to work with 1.13 (like I said, it took me way too long to get this working), so I likely will not be able to test for you. I'm sure others in the community would love that contribution, though.
I really really appreciate this build. Now my old Xeon desktop can finally use tensorflow with version > 1.5. What a leap to 1.12! Thanks!!
@bzamecnik Hey tensorflow 2 has released can you please make a wheel for this version cpu