cknowledge.org/ai: Crowdsourcing benchmarking and optimisation of AI

[PUBLIC] Benchmarking Caffe on NVIDIA GTX 1080

NB: The Caffe experimental results are released with approval from General Motors.

The Jupyter notebook in this Collective Knowledge repository (viewable on github.com and on nbviewer.jupyter.org) analyses the performance (execution time, memory consumption) of Caffe:

on dividiti's "velociti" Hewlett-Packard Z640 Workstation (G1X62EA):
- Intel(R) Xeon(R) CPU E5-2650 v3:
  - 10 cores, 20 threads;
  - Base clock 2300 MHz, turbo clock 3000 MHz;
  - Max power consumption 105 Watt;
  - Max memory bandwidth 68 GB/s;
  - RAM 32 GB DDR4;

$ uname -a
Linux velociti 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.1 LTS"

NVIDIA GeForce GTX 1080 "Founders Edition":
- Pascal architecture;
- 2560 CUDA cores;
- Base clock 1607 MHz, boost clock 1733 MHz;
- Max power consumption 180 Watt;
- RAM 8 GB GDDR5X;
- Max memory bandwidth 320 GB/s;
- GPU Driver 367.57 [10/Oct/2016];
- CUDA Toolkit 8.0.44 [xx/Sep/2016].

using 14 Caffe libraries, each listed as "[tag] Branch (revision hash, date): math libraries":
- [cpu] Master (4ba654f, 5/Oct/2016): with OpenBLAS 0.2.19;
- [cuda] Master (4ba654f, 5/Oct/2016): with cuBLAS (part of CUDA Toolkit 8.0.44);
- [cudnn] Master (4ba654f, 5/Oct/2016): with cuDNN 5.1;
- [nvidia-cuda] NVIDIA v0.15 (1024d34, 17/Nov/2016): with cuBLAS (part of CUDA Toolkit 8.0.44);
- [nvidia-cudnn] NVIDIA v0.15 (1024d34, 17/Nov/2016): with cuDNN 5.1;
- [nvidia-fp16-cuda] NVIDIA experimental/fp16 (fca1cf4, 11/Jul/2016): with cuBLAS (part of CUDA Toolkit 8.0.44);
- [nvidia-fp16-cudnn] NVIDIA experimental/fp16 (fca1cf4, 11/Jul/2016): with cuDNN 5.1;
- [clblas] OpenCL (9abafdc, 7/Oct/2016): with ViennaCL 1.7.1 and clBLAS 2.10;
- [clblast] OpenCL (9abafdc, 7/Oct/2016): with ViennaCL 1.7.1 and CLBlast 0.9.0;
- [viennacl] OpenCL (9abafdc, 7/Oct/2016): with ViennaCL 1.7.1 only;
- [libdnn-cuda] OpenCL (cfaaae1, 25/Oct/2016): with libDNN and cuBLAS;
- [libdnn-clblas] OpenCL (cfaaae1, 25/Oct/2016): with libDNN, ViennaCL 1.7.1 and clBLAS 2.10;
- [libdnn-clblast] OpenCL (cfaaae1, 25/Oct/2016): with libDNN, ViennaCL 1.7.1 and CLBlast 0.9.0;
- [libdnn-viennacl] OpenCL (cfaaae1, 25/Oct/2016): with libDNN and ViennaCL 1.7.1.

using 4 CNN models:
- GoogLeNet;
- AlexNet;
- SqueezeNet 1.0;
- SqueezeNet 1.1;
with the batch size varying from 2 to 16 with step 2.
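
As a rough illustration of the size of this experiment space, the sketch below enumerates every combination of library tag, model and batch size described above. The names `LIBRARIES`, `MODELS`, `BATCH_SIZES` and `experiment_grid` are hypothetical and not part of the repository. The sketch also assumes the full cross product was benchmarked, which the text above does not state (some combinations may have been skipped, for example due to memory limits).

```python
from itertools import product

# Library tags copied from the list above (14 entries).
LIBRARIES = [
    "cpu", "cuda", "cudnn",
    "nvidia-cuda", "nvidia-cudnn",
    "nvidia-fp16-cuda", "nvidia-fp16-cudnn",
    "clblas", "clblast", "viennacl",
    "libdnn-cuda", "libdnn-clblas", "libdnn-clblast", "libdnn-viennacl",
]

# The four CNN models listed above.
MODELS = ["GoogLeNet", "AlexNet", "SqueezeNet 1.0", "SqueezeNet 1.1"]

# Batch size from 2 to 16 inclusive, with step 2.
BATCH_SIZES = list(range(2, 17, 2))


def experiment_grid():
    """Return every (library, model, batch size) combination.

    Assumption: the full cross product is of interest; the source
    does not say whether every combination was actually run.
    """
    return list(product(LIBRARIES, MODELS, BATCH_SIZES))


grid = experiment_grid()
print(f"{len(LIBRARIES)} libraries x {len(MODELS)} models x "
      f"{len(BATCH_SIZES)} batch sizes = {len(grid)} configurations")
```

Under these assumptions the grid contains 14 x 4 x 8 = 448 configurations, which gives an upper bound on the number of data points the notebook may analyse.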
