dividiti / ck-caffe-nvidia-tx1

CK-Caffe public benchmarking data on NVIDIA TX1

cknowledge.org/ai: Crowdsourcing benchmarking and optimisation of AI

[PUBLIC] Benchmarking Caffe and TensorRT on NVIDIA Jetson TX1

NB: The Caffe experimental results are released with approval from General Motors. The TensorRT 1.0 EA experimental results are released with approval from NVIDIA.

The Jupyter notebook (view on github.com; view on nbviewer.jupyter.org) in this Collective Knowledge repository analyses the performance (execution time, memory consumption) of Caffe and TensorRT on an NVIDIA Jetson TX1 board that reports the following system information:

$ uname -a
Linux tegra-ubuntu 3.10.96-tegra #1 SMP PREEMPT Wed Nov 9 19:42:57 PST 2016 aarch64 aarch64 aarch64 GNU/Linux
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.1 LTS"
  • using 6 Caffe libraries:

    Each entry below reads: [tag] Branch (revision hash, date): math libraries.
    • [cpu] Master (24d2f67, 28/Nov/2016): with OpenBLAS 0.2.19;
    • [nvidia-cuda] NVIDIA 0.15 (1024d34, 17/Nov/2016): with cuBLAS (part of CUDA Toolkit 8.0.33);
    • [nvidia-cudnn] NVIDIA 0.15 (1024d34, 17/Nov/2016): with cuDNN 5.1;
    • [nvidia-fp16-cuda] NVIDIA experimental/fp16 (fca1cf4, 11/Jul/2016): with cuBLAS (part of CUDA Toolkit 8.0.33);
    • [nvidia-fp16-cudnn] NVIDIA experimental/fp16 (fca1cf4, 11/Jul/2016): with cuDNN 5.1;
    • [libdnn-cuda] OpenCL (b735c2d, 23/Nov/2016): with libDNN and cuBLAS (part of CUDA Toolkit 8.0.33) for fully connected layers;

    NB: libDNN is not yet tuned for TX1; it uses parameters that are optimal for GTX 1080.

  • using 2 configurations of the NVIDIA TensorRT 1.0.0 EA engine:

    • [tensorrt-fp16] NVIDIA TensorRT 1.0.0 EA with fp16 enabled;
    • [tensorrt-fp32] NVIDIA TensorRT 1.0.0 EA with fp16 disabled;

    NB: This EA ("early access") version is used in accordance with its special licensing terms: the results are released with explicit written approval from NVIDIA. The results may not be representative of the GA ("general availability") version.

  • using 4 CNN models:

  • with the batch size varying from 2 to 16 with step 2.
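The configurations listed above can be enumerated programmatically. The following is a minimal Python sketch, not code taken from the notebook: the names `CAFFE_TAGS`, `TENSORRT_TAGS`, `BATCH_SIZES` and `experiment_grid` are hypothetical, and only the tag strings and the batch-size range come from the lists above. The CNN models are left out because they are not named here, so the grid is per model.

```python
from itertools import product

# Tags copied from the lists above; the variable names are illustrative only.
CAFFE_TAGS = [
    "cpu",
    "nvidia-cuda",
    "nvidia-cudnn",
    "nvidia-fp16-cuda",
    "nvidia-fp16-cudnn",
    "libdnn-cuda",
]
TENSORRT_TAGS = ["tensorrt-fp16", "tensorrt-fp32"]

# Batch size from 2 to 16 inclusive, with step 2.
BATCH_SIZES = list(range(2, 17, 2))


def experiment_grid(tags, batch_sizes):
    """Return every (tag, batch_size) pair in a stable order."""
    return list(product(tags, batch_sizes))


# One grid per CNN model (hypothetical helper, not the notebook's own code).
GRID = experiment_grid(CAFFE_TAGS + TENSORRT_TAGS, BATCH_SIZES)

if __name__ == "__main__":
    print(len(BATCH_SIZES), "batch sizes:", BATCH_SIZES)
    print(len(GRID), "configurations per model")
```

Under these assumptions, each model is measured with 8 tags at 8 batch sizes. Multiplying by the number of models gives the total number of experimental points the notebook would need to cover.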
