BENCHMARK ANY NVIDIA GPU CARD

Quickstart

General workflow

replace the wandb api key by yours
define the GPU setup you have
set the benchmark you want to explore
run the shell

Before you start

We highly suggest to setup and pipenv isolated environment

$ pip install --user pipenv

then

$ git clone git@github.com:theunifai/DeepLearningExamples.git

$ cd DeepLearningExamples

$ pipenv shell

$ pipenv install -r requirements.txt

Setup the wandb key

you can either set it in the benchmark.yml file or use the shell

$ wandb login

if the API Key for wandb is not set in the benchmark.yml file the system will look into your environment to fetch your api key

Define the GPU topology to benchmark

in the Yaml file set the topology using you GPU configuration:

$ nvidia-smi

nvidia-smi will help you see the ids of the GPU to analyse.

as presented above in the example with nvidia-smi here is the corresponding configuration in the yaml file.

you can activate the capabilities to explore for each GPU (for instance V100s doesnt support AMP so it should be set to false).

setup of the benchmarks to explore

In the above example we can see that the benchmarks to explore are based on template already structure by UnifAI's team. all you have to set is (if needed) overwrite the hyperparameters you want to explore.

Everything param value should be an array following this standard:

benchmarks
  benchmark-name
    benchmark-template: <template on which you want to base your benchmark on>
    active: <boolean status of the benchmark to explore : false means skip the benchmark>
    params:
      param1: [<custom value1=a>, <custom value2=b>] <- this must be an array
      param2: [<custom value1=c>, <custom value2=d>] <- this must be an array

the system will do the cartesian exploration of the benchmark meaning in our example exploring 4 parameters combination:

a.c
a.d
b.c
b.d

Running the benchmarks

You are now ready to run the benchmarks you have many options that can be set

# ./benchmark.py --help

# ./benchmark.py --run

This command will build and run the benchmarks for AMP (Automatic Mixed Precision), FP32 and TF32.

Work Benchmark Implementation Status

Framework	Domain	Task	Model	Status
PyTorch	Image	Classification	efficientnet	Ok
PyTorch	Image	Classification	resnet50v1.5	Ok
PyTorch	Image	Classification	resnext101-32x4d	Ok
PyTorch	Image	Classification	se-resnext101-32x4d	Ok
PyTorch	Image	Detection	Efficientdet	Ok
PyTorch	Image	Detection	SSD	Ok
PyTorch	DrugDiscovery	SE3Transformer	SE3Transformer
PyTorch	Forecasting	TFT	TFT
PyTorch	LanguageModeling	BART	BART	Ok
PyTorch	LanguageModeling	BERT	BERT
PyTorch	LanguageModeling	Transformer-XL	Transformer-XL
PyTorch	Recommendation	DLRM	DLRM
PyTorch	Recommendation	NCF	NCF
PyTorch	Segmentation	MaskRCNN	MaskRCNN
PyTorch	Segmentation	nnUNet	nnUNet
PyTorch	SpeechRecognition	Jasper	Jasper
PyTorch	SpeechRecognition	QuartzNet	QuartzNet
PyTorch	SpeechSynthesis	FastPitch	FastPitch
PyTorch	SpeechSynthesis	Tacotron2	Tacotron2
PyTorch	Translation	GNMT	GNMT
PyTorch	Translation	Transformer	Transformer
TensorFlow	Image	Classification	resnet50v1.5
TensorFlow	Image	Classification	resnext101-32x4d
TensorFlow	Image	Classification	se-resnext101-32x4d
TensorFlow	Image	Detection	SSD
TensorFlow	LanguageModeling	BERT	BERT
TensorFlow	LanguageModeling	Transformer-XL	Transformer-XL
TensorFlow	Recommendation	VAE-CF	VAE-CF
TensorFlow	Recommendation	NCF	NCF
TensorFlow	Recommendation	WideAndDeep	WideAndDeep
TensorFlow	Segmentation	MaskRCNN	MaskRCNN
TensorFlow	Segmentation	UNet_3D_Medical	UNet_3D_Medical
TensorFlow	Segmentation	UNet_Industrial	UNet_Industrial
TensorFlow	Segmentation	UNet_Medical	UNet_Medical
TensorFlow	Segmentation	Vnet	Vnet
TensorFlow	Translation	GNMT	GNMT
TensorFlow2	Image	Classification	efficientnet
TensorFlow2	LanguageModeling	BERT	BERT
TensorFlow2	LanguageModeling	ELECTRA	ELECTRA
TensorFlow2	Recommendation	DLRM	DLRM
TensorFlow2	Recommendation	WideAndDeep	WideAndDeep
TensorFlow2	Segmentation	MaskRCNN	MaskRCNN
TensorFlow2	Segmentation	UNet_Medical	UNet_Medical
DGLPyTorch	DrugDiscovery	SE3Transformer	SE3Transformer
MxNet	Image	Classification	resnet50v1.5

ORIGINALLY : NVIDIA Deep Learning Examples for Tensor Cores

Introduction

This repository provides State-of-the-Art Deep Learning examples that are easy to train and deploy, achieving the best reproducible accuracy and performance with NVIDIA CUDA-X software stack running on NVIDIA Volta, Turing and Ampere GPUs.

NVIDIA GPU Cloud (NGC) Container Registry

These examples, along with our NVIDIA deep learning software stack, are provided in a monthly updated Docker container on the NGC container registry (https://ngc.nvidia.com). These containers include:

The latest NVIDIA examples from this repository
The latest NVIDIA contributions shared upstream to the respective framework
The latest NVIDIA Deep Learning software libraries, such as cuDNN, NCCL, cuBLAS, etc. which have all been through a rigorous monthly quality assurance process to ensure that they provide the best possible performance
Monthly release notes for each of the NVIDIA optimized containers

Computer Vision

Models	Framework	A100	AMP	Multi-GPU	Multi-Node	TRT	ONNX	Triton	DLC	NB
ResNet-50	PyTorch	Yes	Yes	Yes	-	Yes	-	Yes	Yes	-
ResNeXt-101	PyTorch	Yes	Yes	Yes	-	Yes	-	Yes	Yes	-
SE-ResNeXt-101	PyTorch	Yes	Yes	Yes	-	Yes	-	Yes	Yes	-
EfficientNet-B0	PyTorch	Yes	Yes	Yes	-	-	-	-	Yes	-
EfficientNet-B4	PyTorch	Yes	Yes	Yes	-	-	-	-	Yes	-
EfficientNet-WideSE-B0	PyTorch	Yes	Yes	Yes	-	-	-	-	Yes	-
EfficientNet-WideSE-B4	PyTorch	Yes	Yes	Yes	-	-	-	-	Yes	-
Mask R-CNN	PyTorch	Yes	Yes	Yes	-	-	-	-	-	Yes
nnUNet	PyTorch	Yes	Yes	Yes	-	-	-	-	Yes	-
SSD	PyTorch	Yes	Yes	Yes	-	-	-	-	-	Yes
ResNet-50	TensorFlow	Yes	Yes	Yes	-	-	-	-	Yes	-
ResNeXt101	TensorFlow	Yes	Yes	Yes	-	-	-	-	Yes	-
SE-ResNeXt-101	TensorFlow	Yes	Yes	Yes	-	-	-	-	Yes	-
Mask R-CNN	TensorFlow	Yes	Yes	Yes	-	-	-	-	Yes	-
SSD	TensorFlow	Yes	Yes	Yes	-	-	-	-	Yes	Yes
U-Net Ind	TensorFlow	Yes	Yes	Yes	-	-	-	-	Yes	Yes
U-Net Med	TensorFlow	Yes	Yes	Yes	-	-	-	-	Yes	-
U-Net 3D	TensorFlow	Yes	Yes	Yes	-	-	-	-	Yes	-
V-Net Med	TensorFlow	Yes	Yes	Yes	-	-	-	-	Yes	-
U-Net Med	TensorFlow2	Yes	Yes	Yes	-	-	-	-	Yes	-
Mask R-CNN	TensorFlow2	Yes	Yes	Yes	-	-	-	-	Yes	-
EfficientNet	TensorFlow2	Yes	Yes	Yes	Yes	-	-	-	Yes	-
ResNet-50	MXNet	-	Yes	Yes	-	-	-	-	-	-

Natural Language Processing

Models	Framework	A100	AMP	Multi-GPU	Multi-Node	TRT	ONNX	Triton	DLC	NB
BERT	PyTorch	Yes	Yes	Yes	Yes	-	-	Yes	Yes	-
TransformerXL	PyTorch	Yes	Yes	Yes	Yes	-	-	-	Yes	-
GNMT	PyTorch	Yes	Yes	Yes	-	-	-	-	-	-
Transformer	PyTorch	Yes	Yes	Yes	-	-	-	-	-	-
ELECTRA	TensorFlow2	Yes	Yes	Yes	Yes	-	-	-	Yes	-
BERT	TensorFlow	Yes	Yes	Yes	Yes	Yes	-	Yes	Yes	Yes
BERT	TensorFlow2	Yes	Yes	Yes	Yes	-	-	-	Yes	-
BioBert	TensorFlow	Yes	Yes	Yes	-	-	-	-	Yes	Yes
TransformerXL	TensorFlow	Yes	Yes	Yes	-	-	-	-	-	-
GNMT	TensorFlow	Yes	Yes	Yes	-	-	-	-	-	-
Faster Transformer	Tensorflow	-	-	-	-	Yes	-	-	-	-

Recommender Systems

Models	Framework	A100	AMP	Multi-GPU	Multi-Node	TRT	ONNX	Triton	DLC	NB
DLRM	PyTorch	Yes	Yes	Yes	-	-	Yes	Yes	Yes	Yes
DLRM	TensorFlow2	Yes	Yes	Yes	Yes	-	-	-	Yes	-
NCF	PyTorch	Yes	Yes	Yes	-	-	-	-	-	-
Wide&Deep	TensorFlow	Yes	Yes	Yes	-	-	-	-	Yes	-
Wide&Deep	TensorFlow2	Yes	Yes	Yes	-	-	-	-	Yes	-
NCF	TensorFlow	Yes	Yes	Yes	-	-	-	-	Yes	-
VAE-CF	TensorFlow	Yes	Yes	Yes	-	-	-	-	-	-

Speech to Text

Models	Framework	A100	AMP	Multi-GPU	Multi-Node	TRT	ONNX	Triton	DLC	NB
Jasper	PyTorch	Yes	Yes	Yes	-	Yes	Yes	Yes	Yes	Yes
Hidden Markov Model	Kaldi	-	-	Yes	-	-	-	Yes	-	-

Text to Speech

Models	Framework	A100	AMP	Multi-GPU	Multi-Node	TRT	ONNX	Triton	DLC	NB
FastPitch	PyTorch	Yes	Yes	Yes	-	-	-	-	Yes	-
FastSpeech	PyTorch	-	Yes	Yes	-	Yes	-	-	-	-
Tacotron 2 and WaveGlow	PyTorch	Yes	Yes	Yes	-	Yes	Yes	Yes	Yes	-

Graph Neural Networks

Models	Framework	A100	AMP	Multi-GPU	Multi-Node	TRT	ONNX	Triton	DLC	NB
SE(3)-Transformer	PyTorch	Yes	Yes	Yes	-	-	-	-	-	-

NVIDIA support

In each of the network READMEs, we indicate the level of support that will be provided. The range is from ongoing updates and improvements to a point-in-time release for thought leadership.

Glossary

Multinode Training
Supported on a pyxis/enroot Slurm cluster.

Deep Learning Compiler (DLC)
TensorFlow XLA and PyTorch JIT and/or TorchScript

Accelerated Linear Algebra (XLA)
XLA is a domain-specific compiler for linear algebra that can accelerate TensorFlow models with potentially no source code changes. The results are improvements in speed and memory usage.

PyTorch JIT and/or TorchScript
TorchScript is a way to create serializable and optimizable models from PyTorch code. TorchScript, an intermediate representation of a PyTorch model (subclass of nn.Module) that can then be run in a high-performance environment such as C++.

Automatic Mixed Precision (AMP)
Automatic Mixed Precision (AMP) enables mixed precision training on Volta, Turing, and NVIDIA Ampere GPU architectures automatically.

TensorFloat-32 (TF32)
TensorFloat-32 (TF32) is the new math mode in NVIDIA A100 GPUs for handling the matrix math also called tensor operations. TF32 running on Tensor Cores in A100 GPUs can provide up to 10x speedups compared to single-precision floating-point math (FP32) on Volta GPUs. TF32 is supported in the NVIDIA Ampere GPU architecture and is enabled by default.

Jupyter Notebooks (NB)
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.

Feedback / Contributions

We're posting these examples on GitHub to better support the community, facilitate feedback, as well as collect and implement contributions using GitHub Issues and pull requests. We welcome all contributions!

Known issues

In each of the network READMEs, we indicate any known issues and encourage the community to provide feedback.

About

Deep Learning Examples

Languages

Language:Jupyter Notebook 45.9%Language:Python 45.6%Language:C++ 3.6%Language:Shell 3.1%Language:Cuda 1.2%Language:Makefile 0.3%Language:Dockerfile 0.2%Language:CMake 0.1%Language:Starlark 0.1%Language:C 0.0%