- Replace the wandb API key with your own
- Define your GPU setup
- Set the benchmarks you want to explore
- Run the benchmark script from the shell
We highly recommend setting up an isolated pipenv environment:
$ pip install --user pipenv
Then clone the repository and create the environment:
$ git clone git@github.com:theunifai/DeepLearningExamples.git
$ cd DeepLearningExamples
$ pipenv shell
$ pipenv install -r requirements.txt
You can either set the wandb API key in the benchmark.yml file or log in from the shell:
$ wandb login
If the wandb API key is not set in the benchmark.yml file, the system will look in your environment to fetch it.
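The lookup order described above can be sketched as follows. This is a minimal illustration, not the code used by benchmark.py: the config key name `wandb_api_key` and the function name are hypothetical, and the environment lookup assumes the standard `WANDB_API_KEY` variable read by the wandb client. Credentials stored by `wandb login` are handled by the wandb client itself and are not modelled here.

```python
import os


def resolve_wandb_api_key(config, environ=None):
    """Return the API key from the config if present, else from the environment.

    `config` is the parsed benchmark.yml content as a dict.
    The "wandb_api_key" key name is an assumption made for this sketch.
    """
    environ = os.environ if environ is None else environ
    key = config.get("wandb_api_key")
    if key:
        return key
    # Fall back to the environment variable recognised by the wandb client.
    return environ.get("WANDB_API_KEY")


# A key in the config takes precedence over the environment.
print(resolve_wandb_api_key({"wandb_api_key": "from-yaml"}, {"WANDB_API_KEY": "from-env"}))
print(resolve_wandb_api_key({}, {"WANDB_API_KEY": "from-env"}))
```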
In the YAML file, set the topology to match your GPU configuration, which you can list with:
$ nvidia-smi
Following the nvidia-smi example above, here is the corresponding configuration in the YAML file.
You can activate the capabilities to explore for each GPU (for instance, V100s do not support TF32, so it should be set to false).
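As a rough guide for choosing which capabilities to activate, the sketch below maps a CUDA compute capability to the precision modes a GPU can benefit from. It is an illustration under stated assumptions: the function name and the `fp32`/`amp`/`tf32` keys are hypothetical and are not taken from benchmark.yml. The thresholds reflect that Tensor Cores arrived with Volta (compute capability 7.0) and TF32 with Ampere (8.0).

```python
def supported_precisions(compute_capability):
    """Map a (major, minor) CUDA compute capability to usable precision modes.

    The dictionary keys are illustrative names, not the benchmark.yml schema.
    """
    major, _minor = compute_capability
    return {
        "fp32": True,        # available on every CUDA GPU
        "amp": major >= 7,   # Tensor Cores: Volta (7.0), Turing (7.5), Ampere (8.x)
        "tf32": major >= 8,  # TF32 math mode: Ampere (8.0) and newer
    }


# V100 is compute capability 7.0, A100 is 8.0.
print("V100:", supported_precisions((7, 0)))
print("A100:", supported_precisions((8, 0)))
```

If PyTorch is installed, `torch.cuda.get_device_capability()` returns a tuple in this form for the current device.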
In the example above, the benchmarks to explore are based on templates already structured by UnifAI's team. All you have to do is overwrite, if needed, the hyperparameters you want to explore. Every parameter value must be an array, following this standard:
benchmarks:
  benchmark-name:
    benchmark-template: <template on which you want to base your benchmark on>
    active: <boolean status of the benchmark to explore : false means skip the benchmark>
    params:
      param1: [<custom value1=a>, <custom value2=b>]  # this must be an array
      param2: [<custom value1=c>, <custom value2=d>]  # this must be an array
The system explores the Cartesian product of the parameter values. In our example, that means exploring 4 parameter combinations:
- a.c
- a.d
- b.c
- b.d
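The expansion above can be reproduced with a few lines of Python. This is a minimal sketch of a Cartesian exploration using `itertools.product`; the function name `expand_params` is hypothetical and the actual implementation in benchmark.py may differ.

```python
from itertools import product


def expand_params(params):
    """Return one dict per combination of the listed parameter values."""
    names = list(params)
    value_lists = [params[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Same shape as the `params` section of a benchmark entry.
params = {"param1": ["a", "b"], "param2": ["c", "d"]}
combos = expand_params(params)
for combo in combos:
    print(combo)
```

The number of runs grows multiplicatively with each parameter, so two parameters with two values each give 4 runs, and adding a third parameter with three values would give 12.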
You are now ready to run the benchmarks. Many options can be set; list them with --help, then start the run:
$ ./benchmark.py --help
$ ./benchmark.py --run
This command will build and run the benchmarks for AMP (Automatic Mixed Precision), FP32 and TF32.
Framework | Domain | Task | Model | Status |
---|---|---|---|---|
PyTorch | Image | Classification | efficientnet | Ok |
PyTorch | Image | Classification | resnet50v1.5 | Ok |
PyTorch | Image | Classification | resnext101-32x4d | Ok |
PyTorch | Image | Classification | se-resnext101-32x4d | Ok |
PyTorch | Image | Detection | Efficientdet | Ok |
PyTorch | Image | Detection | SSD | Ok |
PyTorch | DrugDiscovery | SE3Transformer | SE3Transformer | |
PyTorch | Forecasting | TFT | TFT | |
PyTorch | LanguageModeling | BART | BART | Ok |
PyTorch | LanguageModeling | BERT | BERT | |
PyTorch | LanguageModeling | Transformer-XL | Transformer-XL | |
PyTorch | Recommendation | DLRM | DLRM | |
PyTorch | Recommendation | NCF | NCF | |
PyTorch | Segmentation | MaskRCNN | MaskRCNN | |
PyTorch | Segmentation | nnUNet | nnUNet | |
PyTorch | SpeechRecognition | Jasper | Jasper | |
PyTorch | SpeechRecognition | QuartzNet | QuartzNet | |
PyTorch | SpeechSynthesis | FastPitch | FastPitch | |
PyTorch | SpeechSynthesis | Tacotron2 | Tacotron2 | |
PyTorch | Translation | GNMT | GNMT | |
PyTorch | Translation | Transformer | Transformer | |
TensorFlow | Image | Classification | resnet50v1.5 | |
TensorFlow | Image | Classification | resnext101-32x4d | |
TensorFlow | Image | Classification | se-resnext101-32x4d | |
TensorFlow | Image | Detection | SSD | |
TensorFlow | LanguageModeling | BERT | BERT | |
TensorFlow | LanguageModeling | Transformer-XL | Transformer-XL | |
TensorFlow | Recommendation | VAE-CF | VAE-CF | |
TensorFlow | Recommendation | NCF | NCF | |
TensorFlow | Recommendation | WideAndDeep | WideAndDeep | |
TensorFlow | Segmentation | MaskRCNN | MaskRCNN | |
TensorFlow | Segmentation | UNet_3D_Medical | UNet_3D_Medical | |
TensorFlow | Segmentation | UNet_Industrial | UNet_Industrial | |
TensorFlow | Segmentation | UNet_Medical | UNet_Medical | |
TensorFlow | Segmentation | Vnet | Vnet | |
TensorFlow | Translation | GNMT | GNMT | |
TensorFlow2 | Image | Classification | efficientnet | |
TensorFlow2 | LanguageModeling | BERT | BERT | |
TensorFlow2 | LanguageModeling | ELECTRA | ELECTRA | |
TensorFlow2 | Recommendation | DLRM | DLRM | |
TensorFlow2 | Recommendation | WideAndDeep | WideAndDeep | |
TensorFlow2 | Segmentation | MaskRCNN | MaskRCNN | |
TensorFlow2 | Segmentation | UNet_Medical | UNet_Medical | |
DGLPyTorch | DrugDiscovery | SE3Transformer | SE3Transformer | |
MXNet | Image | Classification | resnet50v1.5 | |
This repository provides State-of-the-Art Deep Learning examples that are easy to train and deploy, achieving the best reproducible accuracy and performance with NVIDIA CUDA-X software stack running on NVIDIA Volta, Turing and Ampere GPUs.
These examples, along with our NVIDIA deep learning software stack, are provided in a monthly updated Docker container on the NGC container registry (https://ngc.nvidia.com). These containers include:
- The latest NVIDIA examples from this repository
- The latest NVIDIA contributions shared upstream to the respective framework
- The latest NVIDIA Deep Learning software libraries, such as cuDNN, NCCL, cuBLAS, etc. which have all been through a rigorous monthly quality assurance process to ensure that they provide the best possible performance
- Monthly release notes for each of the NVIDIA optimized containers
Models | Framework | A100 | AMP | Multi-GPU | Multi-Node | TRT | ONNX | Triton | DLC | NB |
---|---|---|---|---|---|---|---|---|---|---|
ResNet-50 | PyTorch | Yes | Yes | Yes | - | Yes | - | Yes | Yes | - |
ResNeXt-101 | PyTorch | Yes | Yes | Yes | - | Yes | - | Yes | Yes | - |
SE-ResNeXt-101 | PyTorch | Yes | Yes | Yes | - | Yes | - | Yes | Yes | - |
EfficientNet-B0 | PyTorch | Yes | Yes | Yes | - | - | - | - | Yes | - |
EfficientNet-B4 | PyTorch | Yes | Yes | Yes | - | - | - | - | Yes | - |
EfficientNet-WideSE-B0 | PyTorch | Yes | Yes | Yes | - | - | - | - | Yes | - |
EfficientNet-WideSE-B4 | PyTorch | Yes | Yes | Yes | - | - | - | - | Yes | - |
Mask R-CNN | PyTorch | Yes | Yes | Yes | - | - | - | - | - | Yes |
nnUNet | PyTorch | Yes | Yes | Yes | - | - | - | - | Yes | - |
SSD | PyTorch | Yes | Yes | Yes | - | - | - | - | - | Yes |
ResNet-50 | TensorFlow | Yes | Yes | Yes | - | - | - | - | Yes | - |
ResNeXt101 | TensorFlow | Yes | Yes | Yes | - | - | - | - | Yes | - |
SE-ResNeXt-101 | TensorFlow | Yes | Yes | Yes | - | - | - | - | Yes | - |
Mask R-CNN | TensorFlow | Yes | Yes | Yes | - | - | - | - | Yes | - |
SSD | TensorFlow | Yes | Yes | Yes | - | - | - | - | Yes | Yes |
U-Net Ind | TensorFlow | Yes | Yes | Yes | - | - | - | - | Yes | Yes |
U-Net Med | TensorFlow | Yes | Yes | Yes | - | - | - | - | Yes | - |
U-Net 3D | TensorFlow | Yes | Yes | Yes | - | - | - | - | Yes | - |
V-Net Med | TensorFlow | Yes | Yes | Yes | - | - | - | - | Yes | - |
U-Net Med | TensorFlow2 | Yes | Yes | Yes | - | - | - | - | Yes | - |
Mask R-CNN | TensorFlow2 | Yes | Yes | Yes | - | - | - | - | Yes | - |
EfficientNet | TensorFlow2 | Yes | Yes | Yes | Yes | - | - | - | Yes | - |
ResNet-50 | MXNet | - | Yes | Yes | - | - | - | - | - | - |
Models | Framework | A100 | AMP | Multi-GPU | Multi-Node | TRT | ONNX | Triton | DLC | NB |
---|---|---|---|---|---|---|---|---|---|---|
BERT | PyTorch | Yes | Yes | Yes | Yes | - | - | Yes | Yes | - |
TransformerXL | PyTorch | Yes | Yes | Yes | Yes | - | - | - | Yes | - |
GNMT | PyTorch | Yes | Yes | Yes | - | - | - | - | - | - |
Transformer | PyTorch | Yes | Yes | Yes | - | - | - | - | - | - |
ELECTRA | TensorFlow2 | Yes | Yes | Yes | Yes | - | - | - | Yes | - |
BERT | TensorFlow | Yes | Yes | Yes | Yes | Yes | - | Yes | Yes | Yes |
BERT | TensorFlow2 | Yes | Yes | Yes | Yes | - | - | - | Yes | - |
BioBert | TensorFlow | Yes | Yes | Yes | - | - | - | - | Yes | Yes |
TransformerXL | TensorFlow | Yes | Yes | Yes | - | - | - | - | - | - |
GNMT | TensorFlow | Yes | Yes | Yes | - | - | - | - | - | - |
Faster Transformer | TensorFlow | - | - | - | - | Yes | - | - | - | - |
Models | Framework | A100 | AMP | Multi-GPU | Multi-Node | TRT | ONNX | Triton | DLC | NB |
---|---|---|---|---|---|---|---|---|---|---|
DLRM | PyTorch | Yes | Yes | Yes | - | - | Yes | Yes | Yes | Yes |
DLRM | TensorFlow2 | Yes | Yes | Yes | Yes | - | - | - | Yes | - |
NCF | PyTorch | Yes | Yes | Yes | - | - | - | - | - | - |
Wide&Deep | TensorFlow | Yes | Yes | Yes | - | - | - | - | Yes | - |
Wide&Deep | TensorFlow2 | Yes | Yes | Yes | - | - | - | - | Yes | - |
NCF | TensorFlow | Yes | Yes | Yes | - | - | - | - | Yes | - |
VAE-CF | TensorFlow | Yes | Yes | Yes | - | - | - | - | - | - |
Models | Framework | A100 | AMP | Multi-GPU | Multi-Node | TRT | ONNX | Triton | DLC | NB |
---|---|---|---|---|---|---|---|---|---|---|
Jasper | PyTorch | Yes | Yes | Yes | - | Yes | Yes | Yes | Yes | Yes |
Hidden Markov Model | Kaldi | - | - | Yes | - | - | - | Yes | - | - |
Models | Framework | A100 | AMP | Multi-GPU | Multi-Node | TRT | ONNX | Triton | DLC | NB |
---|---|---|---|---|---|---|---|---|---|---|
FastPitch | PyTorch | Yes | Yes | Yes | - | - | - | - | Yes | - |
FastSpeech | PyTorch | - | Yes | Yes | - | Yes | - | - | - | - |
Tacotron 2 and WaveGlow | PyTorch | Yes | Yes | Yes | - | Yes | Yes | Yes | Yes | - |
Models | Framework | A100 | AMP | Multi-GPU | Multi-Node | TRT | ONNX | Triton | DLC | NB |
---|---|---|---|---|---|---|---|---|---|---|
SE(3)-Transformer | PyTorch | Yes | Yes | Yes | - | - | - | - | - | - |
In each of the network READMEs, we indicate the level of support that will be provided. The range is from ongoing updates and improvements to a point-in-time release for thought leadership.
Multinode Training
Supported on a pyxis/enroot Slurm cluster.
Deep Learning Compiler (DLC)
TensorFlow XLA and PyTorch JIT and/or TorchScript
Accelerated Linear Algebra (XLA)
XLA is a domain-specific compiler for linear algebra that can accelerate TensorFlow models with potentially no source code changes. The results are improvements in speed and memory usage.
PyTorch JIT and/or TorchScript
TorchScript is a way to create serializable and optimizable models from PyTorch code. It is an intermediate representation of a PyTorch model (a subclass of nn.Module) that can then be run in a high-performance environment such as C++.
Automatic Mixed Precision (AMP)
Automatic Mixed Precision (AMP) enables mixed precision training on Volta, Turing, and NVIDIA Ampere GPU architectures automatically.
TensorFloat-32 (TF32)
TensorFloat-32 (TF32) is the new math mode in NVIDIA A100 GPUs for handling matrix math, also called tensor operations. TF32 running on Tensor Cores in A100 GPUs can provide up to 10x speedups compared to single-precision floating-point math (FP32) on Volta GPUs. TF32 is supported in the NVIDIA Ampere GPU architecture and is enabled by default.
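To give a feel for the precision involved, the sketch below emulates TF32 rounding in pure Python. TF32 keeps the 8-bit exponent of FP32 but only 10 of its 23 mantissa bits. The function name is hypothetical, and truncating the low bits is a simplification made for this sketch; actual hardware rounding may differ.

```python
import struct


def to_tf32(x):
    """Approximate TF32 precision by keeping the top 10 mantissa bits of a float32."""
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Clear the low 13 of the 23 mantissa bits (truncation, a simplification).
    bits &= 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# 2**-10 is the smallest step above 1.0 that survives; 2**-11 is dropped.
print(to_tf32(1.0 + 2**-10))
print(to_tf32(1.0 + 2**-11))
```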
Jupyter Notebooks (NB)
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.
We're posting these examples on GitHub to better support the community, facilitate feedback, as well as collect and implement contributions using GitHub Issues and pull requests. We welcome all contributions!
In each of the network READMEs, we indicate any known issues and encourage the community to provide feedback.