DALI_pytorch_demo

Example code showing how to use Nvidia DALI in PyTorch, with a fallback to torchvision. It contains a few differences from the official Nvidia example (a rough code sketch follows the list):

  • Re-import DALI & recreate the dataloaders at the end of every epoch, to reduce long-term memory usage
  • Move the CPU DALI pipeline completely onto the CPU, freeing up GPU resources
  • Keep the DALI validation pipeline off the GPU during training, reducing GPU memory usage
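
To make the bullet points above concrete, here is a rough sketch (not the repository's actual code) of a fully CPU-resident DALI training pipeline that is rebuilt every epoch. It assumes a recent DALI release with the `pipeline_def`/`fn` API, an ImageNet-style directory layout, and placeholder paths and hyperparameters; exact argument names vary between DALI versions.

```python
# Rough sketch only: a CPU-only DALI training pipeline, rebuilt every epoch.
# Paths, sizes and worker counts are placeholders; the same CPU-only
# construction can be used for the validation pipeline so it stays off the GPU.
import torch
from nvidia.dali import pipeline_def, fn, types
from nvidia.dali.plugin.pytorch import DALIClassificationIterator, LastBatchPolicy


@pipeline_def
def cpu_train_pipe(data_dir, crop=224):
    # Every operator runs with device="cpu", so decoding and augmentation
    # use no GPU memory at all.
    jpegs, labels = fn.readers.file(file_root=data_dir, random_shuffle=True,
                                    name="Reader")
    images = fn.decoders.image(jpegs, device="cpu", output_type=types.RGB)
    images = fn.random_resized_crop(images, size=crop)
    mirror = fn.random.coin_flip(probability=0.5)
    images = fn.crop_mirror_normalize(
        images,
        dtype=types.FLOAT,
        output_layout="CHW",
        mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
        std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
        mirror=mirror,
    )
    return images, labels


def build_train_loader(data_dir, batch_size, num_threads):
    # device_id=None requests a CPU-only pipeline in recent DALI releases.
    pipe = cpu_train_pipe(data_dir, batch_size=batch_size,
                          num_threads=num_threads, device_id=None)
    pipe.build()
    return DALIClassificationIterator(pipe, reader_name="Reader",
                                      last_batch_policy=LastBatchPolicy.PARTIAL)


num_epochs = 90  # placeholder
for epoch in range(num_epochs):
    # Recreating the loader each epoch keeps long-term memory usage flat.
    loader = build_train_loader("/data/imagenet/train", batch_size=512,
                                num_threads=10)
    for batch in loader:
        images = batch[0]["data"].cuda(non_blocking=True)   # tensors hit the GPU only here
        labels = batch[0]["label"].squeeze(-1).long().cuda(non_blocking=True)
        # ... forward / backward / optimizer step ...
    del loader  # drop the pipeline so its buffers can be freed between epochs
```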

Compared to the official example, these modifications allow for a ~50% increase in maximum batch size (tested using ResNet18 on a GCloud V100 instance with 10 workers):

Dataloader Type            Max Batch Size
DALI GPU (reference)       640
DALI GPU                   928 (45% increase)
DALI CPU (reference)       800
DALI CPU                   1216 (52% increase)
Torchvision w/ PIL-SIMD    1248

Here are some benchmarks on a Google Cloud V100 instance with 12 vCPUs (6 physical cores) and 78 GB RAM, using Apex FP16 training with ShuffleNet V2 0.5 and a batch size of 512:

Dataloader Type            Speed (images/s)
DALI GPU                   3910
DALI CPU                   1816
Torchvision w/ PIL-SIMD    1058
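
For comparison, the torchvision rows in both tables correspond to a standard ImageFolder + DataLoader setup; PIL-SIMD is a drop-in replacement for Pillow, so it speeds up the transforms without any code changes. A minimal sketch follows, with paths and hyperparameters as placeholders rather than the repo's exact settings.

```python
# Minimal torchvision fallback sketch; installing pillow-simd in place of
# Pillow accelerates the PIL-based transforms transparently.
import torch
from torchvision import datasets, transforms

train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

train_set = datasets.ImageFolder("/data/imagenet/train", transform=train_tf)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=512, shuffle=True,
    num_workers=10, pin_memory=True, drop_last=True)
```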

You can read the corresponding blog post here.

License: Apache License 2.0

