developer0hye / Num-Workers-Search

num_workers Search Algorithm for Fast PyTorch DataLoader

Num-Workers-Search

num_workers search algorithm for fast PyTorch DataLoader

Background

Finding the optimal num_workers for the PyTorch DataLoader is key to fast training.

I measured the total time to load all the training data of the CIFAR10 dataset with various num_workers values, on my PC and on Google Colab.

On my PC

[Figure: total CIFAR10 loading time versus num_workers on my PC]

Spec

  • CPU: i5-10400
  • GPU: RTX 3070
  • RAM: 32GB
  • OS: Windows 10
  • SSD: 1TB

On Colab

[Figure: total CIFAR10 loading time versus num_workers on Colab]

As you can see, the choice of num_workers has a large effect on data loading time, so finding the optimal value matters for fast training. But it is hard to pick the optimal num_workers with a formula, because it varies with the PC spec, dataset size, and batch size.
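The measurement described above can be sketched roughly as follows. This is a minimal illustration, not the script used to produce the plots: the helper name `time_full_pass` is hypothetical, and a small random `TensorDataset` stands in for CIFAR10 so the example runs without a download.

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset


def time_full_pass(dataset, batch_size, num_workers):
    """Return the seconds needed to iterate once over the whole dataset.

    Hypothetical helper for illustration; it is not part of nws.py.
    """
    loader = DataLoader(dataset, batch_size=batch_size, num_workers=num_workers)
    start = time.perf_counter()
    for _ in loader:
        pass  # only loading is timed, no training step
    return time.perf_counter() - start


if __name__ == "__main__":
    # Assumption: random tensors shaped like CIFAR10 images stand in for the real data.
    images = torch.randn(1024, 3, 32, 32)
    labels = torch.randint(0, 10, (1024,))
    dataset = TensorDataset(images, labels)

    for n in (0, 2):
        elapsed = time_full_pass(dataset, batch_size=64, num_workers=n)
        print(f"num_workers={n}: {elapsed:.3f} s")
```

The `if __name__ == "__main__":` guard matters on Windows, where DataLoader worker processes re-import the main script. With an in-memory dataset this small, extra workers may well be slower than num_workers=0; the gap only shows up with real decoding and augmentation work.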

Solution

Trial and Error

Searching for the optimal num_workers can take a long time, but it pays off over the entire training run.

I use some tricks to reduce the execution time of the search. If you are interested in the details, refer to nws.py.
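To make the trial-and-error idea concrete, here is a minimal sketch of such a search. It is not the code in nws.py: the function name `search_num_workers`, the `measure` callback, and the `patience` early-stop rule are all assumptions made for illustration, and the timings in the demo are made-up numbers rather than measurements.

```python
from typing import Callable, Iterable


def search_num_workers(measure: Callable[[int], float],
                       candidates: Iterable[int],
                       patience: int = 2) -> int:
    """Try each candidate and return the one with the lowest measured time.

    Stops early after `patience` consecutive candidates fail to beat the
    best time so far (an assumed shortcut, not necessarily what nws.py does).
    """
    best_n = None
    best_time = float("inf")
    misses = 0
    for n in candidates:
        elapsed = measure(n)
        if elapsed < best_time:
            best_n, best_time, misses = n, elapsed, 0
        else:
            misses += 1
            if misses >= patience:
                break  # larger values have stopped helping
    if best_n is None:
        raise ValueError("candidates must not be empty")
    return best_n


if __name__ == "__main__":
    # Made-up timings in seconds, for illustration only.
    fake_times = {0: 9.0, 1: 6.0, 2: 4.0, 4: 3.5, 8: 3.9, 16: 4.4, 32: 5.0}
    print(search_num_workers(fake_times.__getitem__, sorted(fake_times)))  # prints 4
```

In practice `measure` would time a DataLoader built with the given num_workers, possibly over only a few batches to keep the search short. The early stop assumes loading time first falls and then rises as num_workers grows, which may not hold on every machine.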

In-script workflow

import torch
import nws

batch_size = ...
dataset = ...

num_workers = nws.search(dataset=dataset,
                         batch_size=batch_size,
                         ...)

loader = torch.utils.data.DataLoader(dataset=dataset,
                                     batch_size=batch_size,
                                     ...,
                                     num_workers=num_workers,
                                     ...)

License:MIT License

