DataLoader

CPU-based dataloader using PIL and CV2.

DaliDataloader

Custom modules for Dali Dataloader.

DaliDataLoaderMulti

Example that loads multiple files of different sizes. Note: this loader causes a memory leak.

DaliDataLoader

Example that loads files with a batch size of 1.

We didn't see much difference in speed between the Multi and Single data loaders.

InferenceIterator

Images are not always perfect: they can be incomplete or corrupt, and the PIL loader may or may not detect this. A bad image breaks the DaliDataLoader pipeline in production, so the workaround is to initialize and re-initialize multiple dataloaders, each over a small window of images.

The code is commented and should be self-explanatory.
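
The windowed workaround described above can be sketched in plain Python, independent of DALI. This is only an illustration under stated assumptions, not the repository's InferenceIterator: `run_in_windows`, `make_loader` and `fake_loader` are hypothetical names, and the fallback from a 100-image window to a 5-image window is one reading of the window sizes mentioned in this README.

```python
def run_in_windows(files, window_size, make_loader, min_window=5):
    """Process files window by window, building a fresh loader for each window.

    make_loader(window) is a hypothetical factory returning an iterable of
    results for that window. Returns (results, failed_files).
    """
    results, failed = [], []
    for start in range(0, len(files), window_size):
        window = files[start:start + window_size]
        try:
            # A fresh loader per window: a crash only affects this window.
            batch = list(make_loader(window))
        except Exception:
            if len(window) > min_window:
                # Retry the same files in smaller windows to isolate the bad file.
                sub_results, sub_failed = run_in_windows(
                    window, min_window, make_loader, min_window
                )
                results.extend(sub_results)
                failed.extend(sub_failed)
            else:
                # Smallest window still fails: give up on these files.
                failed.extend(window)
            continue
        results.extend(batch)
    return results, failed


def fake_loader(window):
    """Stand-in for a real dataloader; fails on one 'corrupt' file."""
    for name in window:
        if name == "img_042.jpg":
            raise ValueError("corrupt image")
        yield name.upper()


files = [f"img_{i:03d}.jpg" for i in range(250)]  # hypothetical file names
results, failed = run_in_windows(files, 100, fake_loader)
print(len(results), failed)
```

With this layout a single corrupt file costs at most one small window of images instead of the whole run. Results are committed only after a window finishes, so good images in a failed window are not processed twice.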

We benchmarked on 9.2k images.

PIL : 40 min.
CV2 : 45-47 min.
Dali : 6-10 min. [Batch 1]
Dali : 6-10 min. [Batch 64]
Dali : 6-20 min. [Multiple Dali iterators with batch 1 and windows of 100 to 5 images]
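
To compare these runs more directly, the timings can be converted to throughput. The snippet below is a small worked calculation, assuming "9.2k" means roughly 9200 images; the helper name `images_per_second` is made up for this example.

```python
IMAGES = 9200  # assumption: "9.2k images" taken as 9200

# (label, fastest minutes, slowest minutes) as reported above
runs = [
    ("PIL", 40, 40),
    ("CV2", 45, 47),
    ("Dali, batch 1", 6, 10),
    ("Dali, batch 64", 6, 10),
    ("Dali, windowed", 6, 20),
]


def images_per_second(images, minutes):
    """Average throughput for a run that took the given number of minutes."""
    return images / (minutes * 60)


for label, fast, slow in runs:
    low = images_per_second(IMAGES, slow)   # slowest reported time
    high = images_per_second(IMAGES, fast)  # fastest reported time
    print(f"{label:15s} {low:5.1f} to {high:5.1f} images/s")
```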

Example code using InferenceIterator

Modify the variables below, or make small changes in the *DataLoader.py files.

from data import *
import timeit

class opt:
    def __init__(self):
        self.device = 'cuda'
        self.typedataloader = 'cuda'  # which dataloader to use, by batch size: Multicuda > 1, cuda == 1, cpu >= 1
        self.batch_size = 1
        self.sub_size = 100  # Window of 100 images.
        self.num_workers = 4
        self.dataFolder = '/data/sample/'

arg = opt()

def _mainExecute_Detection_(obj,img_,labl,file):
    """Main code"""

dirpath = arg.dataFolder

kwargs = {
    "Dir":dirpath,
    "arg":arg,
    "main" : _mainExecute_Detection_,
    #"model" : model, MachineLearning model if any.

}
datacount = CountDataIterator(**kwargs)

Function to chunk files into small windows.

def chunkIt(seq, batch_size):
    avg = batch_size
    out = []
    last = 0

    while last < len(seq):
        out.append(seq[int(last):int(last + avg)])
        last += avg

    return out
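
As a quick illustration of what the chunking produces, here is a compact equivalent written with a stepped range, followed by a usage example. The name `chunk_it` and the file names are made up for this sketch; the behavior is intended to match `chunkIt` above for integer window sizes.

```python
def chunk_it(seq, batch_size):
    """Split seq into consecutive windows of at most batch_size items."""
    return [seq[i:i + batch_size] for i in range(0, len(seq), batch_size)]


files = [f"img_{i:03d}.jpg" for i in range(250)]  # hypothetical file names
windows = chunk_it(files, 100)
print([len(w) for w in windows])  # [100, 100, 50]
```

The last window is simply shorter when the number of files is not a multiple of the window size, so no file is dropped.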
