AnjieCheng / Fast-ImageNet-Dataloader

A fast data loader for ImageNet on PyTorch.

Install

Requirements:

  • Tensorpack: clone and pip install -e .
  • LMDB: pip install lmdb
  • OpenCV: pip install opencv-python
  • Protobuf: conda install protobuf
  • Prctl: clone the repository, run sudo apt-get install build-essential libcap-dev, then python setup.py build

Tensorpack versions newer than 0.9 are currently NOT supported. Note that some prebuilt OpenCV packages are much slower than others. Remember to check yours with this script and make sure it prints a time below 1s.

Preprocessing

To start, set the environment variable IMAGENET to the path of the ILSVRC2012 dataset. TENSORPACK_DATASET must also be set, since Tensorpack reads it.

export IMAGENET='/mnt/work/data/raw-data/'
python preprocess_sequential.py
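
Before launching the preprocessing script, it can help to confirm that both variables are actually set. The following is a minimal sketch and not part of this repository: the helper name missing_env_vars is hypothetical, and it only checks that the variables are non-empty, not that the paths contain a valid ILSVRC2012 layout.

```python
import os

# Variables named in the Preprocessing section.
REQUIRED_VARS = ("IMAGENET", "TENSORPACK_DATASET")


def missing_env_vars(env=None):
    """Return the required variables that are unset or empty.

    `env` defaults to os.environ; passing a plain dict makes testing easy.
    Hypothetical helper, not provided by the repository.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling missing_env_vars() with no arguments inspects the current process environment; an empty list means both variables are set.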

Usage

train_loader = LMDBLoader('train', batch_size=args.batch_size, num_workers=32, shuffle=True, cuda=True)
valid_loader = LMDBLoader('val', batch_size=args.batch_size, num_workers=32, shuffle=False, cuda=True) 
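
To check whether the loaders deliver the expected speed, one option is to time a few batches. The sketch below is an illustration rather than repository code: measure_throughput is a hypothetical helper, and it assumes that iterating a loader yields (images, labels) pairs where len(images) is the batch size, which this README does not confirm.

```python
import itertools
import time


def measure_throughput(loader, max_batches=100):
    """Return (samples_seen, samples_per_second) over at most max_batches batches.

    Assumption: each item yielded by `loader` is an (images, labels) pair and
    len(images) equals the number of samples in that batch.
    """
    samples = 0
    start = time.perf_counter()
    # islice stops after max_batches without fetching an extra batch.
    for images, labels in itertools.islice(loader, max_batches):
        samples += len(images)
    elapsed = time.perf_counter() - start
    rate = samples / elapsed if elapsed > 0 else float("inf")
    return samples, rate
```

Under those assumptions, measure_throughput(train_loader, max_batches=50) would report how many images per second the training loader produced over its first 50 batches.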

TODO

  • Image Normalization
  • Support HDF5 format
  • Tensorpack version > 0.9

Disclaimer

The code is mainly adapted from sequential-imagenet-dataloader and the Tensorpack examples.

License: MIT License

