TF2.0 Datasets Performance

This repo compares different data loading methods for TensorFlow (tf.data, Keras Sequence, pure Python generator) and provides a performance benchmark.

This is still ongoing work. If you spot an error or a possible improvement, open an issue.

Task

The benchmark task is deblurring: for each input image, two random crops are passed to the model for training.

Training time is measured on an RTX 2080 (CUDA 10.0, tensorflow-gpu==2.0.0) with:

num_epochs: 5
steps_per_epoch: 200
batch_size: 4
patch_size: (256, 256)

The dataset used is the GOPRO Dataset.
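As a rough illustration of the cropping step described in the task, the sketch below draws two random patches from one image with tf.image.random_crop. It assumes the two crops are drawn independently, which may not match what the repository does. The function name two_random_crops and the synthetic 720x1280 image are stand-ins for illustration, not code from the repository.

```python
import tensorflow as tf

PATCH_SIZE = (256, 256)  # patch_size from the benchmark settings


def two_random_crops(image, patch_size=PATCH_SIZE):
    """Return two independently drawn random crops of `image` (hypothetical helper)."""
    crop_shape = (*patch_size, image.shape[-1])
    first = tf.image.random_crop(image, size=crop_shape)
    second = tf.image.random_crop(image, size=crop_shape)
    return first, second


# Synthetic stand-in for a decoded RGB image (assumed size, not read from disk).
image = tf.random.uniform((720, 1280, 3))
crop_a, crop_b = two_random_crops(image)
print(crop_a.shape, crop_b.shape)
```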

Results

Training times for each loader, in seconds. The differences between the loaders are explained in the Loaders section below.

Loaders	Eager Mode Enabled (s)	Eager Mode Disabled (s)
`BasicPythonGenerator`	410	71
`BasicTFDataLoader`	184	47
`TFDataNumParallelCalls`	110	46
`TFDataPrefetch`	106	46
`TFDataGroupedMap`	103	46
`TFDataCache`	95	46
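To make the table easier to read, the timings can be turned into speedups relative to `BasicPythonGenerator`. The snippet below only redoes arithmetic on the numbers in the table; the variable names are illustrative.

```python
# Timings in seconds, copied from the results table: (eager enabled, eager disabled).
timings = {
    "BasicPythonGenerator": (410, 71),
    "BasicTFDataLoader": (184, 47),
    "TFDataNumParallelCalls": (110, 46),
    "TFDataPrefetch": (106, 46),
    "TFDataGroupedMap": (103, 46),
    "TFDataCache": (95, 46),
}

baseline_eager, baseline_graph = timings["BasicPythonGenerator"]

# Speedup = baseline time / loader time, rounded to two decimals.
speedups = {
    name: (round(baseline_eager / eager, 2), round(baseline_graph / graph, 2))
    for name, (eager, graph) in timings.items()
}

for name, (eager_speedup, graph_speedup) in speedups.items():
    print(f"{name}: x{eager_speedup} (eager), x{graph_speedup} (eager disabled)")
```

With eager mode enabled, the gap between the slowest and fastest loader is a bit more than 4x. With eager mode disabled, all tf.data variants land within one second of each other.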

Loaders

BasicPythonGenerator:
- Implement a simple generator with yield
- Load images with tf.io and tf.image operations
BasicTFDataLoader:
- Use the tf.data API
- Perform operations with tf.io and tf.image
TFDataNumParallelCalls:
- Add num_parallel_calls=tf.data.experimental.AUTOTUNE to each .map operation
TFDataPrefetch:
- Prefetch the dataset with dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
TFDataGroupedMap:
- Group all tf.io and tf.image operations in a single .map call to avoid spawning too many processes
TFDataCache:
- Cache the dataset with dataset.cache() before selecting random crops
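The loader variants above stack on top of each other, so the last one can be sketched as a single pipeline. This is a minimal sketch, not the repository's code. The helper names (write_fake_images, load_image, random_crop, build_dataset) are hypothetical, the images are random PNGs written to a temporary directory so the example runs without the GOPRO Dataset, and the patch size is reduced to keep it light.

```python
import os
import tempfile

import tensorflow as tf

AUTOTUNE = tf.data.experimental.AUTOTUNE
PATCH_SIZE = (64, 64)  # assumption: smaller than the benchmark's (256, 256)
BATCH_SIZE = 4


def write_fake_images(directory, count=8, height=96, width=128):
    """Write random PNG files so the sketch does not need a real dataset."""
    paths = []
    for index in range(count):
        pixels = tf.random.uniform((height, width, 3), maxval=256, dtype=tf.int32)
        path = os.path.join(directory, f"image_{index}.png")
        tf.io.write_file(path, tf.io.encode_png(tf.cast(pixels, tf.uint8)))
        paths.append(path)
    return paths


def load_image(path):
    """Grouped map: read, decode and convert in one function."""
    image = tf.io.decode_png(tf.io.read_file(path), channels=3)
    return tf.image.convert_image_dtype(image, tf.float32)


def random_crop(image):
    """Random step, kept after the cache so each epoch sees new crops."""
    return tf.image.random_crop(image, size=(*PATCH_SIZE, 3))


def build_dataset(paths):
    dataset = tf.data.Dataset.from_tensor_slices(paths)
    dataset = dataset.map(load_image, num_parallel_calls=AUTOTUNE)
    dataset = dataset.cache()  # cache decoded images, before the random crop
    dataset = dataset.map(random_crop, num_parallel_calls=AUTOTUNE)
    dataset = dataset.batch(BATCH_SIZE)
    return dataset.prefetch(AUTOTUNE)


with tempfile.TemporaryDirectory() as directory:
    dataset = build_dataset(write_fake_images(directory))
    # Iterate inside the block: files are read lazily during iteration.
    batch_shapes = [batch.shape.as_list() for batch in dataset]

print(batch_shapes)
```

Placing cache() between the two map calls means decoding happens once, while the crop is re-sampled on every pass. Caching after random_crop would freeze the crops chosen in the first epoch.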

Advice

- Check for drops in GPU utilization with nvidia-smi and TensorBoard profiling
- Add num_parallel_calls=tf.data.experimental.AUTOTUNE to .map() calls so tf.data picks the level of parallelism
- Group .map() operations to avoid spawning too many processes
- Cache your dataset at the right time (before data augmentation)
- Disable eager mode with tf.compat.v1.disable_eager_execution() once you are sure of your training pipeline
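For the last point, a minimal sketch of switching eager mode off is shown below. The call has to come at the start of the program, before any tensors, datasets or models are created. How the benchmark scripts wire this to their --enable_eager flag is not shown here.

```python
import tensorflow as tf

# Must run before any TensorFlow ops, datasets or models are created.
tf.compat.v1.disable_eager_execution()

eager_enabled = tf.executing_eagerly()
print(eager_enabled)  # False once eager execution is disabled
```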

Run it yourself

Create a virtual environment
Download the GOPRO Dataset
python datasets_comparison.py --epochs 5 --steps_per_epoch 200 --batch_size 4 --dataset_path /path/to/gopro/train --n_images 600 --enable_eager True
python run_keras_sequence.py --epochs 5 --steps_per_epoch 200 --batch_size 4 --dataset_path /path/to/gopro/train --n_images 600 --enable_eager True

RaphaelMeudec / tf2-datasets-performance