Scene Text Image Transformer

A tool for scene text data augmentation. We provide it to help models avoid overfitting and become more robust.

The current version focuses on transforming the shape of cropped scene text images. A version supporting both detection and recognition tasks will be released later.

Requirements

GCC 4.8.*
Python 2.7.*
Boost 1.67
OpenCV 2.4.*

We recommend using Anaconda to manage the versions of your dependencies. For example:

    conda install boost=1.67.0

Installation

Build the library:

    mkdir build
    cd build
    cmake -D CUDA_USE_STATIC_CUDA_RUNTIME=OFF ..
    make

Copy Augment.so to the target folder and refer to demo.py for usage:

    cp Augment.so ..
    cd ..
    python demo.py

Demo

Distortion

Stretch
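The README does not describe how the stretch is computed. As a rough illustration only, the NumPy sketch below produces a horizontal stretch by splitting the width into segments, randomly moving the inner segment boundaries, and resampling whole columns through a piecewise-linear backward map. This is a simpler technique swapped in for illustration, not the tool's implementation; `random_stretch`, `segments`, and `max_shift` are hypothetical names.

```python
import numpy as np


def random_stretch(img, segments=4, max_shift=0.3, rng=None):
    """Hypothetical helper: randomly stretch/squeeze horizontal segments of an image.

    Illustration only (piecewise-linear column remapping), not the tool's algorithm.
    img: array of shape (H, W) or (H, W, C).
    max_shift: how far each inner boundary may move, as a fraction of one segment width.
    """
    if not 0 <= max_shift < 0.5:
        # Shifts below half a segment keep the boundaries strictly increasing.
        raise ValueError("max_shift must be in [0, 0.5)")
    rng = np.random.default_rng() if rng is None else rng
    w = img.shape[1]
    step = (w - 1) / segments
    # Evenly spaced segment boundaries (x coordinates) in the source image.
    src_x = np.linspace(0, w - 1, segments + 1)
    # Move only the inner boundaries; the left and right borders stay fixed.
    dst_x = src_x.copy()
    dst_x[1:-1] += rng.uniform(-max_shift, max_shift, segments - 1) * step
    # Backward mapping: for every output column, find the source column that lands there.
    src_cols = np.interp(np.arange(w), dst_x, src_x)
    # Nearest-neighbour sampling of whole columns.
    idx = np.clip(np.rint(src_cols).astype(int), 0, w - 1)
    return img[:, idx]
```

For example, `random_stretch(image, segments=4, max_shift=0.3, rng=np.random.default_rng(0))` returns an array with the same shape as `image`. Sampling whole columns keeps every character's height unchanged, which is the point of a purely horizontal stretch.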

Perspective
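The README does not describe how the perspective transformation is computed. For illustration, the NumPy sketch below applies a classic planar homography: the four image corners are moved vertically at random, a 3×3 matrix H is solved from the four point correspondences (direct linear transform with H[2, 2] fixed to 1), and each output pixel is looked up in the source image through the inverse of H. This is a standard technique swapped in for illustration, not necessarily how the tool works; `homography`, `warp_perspective`, `random_perspective`, and `max_shift` are hypothetical names.

```python
import numpy as np


def homography(src, dst):
    """Solve the 3x3 homography H (with H[2, 2] = 1) mapping 4 src points onto 4 dst points."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), linearised; same for v.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.asarray(A, dtype=float), np.asarray(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def warp_perspective(img, H, fill=0):
    """Backward-warp img with H (nearest-neighbour sampling); output keeps img's size."""
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Homogeneous (x, y, 1) coordinates of every output pixel.
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)], axis=1)
    # Map output pixels back into the source image with the inverse homography.
    src = pts @ np.linalg.inv(H).T
    sx = np.rint(src[:, 0] / src[:, 2]).astype(int)
    sy = np.rint(src[:, 1] / src[:, 2]).astype(int)
    inside = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.full((h * w,) + img.shape[2:], fill, dtype=img.dtype)
    out[inside] = img[sy[inside], sx[inside]]
    return out.reshape(img.shape)


def random_perspective(img, max_shift=0.2, rng=None):
    """Hypothetical helper: move each corner vertically by up to max_shift * height."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    src = np.array([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]], dtype=float)
    dst = src.copy()
    dst[:, 1] += rng.uniform(-max_shift, max_shift, 4) * h
    return warp_perspective(img, homography(src, dst))
```

Backward mapping (output pixel to source pixel) is used so that every output pixel receives exactly one value; pixels that map outside the source are filled with `fill`.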

Speed

Transforming an image of size 64×200 (H×W) takes less than 3 ms on a 2.0 GHz CPU. The process can be accelerated further by applying the augmentation on the fly in multi-process data loaders, for example by setting "num_workers" in PyTorch's DataLoader.
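On-the-fly augmentation means each sample is transformed when it is loaded rather than precomputed, so every epoch sees new variants. A minimal sketch of that pattern, assuming an `augment` callable that wraps one of the tool's transforms (see demo.py for how the module is called). The class name and the `prob` parameter are hypothetical, and the class does not import PyTorch so that it runs stand-alone.

```python
import random


class OnTheFlyAugmentDataset:
    """Map-style dataset that draws a fresh random augmentation each time an item is read.

    `augment` is a placeholder for whichever transform you use; this class only
    decides *when* it is applied.
    """

    def __init__(self, samples, augment, prob=0.5):
        self.samples = samples  # list of (image, label) pairs
        self.augment = augment  # callable: image -> transformed image
        self.prob = prob        # probability of augmenting a given read

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        image, label = self.samples[index]
        # Decide per read, so the same image yields different variants across epochs.
        if random.random() < self.prob:
            image = self.augment(image)
        return image, label
```

With PyTorch, such an object can be passed to `torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True, num_workers=4)`; each worker process calls `__getitem__` independently, so augmentation runs in parallel with training. If `augment` uses its own random number generator, see the PyTorch documentation on randomness in multi-process data loading (`worker_init_fn`).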

Improvement for Recognition

We compare the accuracy of CRNN models trained only on the small training set of each benchmark, with and without data augmentation.

Method	IIIT5K	IC13	IC15
Without Data Augmentation	40.8%	6.8%	8.7%
With Data Augmentation	53.4%	9.6%	24.9%

Citation

@inproceedings{schaefer2006image,
  title={Image deformation using moving least squares},
  author={Schaefer, Scott and McPhail, Travis and Warren, Joe},
  booktitle={ACM transactions on graphics (TOG)},
  volume={25},
  number={3},
  pages={533--540},
  year={2006},
  organization={ACM}
}
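The cited paper defines moving least squares (MLS) image deformation, and imgwarp-opencv (named in the Acknowledgment) provides MLS warps. As a hedged illustration of the affine MLS variant from Schaefer et al. (2006), the NumPy sketch below deforms arbitrary 2-D points given control points `src` moved to `dst`. Each point v gets weights w_i = 1/|p_i - v|^(2α), weighted centroids p* and q*, and the 2×2 matrix M minimising Σ w_i |(p_i - p*) M - (q_i - q*)|², and is sent to (v - p*) M + q*. The function name `mls_affine` and the `alpha` default are our own choices; the tool itself may use a different MLS variant (e.g. similarity or rigid).

```python
import numpy as np


def mls_affine(points, src, dst, alpha=1.0, eps=1e-12):
    """Affine moving-least-squares deformation (Schaefer et al., 2006).

    points: (N, 2) positions to deform; src/dst: (K, 2) control points before/after.
    Needs at least 3 non-collinear control points.
    """
    points = np.asarray(points, dtype=float)
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    out = np.empty_like(points)
    for n, v in enumerate(points):
        d2 = np.sum((src - v) ** 2, axis=1)
        if d2.min() < eps:
            # On a control point the weight blows up; MLS interpolates, so use its target.
            out[n] = dst[np.argmin(d2)]
            continue
        w = 1.0 / d2 ** alpha          # w_i = 1 / |p_i - v|^(2 * alpha)
        p_star = w @ src / w.sum()     # weighted centroid of src
        q_star = w @ dst / w.sum()     # weighted centroid of dst
        p_hat, q_hat = src - p_star, dst - q_star
        wp = p_hat * w[:, None]
        # Normal equations (sum w p^T p) M = (sum w p^T q), row-vector convention.
        M = np.linalg.solve(wp.T @ p_hat, wp.T @ q_hat)
        out[n] = (v - p_star) @ M + q_star
    return out
```

To warp an image, one would typically evaluate the deformation with `src` and `dst` swapped on the output pixel grid (a backward map) and sample the source image there; evaluating on a coarse grid and interpolating is a common speed-up.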

Acknowledgment

The tool combines @cxcxcxcx's imgwarp-opencv and @Yati Sagade's opencv-ndarray-conversion. Thanks for their contributions.

Attention

The tool is free for academic research purposes only.
