pkang2017 / Scene-Text-Image-Transformer

Scene Text Image Transformer

A tool for scene text data augmentation. It is intended to reduce overfitting and improve the robustness of models.

The current version focuses on transforming the shape of cropped scene text images. A later version will support both detection and recognition tasks.

Requirements

We recommend using Anaconda to manage the versions of your dependencies. For example:

    conda install boost=1.67.0

Installation

Build library:

    mkdir build
    cd build
    cmake -D CUDA_USE_STATIC_CUDA_RUNTIME=OFF ..
    make

Copy Augment.so to the target folder and follow demo.py to use the tool.

    cp Augment.so ..
    cd ..
    python demo.py

Demo

  • Distortion

  • Stretch

  • Perspective

Speed

Transforming an image of size 64 x 200 (H x W) takes less than 3 ms on a 2.0 GHz CPU. The augmentation can also be run on the fly in parallel with multi-process batch loading, for example by setting "num_workers" in a PyTorch DataLoader.
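The on-the-fly setup described above could look roughly like the following sketch. It assumes PyTorch is installed. The names `AugmentedTextDataset`, `make_loader`, and the `transforms` callables are hypothetical and introduced here for illustration only; the callables stand in for wrappers around whatever functions demo.py calls in Augment.so.

```python
import random


class AugmentedTextDataset:
    """Map-style dataset that augments each cropped text image on access.

    A PyTorch DataLoader works with objects exposing __len__ and
    __getitem__, so this class does not need to import torch itself.
    """

    def __init__(self, samples, transforms):
        # samples: list of (image, label) pairs, e.g. HxWx3 arrays and strings.
        # transforms: list of callables image -> image. These are hypothetical
        # wrappers around the augmentation functions used in demo.py.
        self.samples = samples
        self.transforms = transforms

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        image, label = self.samples[index]
        # Pick one transformation at random so each epoch sees a new variant.
        transform = random.choice(self.transforms)
        return transform(image), label


def make_loader(dataset, batch_size=32, num_workers=4):
    # Imported here so the dataset above stays usable without PyTorch.
    from torch.utils.data import DataLoader

    # With num_workers > 0, __getitem__ (and thus the augmentation) runs in
    # worker processes, overlapping the CPU-side warping with training.
    return DataLoader(dataset, batch_size=batch_size, shuffle=True,
                      num_workers=num_workers)
```

Usage might look like `loader = make_loader(AugmentedTextDataset(samples, transforms), num_workers=4)`. If the images have different widths, the default batching cannot stack them, so a resizing step or a custom `collate_fn` would be needed.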

Improvement for Recognition

We compare the recognition accuracy of CRNN trained using only the small training set of the corresponding dataset, with and without data augmentation.

Dataset                      IIIT5K   IC13   IC15
Without Data Augmentation    40.8%    6.8%   8.7%
With Data Augmentation       53.4%    9.6%   24.9%

Citation

@article{schaefer2006image,
  title={Image deformation using moving least squares},
  author={Schaefer, Scott and McPhail, Travis and Warren, Joe},
  journal={ACM Transactions on Graphics (TOG)},
  volume={25},
  number={3},
  pages={533--540},
  year={2006},
  publisher={ACM}
}

Acknowledgment

The tool combines @cxcxcxcx's imgwarp-opencv and @Yati Sagade's opencv-ndarray-conversion. Thanks for their contributions.

Attention

The tool is free for academic research purposes only.

License: MIT License

