This software implements the Convolutional Recurrent Neural Network (CRNN), a combination of CNN, RNN and CTC loss for image-based sequence recognition tasks, such as scene text recognition and OCR. For details, please refer to our paper http://arxiv.org/abs/1507.05717.
The software has only been tested on Ubuntu 14.04 (x64). CUDA-enabled GPUs are required. To build the project, first install Torch7, TH++ and LMDB. Please follow their installation instructions. On Ubuntu, LMDB can be installed by apt-get install liblmdb-dev.
Next, go to src/ and execute sh build_cpp.sh to build the C++ code. If the build succeeds, a file named libcrnn.so is produced in the src/ directory.
A demo program can be found in src/demo.lua. Before running the demo, download a pretrained model from here. Put the downloaded model file crnn_demo_model.t7 into the directory model/crnn_demo/. Then launch the demo by:
th demo.lua
The demo reads an example image and recognizes its text content.
Expected output:
Loading model...
Model loaded from ../model/crnn_demo/model.t7
Recognized text: available (raw: a-----v--a-i-l-a-bb-l-e---)
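The raw string in the output above is the per-frame prediction, where "-" stands for the CTC blank. The following is a minimal Python sketch of how such a string maps to the final text under the standard CTC collapsing rule (merge consecutive repeats, then drop blanks). The function name ctc_collapse is hypothetical and is not part of this repository.

```python
from itertools import groupby


def ctc_collapse(raw: str, blank: str = "-") -> str:
    """Collapse a per-frame label string into its final transcription.

    Step 1: merge runs of identical consecutive symbols into one symbol.
    Step 2: remove every blank symbol.
    A blank between two identical letters keeps them as a double letter.
    """
    merged = (symbol for symbol, _run in groupby(raw))
    return "".join(symbol for symbol in merged if symbol != blank)


print(ctc_collapse("a-----v--a-i-l-a-bb-l-e---"))  # available
```

Note that "bb" collapses to a single "b" because there is no blank between the two frames, while a sequence such as "l-l" would keep both letters.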
The pretrained model can be used for lexicon-free and lexicon-based recognition tasks. Refer to the functions recognizeImageLexiconFree and recognizeImageWithLexicion in file utilities.lua for details.
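To illustrate the difference between the two modes, here is a simplified Python sketch of lexicon-based correction: it takes a lexicon-free prediction and returns the lexicon word with the smallest edit distance. This is an assumption-laden stand-in, not the procedure in utilities.lua, which may score lexicon words differently (for example, by their probability under the network output). The names edit_distance and closest_lexicon_word are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def closest_lexicon_word(prediction: str, lexicon: list) -> str:
    """Return the lexicon entry nearest to the prediction (first one on ties)."""
    return min(lexicon, key=lambda word: edit_distance(prediction, word))


print(closest_lexicon_word("availabie", ["available", "variable", "table"]))
```

With a lexicon, a single misrecognized character ("availabie") can still resolve to the intended word, which is why lexicon-based results are usually more forgiving than lexicon-free ones.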
Follow these steps to train a new model on your own dataset.
- Create a new LMDB dataset. A Python program is provided in tool/create_dataset.py. Refer to the function createDataset for details.
- Create a model directory under model/, for example model/foo_model. Then create a configuration file config.lua under the model directory. You can copy model/crnn_demo/config.lua and modify it.
- Go to src/ and execute th main_train.lua ../model/foo_model/. Model snapshots and the log file will be saved into the model directory.
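As a rough illustration of the dataset step, the sketch below packs (image bytes, label) pairs into key-value records and writes them to LMDB with the third-party lmdb Python package. The key naming scheme (image-NNNNNNNNN, label-NNNNNNNNN, num-samples) and the helper names build_cache and write_lmdb are assumptions made for this example; check createDataset in tool/create_dataset.py for the layout the training code actually expects.

```python
def build_cache(samples):
    """Turn (image_bytes, label) pairs into LMDB-ready key/value records.

    Samples with empty image data are skipped. Keys are 1-indexed.
    The key layout used here is an assumption for illustration only.
    """
    cache = {}
    count = 0
    for image_bytes, label in samples:
        if not image_bytes:
            continue
        count += 1
        cache[b"image-%09d" % count] = image_bytes
        cache[b"label-%09d" % count] = label.encode("utf-8")
    cache[b"num-samples"] = str(count).encode("ascii")
    return cache


def write_lmdb(output_path, cache, map_size=1 << 30):
    """Write all records into an LMDB environment at output_path."""
    import lmdb  # third-party package: pip install lmdb

    env = lmdb.open(output_path, map_size=map_size)
    with env.begin(write=True) as txn:
        for key, value in cache.items():
            txn.put(key, value)
    env.close()


records = build_cache([(b"\x89PNG...", "hello"), (b"", "skipped"), (b"abc", "world")])
print(sorted(records))
```

In practice the image bytes would come from reading each image file in binary mode, and map_size would need to be large enough to hold the whole dataset.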
Please cite the following paper if you are using the code/model in your research paper.
@article{ShiBY15,
author = {Baoguang Shi and
Xiang Bai and
Cong Yao},
title = {An End-to-End Trainable Neural Network for Image-based Sequence Recognition
and Its Application to Scene Text Recognition},
journal = {CoRR},
volume = {abs/1507.05717},
year = {2015}
}
The authors would like to thank the developers of Torch7, TH++, lmdb-lua-ffi and char-rnn.
Please let me know if you encounter any issues.