TextDetectionWithScriptID

Fully Convolutional Neural Network for Text Detection in Scene/Document Images with Script ID Support.

1. Overview

This repo contains two types of deep neural networks:

pixel-level Text Detection Classification Network (TDCN) with specializations in
- Word-Level SceneText, e.g. a street-view image.
- Line-Level DocumentText, e.g. a scanned letter.
It classifies each pixel in an input image into one of the following three categories:

class index	class name	color channel	Description
0	NonText	Red	Any non-text content
1	Border	Green	Pixels on text borders
2	Text	Blue	Text Pixels
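
To make the three-class output concrete, the sketch below turns a per-pixel probability map into class names by picking the most probable class at each pixel. It is only an illustration in plain Python: it assumes the map is given as nested lists of shape height-by-width-by-3 in the class order of the table above, and the names CLASS_NAMES and label_pixels are hypothetical rather than part of this repo.

```python
# Class order taken from the table above; the helper names are hypothetical.
CLASS_NAMES = ["NonText", "Border", "Text"]


def label_pixels(proba_map):
    """Map a height-by-width-by-3 grid of class probabilities
    to a height-by-width grid of class names."""
    labels = []
    for row in proba_map:
        label_row = []
        for pixel in row:
            # Index of the largest of the three class probabilities.
            best = max(range(len(CLASS_NAMES)), key=lambda k: pixel[k])
            label_row.append(CLASS_NAMES[best])
        labels.append(label_row)
    return labels


# A toy 1-by-3 "image": one background, one border and one text pixel.
toy_map = [[[0.8, 0.1, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7]]]
toy_labels = label_pixels(toy_map)
```

With a real model output, the same per-pixel argmax would normally be done on an array rather than nested lists; the list version is used here only to keep the sketch dependency-free.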

page-level Script ID Classification Network (SICN)

It classifies an input image into one of the following categories:

scriptID index	scriptID name	Country
0	NonText	N/A
1	Latin	US, UK, etc.
2	Hebrew	Israel
3	Cyrillic	Russia, Ukraine, etc.
4	Arabic	Iran, Saudi Arabia, etc.
5	Chinese	China, HongKong, etc.
6	TextButUnknown	N/A

The TDCN models are mainly based on the ICCV17 paper Self-organized Text Detection with Minimal Post-processing via Border Learning.

All models are trained with the Keras deep neural network library using the TensorFlow backend.
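
As a small illustration of the script ID table above, the following sketch maps a 7-element probability vector to the most probable script name. It assumes the vector follows the scriptID index order of that table; SCRIPT_NAMES and top_script are hypothetical helper names, not functions provided by this repo.

```python
# Index order taken from the script ID table; helper names are hypothetical.
SCRIPT_NAMES = ["NonText", "Latin", "Hebrew", "Cyrillic",
                "Arabic", "Chinese", "TextButUnknown"]


def top_script(script_proba):
    """Return (name, probability) of the most probable script class."""
    if len(script_proba) != len(SCRIPT_NAMES):
        raise ValueError("expected %d probabilities" % len(SCRIPT_NAMES))
    best = max(range(len(SCRIPT_NAMES)), key=lambda k: script_proba[k])
    return SCRIPT_NAMES[best], script_proba[best]


example = top_script([0.01, 0.05, 0.02, 0.85, 0.03, 0.02, 0.02])
```

Here example is ("Cyrillic", 0.85), since index 3 holds the largest value.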

2. Structure

This repo contains the following directories:

data: sample image data for testing.
lib : core model definitions and util functions.
model : pretrained model weights.
- textDetSceneModel.h5: pretrained TDCN weight for scene text detection
- textDetDocumModel.h5: pretrained TDCN weight for document text detection
- scriptIDModel.h5: pretrained SICN weight
notebook : Python2 demo notebook.
bin : Python2/3 command-line tool.

3. Dependency

The Python code in this repo is compatible with both Python 2.7.x and Python 3. It depends on the following core deep learning libraries

Keras: >=2.0.7
TensorFlow: >=1.1.0

and the following image processing libraries.

OpenCV-Python: >=3.1.0
Skimage: >= 0.13.0
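
One way to check an environment against these minimums is sketched below. It compares dotted version strings numerically (so that 0.9 is correctly treated as older than 0.13). The minimum versions are copied from the lists above; the function names and the idea of passing installed versions in as a dict are assumptions made for illustration.

```python
import re

# Minimum versions copied from the dependency lists above.
MINIMUM_VERSIONS = {
    "Keras": "2.0.7",
    "TensorFlow": "1.1.0",
    "OpenCV-Python": "3.1.0",
    "Skimage": "0.13.0",
}


def parse_version(text):
    """Turn '2.0.7' into (2, 0, 7); non-numeric suffixes are ignored."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    have, need = parse_version(installed), parse_version(minimum)
    width = max(len(have), len(need))
    # Pad with zeros so that '1.1' and '1.1.0' compare as equal.
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def unmet_requirements(installed_versions):
    """Return the sorted names that are missing or older than required."""
    unmet = []
    for name, minimum in MINIMUM_VERSIONS.items():
        installed = installed_versions.get(name)
        if installed is None or not meets_minimum(installed, minimum):
            unmet.append(name)
    return sorted(unmet)
```

In practice the installed versions could be read from each package's `__version__` attribute (for example `keras.__version__` or `cv2.__version__`) and passed in under the names used above.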

4. Usage

4.1 Basic Models

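# Assumption: the repo's lib directory is on sys.path, so that textDetCore is importable.
import textDetCore
# 1. TDCN: textDet_weight is the path to a pretrained TDCN weight file,
#    e.g. model/textDetSceneModel.h5 or model/textDetDocumModel.h5
textDet_model = textDetCore.create_textDet_model()
textDet_model.load_weights( textDet_weight )
# 2. SICN: scriptID_weight is the path to model/scriptIDModel.h5
scriptID_model = textDetCore.create_scriptID_model()
scriptID_model.load_weights( scriptID_weight )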

4.2 Simple Decoder

The simple decoder is written for text detection in a document image, or in any text input image whose text regions can be expressed as a set of rectangular bounding boxes.

def simple_decoder( file_path,
                    textDet_model,
                    scriptID_model=None,
                    output_dir=None,
                    dom_font=None,
                    dark_text=None,
                    rotated_text=False,
                    proba_threshold=.5,
                    lh_threshold=15,
                    contrast_threshold=24.,
                    return_proba=True,
                    n_jobs=1,
                    verbose=2) :
    """
    INPUTS:
        ------------------------------------------------------------------------
        | Mandatory Parameters
        ------------------------------------------------------------------------
        file_path = str, path to a local file or URL to a web image
        textDet_model = keras model, pretrained text detection model
        scriptID_model = None or keras model, pretrained script ID model
                         if None, then no scriptID classification
        output_dir = None or str or 'SKIP', dir to save detected and corrected text regions,
                     if None, then text regions as JPEG buffers
                     if SKIP, then not save text regions
        ------------------------------------------------------------------------
        | Prior Knowledge Parameters (better performance if they are provided)
        ------------------------------------------------------------------------
        dom_font = int or None, dominant fontsize height in terms of pixels
                   if None, then apply automatic estimation
        dark_text = bool or None, whether texts on image is dark/black or not
                    if None, then apply automatic estimation
        rotated_text = bool, whether text regions are rotated or not
                       if False, then faster decoding is applied
        ------------------------------------------------------------------------
        | Simple Rules to Reject A Text Region
        ------------------------------------------------------------------------
        proba_threshold = float in (0,1), the minimum text probability to accept a text region
        lh_threshold = float, the minimum line height to accept a text region
        contrast_threshold = float in (0,255), the minimum intensity standard deviation to accept a text region
        ------------------------------------------------------------------------
        | Others
        ------------------------------------------------------------------------
        return_proba = bool, if true, return the raw outputs of both models
        n_jobs = int, if greater than 1, use multiple CPUs
        verbose = bool, if true, print out state messages

    OUTPUTS:
            output_lut = dict, containing all decoded results including
                         'filename' -> input image file
                         'resize'   -> resize factor for text detection analysis
                         'md5'      -> image md5 tag
                         'Pr(XXX)'  -> script ID probability of a known scriptID class XXX
                         'bboxes'    -> list of bounding box dictionaries, where each element is a dict of
                               'cntx'     -> bbox's x coordinates
                               'cnty'     -> bbox's y coordinates
                               'proba'    -> text probability of this region
                               'area'     -> bbox area
                               'contrast' -> bbox contrast
                               'imgfile'  -> file path of the dumped text region image, when output_dir is given
                               'jpgbuf'   -> jpeg buffer for the text region image, when output_dir is None
            proba_map = ( text_proba, script_proba )
                        - text_proba, i.e. a text probability map, size of imgHeight-by-imgWidth-by-3
                        - script proba, i.e. a script ID probability map, size of 1-by-7                                   
    """

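The three rejection rules documented for simple_decoder can be pictured with the following sketch, which filters a list of bounding box dictionaries shaped like the 'bboxes' entries described in the docstring. It is not the repo's implementation. It assumes that 'cnty' holds the corner y coordinates of an unrotated box, that line height can be approximated by their vertical extent, and that thresholds are inclusive minimums; line_height, keep_region and filter_regions are hypothetical names.

```python
def line_height(bbox):
    """Approximate line height as the vertical extent of the box corners.
    Assumption: 'cnty' is a sequence of corner y coordinates and the box
    is not rotated."""
    return max(bbox["cnty"]) - min(bbox["cnty"])


def keep_region(bbox, proba_threshold=0.5, lh_threshold=15,
                contrast_threshold=24.0):
    """Apply the three rejection rules; the defaults mirror the
    simple_decoder signature."""
    return (bbox["proba"] >= proba_threshold
            and line_height(bbox) >= lh_threshold
            and bbox["contrast"] >= contrast_threshold)


def filter_regions(bboxes, **thresholds):
    """Keep only the boxes that pass every rule."""
    return [bbox for bbox in bboxes if keep_region(bbox, **thresholds)]


# Made-up candidate regions for illustration only.
candidates = [
    {"cntx": [10, 200, 200, 10], "cnty": [20, 20, 50, 50],
     "proba": 0.92, "contrast": 60.0},   # passes all three rules
    {"cntx": [10, 200, 200, 10], "cnty": [100, 100, 108, 108],
     "proba": 0.95, "contrast": 70.0},   # only 8 pixels tall
    {"cntx": [10, 200, 200, 10], "cnty": [200, 200, 240, 240],
     "proba": 0.30, "contrast": 55.0},   # low text probability
    {"cntx": [10, 200, 200, 10], "cnty": [300, 300, 330, 330],
     "proba": 0.80, "contrast": 5.0},    # low contrast
]
accepted = filter_regions(candidates)
```

Passing looser thresholds as keyword arguments, such as the lazy_decoder defaults (proba_threshold=.33, lh_threshold=8, contrast_threshold=32.), lets the 8-pixel-tall region through as well.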
4.3 Lazy (but slow) Decoder

This is a more complicated decoder for scene text detection. The extra complexity arises because the dominant fontsize assumption may no longer hold for a scene text image, and the actual font height cannot be guessed from the input image size. The decoder therefore analyzes a given scene text image at a number of resolution scales to capture text regions of very different font sizes.

def lazy_decoder( file_path,
                  textDet_model,
                  scriptID_model=None,
                  num_resolutions=5,
                  output_dir=None,
                  proba_threshold=.33,
                  lh_threshold=8,
                  contrast_threshold=32.,
                  return_proba=True,
                  n_jobs=1,
                  verbose=2) :
    """
    INPUTS:
        ------------------------------------------------------------------------
        | Mandatory Parameters
        ------------------------------------------------------------------------
        file_path = str, path to a local file or URL to a web image
        textDet_model = keras model, pretrained text detection model
        scriptID_model = None or keras model, pretrained script ID model
                         if None, then no scriptID classification
        output_dir = None or str or 'SKIP', dir to save detected and corrected text regions,
                     if None, then text regions as JPEG buffers
                     if SKIP, then not save text regions
        ------------------------------------------------------------------------
        | Simple Rules to Reject A Text Region
        ------------------------------------------------------------------------
        proba_threshold = float in (0,1), the minimum text probability to accept a text region
        lh_threshold = float, the minimum line height to accept a text region
        contrast_threshold = float in (0,255), the minimum intensity standard deviation to accept a text region
        ------------------------------------------------------------------------
        | Others
        ------------------------------------------------------------------------
        return_proba = bool, if true, return the raw outputs of both models
        n_jobs = int, if greater than 1, use multiple CPUs
        num_resolutions = int, number of resolution scales to analyze, default 5
        verbose = bool, if true, print out state messages

    OUTPUTS:
        output_lut = dict, containing all decoded results including
                     'filename' -> input image file
                     'resize'   -> resize factor for text detection analysis
                     'md5'      -> image md5 tag
                     'Pr(XXX)'  -> script ID probability of a known scriptID class XXX
                     'bboxes'    -> list of bounding box dictionaries, where each element is a dict of
                           'cntx'     -> bbox's x coordinates
                           'cnty'     -> bbox's y coordinates
                           'proba'    -> text probability of this region
                           'area'     -> bbox area
                           'contrast' -> bbox contrast
                           'imgfile'  -> file path of the dumped text region image, when output_dir is given
                           'jpgbuf'   -> jpeg buffer for the text region image, when output_dir is None
        proba_map = ( text_proba, script_proba )
                    - text_proba, i.e. a text probability map, size of imgHeight-by-imgWidth-by-3
                    - script proba, i.e. a script ID probability map, size of 1-by-7
    """

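The idea of analyzing one image at several resolution scales can be sketched as follows. The actual scales used by lazy_decoder are not documented here, so this example simply spaces num_resolutions resize factors geometrically between two bounds. Both bounds (min_scale, max_scale) and the function names are assumptions chosen for illustration, not values taken from this repo.

```python
def resolution_scales(num_resolutions=5, min_scale=0.25, max_scale=2.0):
    """Return num_resolutions resize factors spaced geometrically between
    min_scale and max_scale (hypothetical bounds; the repo may differ)."""
    if num_resolutions < 1:
        raise ValueError("num_resolutions must be at least 1")
    if num_resolutions == 1:
        return [1.0]
    # Constant ratio between consecutive scales.
    ratio = (max_scale / float(min_scale)) ** (1.0 / (num_resolutions - 1))
    return [min_scale * ratio ** k for k in range(num_resolutions)]


def scaled_sizes(height, width, scales):
    """Image sizes (height, width) to analyze for each resize factor."""
    return [(max(1, int(round(height * s))), max(1, int(round(width * s))))
            for s in scales]
```

Small scales shrink large text toward the font size the detector handles well, while large scales enlarge small text; detections from each scale would then be mapped back to the original image using the corresponding resize factor.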
4.4 Command-Line Tool

Finally, we also provide a command-line tool for text detection.

python bin/textDetection.py -h
usage: textDetection.py [-h] [-i INPUT_FILES] [-o OUTPUT_DIR]
                        [-t {full,textDet,postProc}]
                        [-m {doc0,doc1,scene,custom}] [-v VERBOSE]
                        [-tl TH_LINEHEIGHT] [-tp TH_TEXTPROB]
                        [-tc TH_CONTRAST] [-mt {line,word}]
                        [-dt {simple,lazy}] [-nj N_JOBS] [-nr N_RES]
                        [--domFont DOM_FONT] [--darkText] [--rotText]
                        [--jpegBuffer] [--version]

Text Detection with ScriptID supports

optional arguments:
  -h, --help            show this help message and exit

required arguments:
  -i INPUT_FILES, --inputFile INPUT_FILES
                        input test image files or cached json files
  -o OUTPUT_DIR, --outputDir OUTPUT_DIR
                        output detection dir (./)
  -t {full,textDet,postProc}, --task {full,textDet,postProc}
                        tasks in {full, textDet, postProc}

optional arguments:
  -m {doc0,doc1,scene,custom}, --mode {doc0,doc1,scene,custom}
                        working mode in {doc0(black and horizontal text),
                        doc1(black text), scene, custom}
  -v VERBOSE            verbosity level (0), higher means more print outs
  -tl TH_LINEHEIGHT, --threshLineHeight TH_LINEHEIGHT
                        minimal text region height (15)
  -tp TH_TEXTPROB, --threshTextProba TH_TEXTPROB
                        minimal text region probability (.5)
  -tc TH_CONTRAST, --threshContrast TH_CONTRAST
                        minimal text region contrast (16)
  -mt {line,word}, --modelType {line,word}
                        detector type in {line, word}
  -dt {simple,lazy}, --decoderType {simple,lazy}
                        decoder type in {simple, lazy}
  -nj N_JOBS, --nJobs N_JOBS
                        number of parallel cpu jobs
  -nr N_RES, --nRes N_RES
                        number of analysis resultions
  --domFont DOM_FONT    dominant text height in pixels
  --darkText            whether or not texts are darker
  --rotText             whether or not texts are rotated
  --jpegBuffer          whether or not save image inside json
  --version             show program's version number and exit

Below is sample testing code that you may want to try

# make a tmp dir to host data
mkdir tmp && cd tmp
# find some testing data
find ../data/ -name "*_*.jpg" > test.list
# run text detection and save text images using the doc0 mode
python ../bin/textDetection.py  -i test.list -o /tmp/result_detOnly -t full -m doc0 -v 2
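
Where the find command is not available, the input list can also be built in Python. The sketch below mirrors the find step above by walking a directory tree and writing one matching file path per line. The helper names find_images and write_list are hypothetical and not part of the command-line tool.

```python
import fnmatch
import os


def find_images(root, pattern="*_*.jpg"):
    """Recursively collect files under root whose name matches pattern,
    similar to `find root -name "*_*.jpg"` restricted to files."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if fnmatch.fnmatch(name, pattern):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def write_list(paths, list_file):
    """Write one path per line, the layout produced by `find ... > test.list`."""
    with open(list_file, "w") as handle:
        for path in paths:
            handle.write(path + "\n")
```

For example, write_list(find_images("../data"), "test.list") would produce a list file that can be passed to the tool with -i.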

4.5 Demo Code

You can find an IPython 2 notebook under the notebook directory. Alternatively, you are welcome to use the provided Google Colab notebooks (open and clone your own).

Python2 Notebook
- google colab
- github
Python3 Notebook
- google colab
- github

5. Citation and Contact

If you use this repo for academic purposes, please cite the following paper.

@INPROCEEDINGS{WuICCV2017, 
author={Y. Wu and P. Natarajan}, 
booktitle={2017 IEEE International Conference on Computer Vision (ICCV)}, 
title={Self-Organized Text Detection with Minimal Post-processing via Border Learning}, 
year={2017}, 
pages={5010-5019}, doi={10.1109/ICCV.2017.535}, 
month={Oct},}

For questions, please contact Dr. Yue Wu (yue_wu@isi.edu).

6. License

The Software is made available for academic or non-commercial purposes only. The license is for a copy of the program for an unlimited term. Individuals requesting a license for commercial use must pay for a commercial license.

  USC Stevens Institute for Innovation 
  University of Southern California 
  1150 S. Olive Street, Suite 2300 
  Los Angeles, CA 90115, USA 
  ATTN: Accounting

DISCLAIMER. USC MAKES NO EXPRESS OR IMPLIED WARRANTIES, EITHER IN FACT OR BY OPERATION OF LAW, BY STATUTE OR OTHERWISE, AND USC SPECIFICALLY AND EXPRESSLY DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, VALIDITY OF THE SOFTWARE OR ANY OTHER INTELLECTUAL PROPERTY RIGHTS OR NON-INFRINGEMENT OF THE INTELLECTUAL PROPERTY OR OTHER RIGHTS OF ANY THIRD PARTY. SOFTWARE IS MADE AVAILABLE AS-IS. LIMITATION OF LIABILITY. TO THE MAXIMUM EXTENT PERMITTED BY LAW, IN NO EVENT WILL USC BE LIABLE TO ANY USER OF THIS CODE FOR ANY INCIDENTAL, CONSEQUENTIAL, EXEMPLARY OR PUNITIVE DAMAGES OF ANY KIND, LOST GOODWILL, LOST PROFITS, LOST BUSINESS AND/OR ANY INDIRECT ECONOMIC DAMAGES WHATSOEVER, REGARDLESS OF WHETHER SUCH DAMAGES ARISE FROM CLAIMS BASED UPON CONTRACT, NEGLIGENCE, TORT (INCLUDING STRICT LIABILITY OR OTHER LEGAL THEORY), A BREACH OF ANY WARRANTY OR TERM OF THIS AGREEMENT, AND REGARDLESS OF WHETHER USC WAS ADVISED OR HAD REASON TO KNOW OF THE POSSIBILITY OF INCURRING SUCH DAMAGES IN ADVANCE.

For commercial license pricing and annual commercial update and support pricing, please contact:

  Rakesh Pandit USC Stevens Institute for Innovation 
  University of Southern California 
  1150 S. Olive Street, Suite 2300
  Los Angeles, CA 90115, USA 

  Tel: +1 213-821-3552
  Fax: +1 213-821-5001 
  Email: rakeshvp@usc.edu and ccto: accounting@stevens.usc.edu
