AigizK / OCR-model

An easy-to-run OCR model pipeline based on CRNN and CTC loss

OCR model

This is a model for Optical Character Recognition based on a CRNN architecture and CTC loss.

OCR-model is a part of the ReadingPipeline repo.

Demo

The demo contains an example of using OCR-model (you can run it in Google Colab).

Quick setup and start

A Dockerfile is provided to build an image with CUDA and cuDNN support.

Preparations

  • Clone the repo.
  • Download and extract the dataset to the data/ folder.
  • Run sudo make all to build a Docker image and create a container, or sudo make all GPUS=device=0 CPUS=10 if you want to specify GPU devices and limit CPU resources.

If you don't want to use Docker, you can install the dependencies via requirements.txt.

Configuring the model

You can edit ocr_config.json to set the necessary training and evaluation parameters: alphabet, image size, saving path, etc.

"train": {
    "datasets": [
        {
            "csv_path": "/workdir/data/dataset_1/train.csv",
            "prob": 0.5
        },
        {
            "csv_path": "/workdir/data/dataset_2/train.csv",
            "prob": 0.7
        },
        ...
    ],
    "epoch_size": 10000,
    "batch_size": 512
}
  • epoch_size - the size of an epoch. If you set it to null, the epoch size will be equal to the number of samples across all datasets.
  • You can specify several datasets for the training/validation/test stages and set a probability for each dataset separately (the sum of the prob values may be greater than 1, since they are normalized during processing).
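The normalization of per-dataset probabilities can be sketched as follows. Note that the function names here are illustrative, not part of the repo's API:

```python
import random

# Hypothetical sketch of how per-dataset "prob" values could be used to
# pick a source dataset for each training sample. The values need not
# sum to 1: they are normalized before sampling, as the config allows.
def normalize_probs(probs):
    total = sum(probs)
    return [p / total for p in probs]

# prob values from dataset_1 and dataset_2 in the config above
probs = normalize_probs([0.5, 0.7])

# random.choices accepts unnormalized weights, so the explicit
# normalization is optional here; it just mirrors the config semantics.
dataset_idx = random.choices(range(len(probs)), weights=probs, k=1)[0]
```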

Prepare data

Datasets must be pre-processed into a single format: each dataset must contain a folder with images (crops containing text) and a CSV file with annotations. The CSV file should contain two columns: "filename" with the relative path to the image (folder-name/image-name.png) and "text" with the image transcription.

filename           text
images/4099-0.png  is
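A minimal way to produce a CSV in this format with the standard library (the rows here are made-up examples):

```python
import csv

# Example annotation rows: relative image path plus transcription.
rows = [
    ("images/4099-0.png", "is"),
    ("images/4100-0.png", "example"),
]

# Write the two-column CSV expected by the pipeline.
with open("train.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["filename", "text"])
    writer.writerows(rows)
```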

If you use polygon annotations in COCO format, you can prepare a training dataset using this script:

python scripts/prepare_dataset.py \
    --annotation_json_path path/to/the/annotations.json \
    --annotation_image_root dir/to/images/from/annotation/file \
    --class_names pupil_text pupil_comment teacher_comment \
    --bbox_scale_x 1 \
    --bbox_scale_y 1 \
    --save_dir dir/to/save/dataset \
    --output_csv_name data.csv
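The --bbox_scale_x and --bbox_scale_y arguments scale the crop boxes before cutting. A plausible interpretation is expanding each box around its center, sketched below; the script's exact behavior may differ:

```python
# Hypothetical illustration of bbox scaling (assumption: the box is
# expanded around its center by the given factors; this is not taken
# from the script's source).
def scale_bbox(x, y, w, h, sx, sy):
    cx, cy = x + w / 2, y + h / 2   # box center
    nw, nh = w * sx, h * sy         # scaled width and height
    return cx - nw / 2, cy - nh / 2, nw, nh

# Doubling the width of a (10, 10, 20, 10) box keeps its center fixed:
# scale_bbox(10, 10, 20, 10, 2, 1) -> (0.0, 10.0, 40.0, 10.0)
```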

Training

To train the model:

python scripts/train.py --config_path path/to/the/ocr_config.json

Evaluating

To test the model:

python scripts/evaluate.py \
--config_path path/to/the/ocr_config.json \
--model_path path/to/the/model-weights.ckpt

If you want to use a beam search decoder with a language model, pass the --lm_path argument with the path to a kenLM .arpa file: --lm_path path/to/the/language-model.arpa
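For intuition, a minimal greedy CTC decoder is sketched below (this is not the repo's beam search implementation): collapse repeated symbols along the argmax path, then drop the blank. A beam search decoder with a kenLM model instead keeps several candidate sequences and rescores them with the language model:

```python
# Greedy CTC decoding sketch: merge consecutive repeats, remove blanks.
def ctc_greedy_decode(indices, blank=0):
    out = []
    prev = None
    for i in indices:
        # Keep a symbol only if it differs from the previous frame
        # and is not the CTC blank.
        if i != prev and i != blank:
            out.append(i)
        prev = i
    return out

# Argmax path [1, 1, 0, 2, 2, 0, 2] decodes to [1, 2, 2]:
# the blank between the two 2-runs keeps them as separate symbols.
```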

ONNX

You can convert the Torch model to ONNX to speed up inference on CPU.

python scripts/torch2onnx.py \
--config_path path/to/the/ocr_config.json \
--model_path path/to/the/model-weights.ckpt

License: MIT

