PyTorch implementation of CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition

The unofficial code of CDistNet.

Now, we have implemented all the modules according to the papaer except for TPS in the visual branch.You can refer ASTER for the implementation of TPS.

Requirements

Python3.6.8
lmdb==0.98
torch==1.5.1
torchvision==0.6.1
Pillow==6.1.0
opencv-python==4.2.0.32
numpy==1.17.1

Data preparation

We offer you a tool to transform raw dataset to LMDB dataset. Details please refer to tools/create_lmdb_dataset.py

You can also download lmdb dataset from OCR_Dataset

Train

First you need to modify some arguments in configs/cdistnet.yml.

TrainReader set the path of train lmdb dataset.
EvalReader set the path of evaluation lmdb dataset.
Global set the args like image_shape, dict_file, etc.
VisualModule set the args of visual branch in the original paper.
PositionalEmbedding set the args of positional branch.
SemanticEmbedding set the args of semantic branch.
MDCDP set the args of MDCDP.

python train.py -c configs/cdistnet.yml

Demo

Modify these arguments below in configs/cdistnet.yml.

pretrain_weights set the path of model file path.
infer_img set the image path.
`is_train set to False.

python predict.py -c configs/cdistnet.yml

TODO

Pretrained models
Test code
Comparison with original paper on benchmarks(CUTE, IC13, IC15, IIIT5K, SVT, SVTP)

About

The pytorch implementation of CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition

Languages

Language:Python 100.0%