end-to-end text-spotting dense-text-image filmed-image

WordLenSpotter

This is the official implementation of the paper "Word Length-aware Text Spotting: Enhancing Dense Text Detection and Recognition for Camera-captured Document Image".

Preparation

Download images

The DSTD1500 dense text spotting dataset, captured in real reading scenarios, can be downloaded here.
Sample dataset images

You can also prepare a custom dataset by following the example scripts.
To evaluate on DSTD1500, first download the zipped annotations.

Models

WordLenSpotter-MIXTRAIN | [config] | model (Google Drive)

Installation

Python=3.8
PyTorch=1.8.0, torchvision=0.9.0, cudatoolkit=11.1
OpenCV for visualization

Steps

Install the repository (we recommend using Anaconda):

conda create -n WordLenSpotter python=3.8 -y
conda activate WordLenSpotter
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install opencv-python
pip install scipy
pip install shapely
pip install rapidfuzz
pip install timm
pip install Polygon3
git clone https://github.com/unxiaohao/WordLenSpotter.git
cd WordLenSpotter
python setup.py build develop
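
After the steps above, a quick check of the installed packages can catch a mismatched environment before training. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `check_environment` are hypothetical, and it assumes each package is visible to `importlib.metadata` under the distribution name listed (for example, that the conda `pytorch` package registers as `torch`).

```python
from importlib import metadata

# Versions pinned in the installation steps above.
PINNED = {"torch": "1.8.0", "torchvision": "0.9.0"}
# Packages the steps install without pinning a version.
UNPINNED = ["opencv-python", "scipy", "shapely", "rapidfuzz", "timm", "Polygon3"]


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(pinned=PINNED, unpinned=UNPINNED):
    """Collect readable problems; an empty list means nothing was flagged."""
    problems = []
    for name, wanted in pinned.items():
        found = installed_version(name)
        if found is None:
            problems.append(f"{name} is not installed (expected {wanted})")
        # Builds may carry a local suffix such as "1.8.0+cu111", so compare the prefix.
        elif not found.startswith(wanted):
            problems.append(f"{name} is {found}, expected {wanted}")
    for name in unpinned:
        if installed_version(name) is None:
            problems.append(f"{name} is not installed")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment matches the pinned versions"]:
        print(line)
```

Run it inside the activated WordLenSpotter environment; any line it prints names a package to reinstall.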

Dataset path

datasets
|_ dstd1500
|  |_ train_images
|  |_ test_images
|  |_ dstd1500_test.json
|  |_ dstd1500_train.json
|  |_ weak_voc_new.txt
|  |_ weak_voc_pair_list.txt
|_ evaluation
|  |_ test_gt.zip
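
The layout above can be checked with a few lines of Python before launching a run. This is a minimal sketch, assuming the `datasets` folder sits in the current working directory; the function name `missing_entries` is hypothetical, and only existence is checked, not file contents.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
EXPECTED = [
    "dstd1500/train_images",
    "dstd1500/test_images",
    "dstd1500/dstd1500_test.json",
    "dstd1500/dstd1500_train.json",
    "dstd1500/weak_voc_new.txt",
    "dstd1500/weak_voc_pair_list.txt",
    "evaluation/test_gt.zip",
]


def missing_entries(root="datasets"):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Missing under datasets/:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Dataset layout is complete.")
```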

Usage

Training

Pretrain WordLenSpotter

python projects/WordLenSpotter/train_net.py \
  --num-gpus 8 \
  --config-file projects/WordLenSpotter/configs/WordLenSpotter-pretrain.yaml

Joint training model on the mixed real dataset

python projects/WordLenSpotter/train_net.py \
  --num-gpus 8 \
  --config-file projects/WordLenSpotter/configs/WordLenSpotter-mixtrain.yaml

Fine-tune

Fine-tune model

python projects/WordLenSpotter/train_net.py \
  --num-gpus 8 \
  --config-file projects/WordLenSpotter/configs/WordLenSpotter-finetune-dstd1500.yaml
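
The three training stages can also be chained from one script. The sketch below is an illustration only: `train_command` and `run_all` are hypothetical helpers, the fine-tune config name is the one used in the Visualize command, and the script does not set any weights paths, so how each stage picks up the previous stage's checkpoint is left to the configs.

```python
import subprocess
import sys

TRAIN_SCRIPT = "projects/WordLenSpotter/train_net.py"
CONFIG_DIR = "projects/WordLenSpotter/configs"
# Stage order as listed above: pretrain, joint training on mixed data, fine-tune.
STAGES = [
    "WordLenSpotter-pretrain.yaml",
    "WordLenSpotter-mixtrain.yaml",
    # File name as it appears in the Visualize command (assumed correct).
    "WordLenSpotter-finetune-dstd1500.yaml",
]


def train_command(config_name, num_gpus=8):
    """Build the argument list for one training stage."""
    return [
        sys.executable, TRAIN_SCRIPT,
        "--num-gpus", str(num_gpus),
        "--config-file", f"{CONFIG_DIR}/{config_name}",
    ]


def run_all(num_gpus=8, dry_run=True):
    """Print each stage's command; execute it only when dry_run is False."""
    for config_name in STAGES:
        cmd = train_command(config_name, num_gpus)
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence if a stage exits with an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

With `dry_run=True` the script only prints the commands, which is a cheap way to confirm the config paths before committing GPU time.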

Visualize

Visualize the detection and recognition results

python demo/demo.py \
  --config-file projects/WordLenSpotter/configs/WordLenSpotter-finetune-dstd1500.yaml \
  --input input1.jpg \
  --output ./output \
  --confidence-threshold 0.4 \
  --opts MODEL.WEIGHTS ./output/FINETUNE20K/model_final.pth
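
To visualize a whole folder of images, the command above can be repeated once per file. This is a minimal sketch that reuses only the flags shown in that command; `demo_command` and `run_demo_on_folder` are hypothetical helpers, and the `*.jpg` pattern is an assumption about how the input images are named.

```python
import subprocess
import sys
from pathlib import Path

CONFIG = "projects/WordLenSpotter/configs/WordLenSpotter-finetune-dstd1500.yaml"


def demo_command(image, weights, output_dir="./output", threshold=0.4, config=CONFIG):
    """Build the demo argument list for a single image."""
    return [
        sys.executable, "demo/demo.py",
        "--config-file", config,
        "--input", str(image),
        "--output", output_dir,
        "--confidence-threshold", str(threshold),
        "--opts", "MODEL.WEIGHTS", str(weights),
    ]


def run_demo_on_folder(folder, weights, pattern="*.jpg", dry_run=True):
    """Build one command per matching image, in sorted order, and optionally run them."""
    commands = [demo_command(p, weights) for p in sorted(Path(folder).glob(pattern))]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_demo_on_folder("images", "./output/FINETUNE20K/model_final.pth")
```

One process per image is slower than a single call, but it relies on nothing beyond the single-image invocation documented here.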

The visualization results are shown in the figure:

Acknowledgement

This project is based on AdelaiDet, Detectron2, and SwinTextSpotter.

Citation

If our paper helps your research, please cite it in your publications:

@article{wang2023word,
  title={Word length-aware text spotting: Enhancing detection and recognition in dense text image},
  author={Wang, Hao and Zhou, Huabing and Zhang, Yanduo and Lu, Tao and Ma, Jiayi},
  journal={arXiv preprint arXiv:2312.15690},
  year={2023}
}

About

This is the official implementation of the paper "Word length-aware text spotting: Enhancing detection and recognition in dense text image".

https://arxiv.org/abs/2312.15690

Apache License 2.0