unxiaohao / WordLenSpotter

This is the official implementation of the paper "Word length-aware text spotting: Enhancing detection and recognition in dense text image".

Home Page: https://arxiv.org/abs/2312.15690


WordLenSpotter

This is the official implementation of the paper "Word Length-aware Text Spotting: Enhancing Dense Text Detection and Recognition for Camera-captured Document Image".

Preparation

  1. Download images
  • The dense text spotting dataset (DSTD1500) in real reading scenarios can be downloaded here.
  • Sample dataset images
  2. You can also prepare your custom dataset following the example scripts. [example scripts]

  3. To evaluate DSTD1500, first download the zipped annotations.

Models

WordLenSpotter-MIXTRAIN [config] | model_Google Drive

Installation

  • Python=3.8
  • PyTorch=1.8.0, torchvision=0.9.0, cudatoolkit=11.1
  • OpenCV for visualization
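
Once the installation steps in this README are complete, the requirements listed above can be checked with a short script. This is only a minimal sketch: the distribution names `torch`, `torchvision`, and `opencv-python` are assumed to be the names under which the packages are registered, and `check_environment` is a hypothetical helper that is not part of this repository.

```python
import sys
from importlib import metadata

# Versions listed in the Installation section. The distribution names are an
# assumption (the usual PyPI names); None means "any version is accepted".
EXPECTED = {"torch": "1.8.0", "torchvision": "0.9.0", "opencv-python": None}


def check_environment(expected=EXPECTED, python_version=(3, 8)):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if tuple(sys.version_info[:2]) != tuple(python_version):
        problems.append(
            "Python %d.%d expected, found %d.%d"
            % (tuple(python_version) + tuple(sys.version_info[:2]))
        )
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append("%s is not installed" % name)
            continue
        # Compare only the leading part so local tags such as "1.8.0+cu111" pass.
        if wanted is not None and not found.startswith(wanted):
            problems.append("%s %s expected, found %s" % (name, wanted, found))
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Environment matches the listed requirements.")
```

The script only reads installed package metadata, so it does not need a GPU and does not import PyTorch itself.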

Steps

  1. Install the repository (we recommend using Anaconda for installation).
conda create -n WordLenSpotter python=3.8 -y
conda activate WordLenSpotter
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install opencv-python
pip install scipy
pip install shapely
pip install rapidfuzz
pip install timm
pip install Polygon3
git clone https://github.com/unxiaohao/WordLenSpotter.git
cd WordLenSpotter
python setup.py build develop
  2. Arrange the datasets in the following directory structure:
datasets
|_ dstd1500
|  |_ train_images
|  |_ test_images
|  |_ dstd1500_test.json
|  |_ dstd1500_train.json
|  |_ weak_voc_new.txt
|  |_ weak_voc_pair_list.txt
|_ evaluation
|  |_ test_gt.zip
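
Before training or evaluation, it may help to confirm that the directory tree above is in place. The following is a minimal sketch that only checks for the paths named in that tree. `missing_entries` is a hypothetical helper, and the default root `datasets` is assumed to be relative to the repository root.

```python
from pathlib import Path

# Relative paths copied from the dataset tree in this README.
EXPECTED_ENTRIES = [
    "dstd1500/train_images",
    "dstd1500/test_images",
    "dstd1500/dstd1500_test.json",
    "dstd1500/dstd1500_train.json",
    "dstd1500/weak_voc_new.txt",
    "dstd1500/weak_voc_pair_list.txt",
    "evaluation/test_gt.zip",
]


def missing_entries(root="datasets"):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Missing under datasets/:")
        for entry in missing:
            print("  " + entry)
    else:
        print("Dataset layout looks complete.")
```

The check is limited to existence; it does not validate the contents of the annotation files or the images.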

Usage

Training

  1. Pretrain WordLenSpotter
python projects/WordLenSpotter/train_net.py \
  --num-gpus 8 \
  --config-file projects/WordLenSpotter/configs/WordLenSpotter-pretrain.yaml
  2. Jointly train the model on the mixed real dataset
python projects/WordLenSpotter/train_net.py \
  --num-gpus 8 \
  --config-file projects/WordLenSpotter/configs/WordLenSpotter-mixtrain.yaml

Fine-tune

Fine-tune model

python projects/WordLenSpotter/train_net.py \
  --num-gpus 8 \
  --config-file projects/WordLenSpotter/configs/WordLenSpotter-finetune-dstd1500.yaml

Visualize

Visualize the detection and recognition results

python demo/demo.py \
  --config-file projects/WordLenSpotter/configs/WordLenSpotter-finetune-dstd1500.yaml \
  --input input1.jpg \
  --output ./output \
  --confidence-threshold 0.4 \
  --opts MODEL.WEIGHTS ./output/FINETUNE20K/model_final.pth
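
To visualize a whole folder of images, the command above can be wrapped in a small script. This is a sketch under stated assumptions: it reuses only the flags shown in the command above, runs the demo once per image, and expects to be launched from the repository root. `build_demo_command` and `run_demo_on_folder` are hypothetical helpers that are not part of this repository, and the `*.jpg` pattern is an illustrative choice.

```python
import subprocess
import sys
from pathlib import Path

# Config and weights paths copied from the demo command in this README.
CONFIG = "projects/WordLenSpotter/configs/WordLenSpotter-finetune-dstd1500.yaml"
WEIGHTS = "./output/FINETUNE20K/model_final.pth"


def build_demo_command(image, output_dir="./output", threshold=0.4):
    """Build the demo.py command from this README for a single image."""
    return [
        sys.executable, "demo/demo.py",
        "--config-file", CONFIG,
        "--input", str(image),
        "--output", str(output_dir),
        "--confidence-threshold", str(threshold),
        "--opts", "MODEL.WEIGHTS", WEIGHTS,
    ]


def run_demo_on_folder(image_dir, output_dir="./output", threshold=0.4):
    """Run the demo once per .jpg file in image_dir, stopping on the first failure."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for image in sorted(Path(image_dir).glob("*.jpg")):
        subprocess.run(build_demo_command(image, output_dir, threshold), check=True)
```

Launching one process per image reloads the model each time, which is slow but avoids relying on any behavior of `demo.py` beyond what the command above shows.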

[Figure: visualization of the detection and recognition results]

Acknowledgement

This project is based on AdelaiDet, Detectron2, and SwinTextSpotter.

Citation

If our paper helps your research, please cite it in your publications:

@article{wang2023word,
  title={Word length-aware text spotting: Enhancing detection and recognition in dense text image},
  author={Wang, Hao and Zhou, Huabing and Zhang, Yanduo and Lu, Tao and Ma, Jiayi},
  journal={arXiv preprint arXiv:2312.15690},
  year={2023}
}

License: Apache License 2.0

