This is the official implementation of the paper "Word Length-aware Text Spotting: Enhancing Dense Text Detection and Recognition for Camera-captured Document Image".
- Download the images
- The dense text spotting dataset DSTD1500, collected in real reading scenarios, can be downloaded here.
- Sample dataset images
You can also prepare a custom dataset by following the example scripts.
To evaluate on DSTD1500, first download the zipped ground-truth annotations.
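Before running an evaluation, it can help to confirm that the downloaded archive is intact. The sketch below assumes the archive is saved as `datasets/evaluation/test_gt.zip`, as in the dataset directory tree of this README; the function name `summarize_gt_zip` is hypothetical and only uses the standard-library `zipfile` module.

```python
import zipfile
from pathlib import Path

# Assumed location, taken from the dataset directory tree in this README.
GT_ZIP = Path("datasets/evaluation/test_gt.zip")


def summarize_gt_zip(zip_path: Path) -> dict:
    """Return basic facts about a zipped ground-truth archive without extracting it.

    Hypothetical helper for illustration; not part of the WordLenSpotter code base.
    """
    if not zip_path.is_file():
        raise FileNotFoundError(f"ground-truth archive not found: {zip_path}")
    if not zipfile.is_zipfile(zip_path):
        raise ValueError(f"not a valid zip archive: {zip_path}")
    with zipfile.ZipFile(zip_path) as zf:
        # Skip directory entries and keep only file members.
        names = [info.filename for info in zf.infolist() if not info.is_dir()]
        # testzip() returns the name of the first corrupt member, or None if all CRCs match.
        bad = zf.testzip()
    return {"num_files": len(names), "first_files": names[:3], "corrupt_member": bad}
```

For example, `summarize_gt_zip(GT_ZIP)` should report a nonzero `num_files` and `corrupt_member` equal to `None` for a complete download.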
- WordLenSpotter-MIXTRAIN: config | model (Google Drive)
- Python 3.8
- PyTorch 1.8.0, torchvision 0.9.0, cudatoolkit 11.1
- OpenCV for visualization
- Install the repository (we recommend using Anaconda for installation):
conda create -n WordLenSpotter python=3.8 -y
conda activate WordLenSpotter
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install opencv-python
pip install scipy
pip install shapely
pip install rapidfuzz
pip install timm
pip install Polygon3
git clone https://github.com/unxiaohao/WordLenSpotter.git
cd WordLenSpotter
python setup.py build develop
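After installation, a quick check can catch missing packages or a mismatched PyTorch build. The sketch below only checks what the requirements list above pins (PyTorch 1.8.0, torchvision 0.9.0) and the pip packages installed above; it assumes that Polygon3 is imported as `Polygon` and OpenCV as `cv2`. The helper names are hypothetical.

```python
import importlib
import importlib.util

# Versions pinned in the requirements list of this README.
EXPECTED = {"torch": "1.8.0", "torchvision": "0.9.0"}
# Import names of the pip packages installed above (assumed: Polygon3 -> "Polygon").
REQUIRED_MODULES = ["cv2", "scipy", "shapely", "rapidfuzz", "timm", "Polygon"]


def missing_modules(names):
    """Return the module names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def version_mismatches(expected):
    """Compare installed versions with the pinned ones; report (name, found, wanted)."""
    problems = []
    for name, wanted in expected.items():
        if importlib.util.find_spec(name) is None:
            problems.append((name, None, wanted))
            continue
        found = importlib.import_module(name).__version__
        # PyTorch builds may carry a local suffix such as "1.8.0+cu111"; ignore it.
        if found.split("+")[0] != wanted:
            problems.append((name, found, wanted))
    return problems
```

Running `missing_modules(REQUIRED_MODULES)` and `version_mismatches(EXPECTED)` inside the `WordLenSpotter` environment should both return empty lists.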
- Dataset path (expected directory layout):
datasets
|_ dstd1500
| |_ train_images
| |_ test_images
| |_ dstd1500_test.json
| |_ dstd1500_train.json
| |_ weak_voc_new.txt
| |_ weak_voc_pair_list.txt
|_ evaluation
| |_ test_gt.zip
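A small script can verify this layout before training. The paths below are copied from the tree above, relative to the `datasets` folder. The annotation summary additionally assumes a COCO-style JSON file with top-level `images` and `annotations` lists; this README does not confirm the annotation format, so treat that part as an assumption. Both function names are hypothetical.

```python
import json
from pathlib import Path

# Expected entries, copied from the directory tree in this README (relative to "datasets").
EXPECTED_LAYOUT = [
    "dstd1500/train_images",
    "dstd1500/test_images",
    "dstd1500/dstd1500_test.json",
    "dstd1500/dstd1500_train.json",
    "dstd1500/weak_voc_new.txt",
    "dstd1500/weak_voc_pair_list.txt",
    "evaluation/test_gt.zip",
]


def check_layout(root):
    """Return the expected paths (relative to root) that do not exist."""
    root = Path(root)
    return [rel for rel in EXPECTED_LAYOUT if not (root / rel).exists()]


def summarize_annotations(json_path):
    """Count images and annotations.

    ASSUMPTION: a COCO-style file with top-level "images" and "annotations" lists.
    """
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    return {
        "images": len(data.get("images", [])),
        "annotations": len(data.get("annotations", [])),
    }
```

For example, `check_layout("datasets")` returns an empty list once everything is in place, and `summarize_annotations("datasets/dstd1500/dstd1500_train.json")` gives a rough size check of the training split.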
- Pretrain WordLenSpotter
python projects/WordLenSpotter/train_net.py \
--num-gpus 8 \
--config-file projects/WordLenSpotter/configs/WordLenSpotter-pretrain.yaml
- Jointly train the model on the mixed real dataset
python projects/WordLenSpotter/train_net.py \
--num-gpus 8 \
--config-file projects/WordLenSpotter/configs/WordLenSpotter-mixtrain.yaml
- Fine-tune the model on DSTD1500
python projects/WordLenSpotter/train_net.py \
--num-gpus 8 \
--config-file projects/WordLenSpotter/configs/WordLenSpotter-finetune-dstd1500.yaml
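The three training commands above differ only in the config file, so they can be chained from one script. The sketch below builds the same `train_net.py` invocations with a configurable GPU count and, when `dry_run=False`, runs them in order, stopping if a stage fails. It uses the fine-tuning config name that appears in the visualization command. Whether each stage loads the previous stage's checkpoint is decided by the config files, which this sketch does not inspect; `build_command` and `run_all` are hypothetical names.

```python
import subprocess
import sys

# Stage order and config files as listed in this README.
STAGES = [
    ("pretrain", "projects/WordLenSpotter/configs/WordLenSpotter-pretrain.yaml"),
    ("mixtrain", "projects/WordLenSpotter/configs/WordLenSpotter-mixtrain.yaml"),
    ("finetune", "projects/WordLenSpotter/configs/WordLenSpotter-finetune-dstd1500.yaml"),
]


def build_command(config_file, num_gpus=8):
    """Build the argv list for one train_net.py invocation."""
    return [
        sys.executable,
        "projects/WordLenSpotter/train_net.py",
        "--num-gpus", str(num_gpus),
        "--config-file", config_file,
    ]


def run_all(num_gpus=8, dry_run=True):
    """Return the commands for all stages; execute them in order unless dry_run."""
    commands = [build_command(cfg, num_gpus) for _, cfg in STAGES]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError and stops the pipeline on failure.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_all(num_gpus=1)` prints nothing and simply returns the three commands, which is a convenient way to review them before launching `run_all(num_gpus=8, dry_run=False)` from the repository root.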
- Visualize the detection and recognition results
python demo/demo.py \
--config-file projects/WordLenSpotter/configs/WordLenSpotter-finetune-dstd1500.yaml \
--input input1.jpg \
--output ./output \
--confidence-threshold 0.4 \
--opts MODEL.WEIGHTS ./output/FINETUNE20K/model_final.pth
The visualization results are shown in the figure:
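The `--confidence-threshold 0.4` option controls which predictions are drawn. In Detectron2-style demos this is the minimum score an instance needs in order to be shown; the sketch below illustrates that idea on plain Python objects. The `Spotting` record and `filter_by_confidence` are hypothetical, and where exactly the real model applies the threshold (and whether the comparison is `>=` or `>`) is not specified in this README.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Spotting:
    """One hypothetical spotting result: a text polygon, its transcription, and a score."""
    polygon: List[Tuple[float, float]]
    text: str
    score: float


def filter_by_confidence(results: List[Spotting], threshold: float = 0.4) -> List[Spotting]:
    """Keep only the results whose score reaches the threshold."""
    return [r for r in results if r.score >= threshold]
```

Raising the threshold yields fewer but more reliable words in the output image; lowering it shows more candidates in dense regions at the cost of extra false positives.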
This project is based on AdelaiDet, Detectron2, and SwinTextSpotter.
If our paper helps your research, please cite it in your publications:
@article{wang2023word,
title={Word length-aware text spotting: Enhancing detection and recognition in dense text image},
author={Wang, Hao and Zhou, Huabing and Zhang, Yanduo and Lu, Tao and Ma, Jiayi},
journal={arXiv preprint arXiv:2312.15690},
year={2023}
}