This is a PyTorch implementation of TextBPN++: Arbitrary Shape Text Detection via Boundary Transformer. This project is based on TextBPN.
NOTE: This paper is under review.
Please star this project, thank you! Your encouragement motivates us to keep updating it!
- 2022.06.20 Updated the illustration of the framework.
- 2022.06.16 Uploaded files that were missing because of a naming problem (CTW1500_Text_New, Total_Text_New); they support the new label formats for Total-Text (mat) and CTW-1500 (xml).
- 2022.06.16 Updated the Google Drive links so that the files can be downloaded without requesting permission.
- Release code
- scripts for training and testing
- Demo script
- Evaluation
- Release pre-trained models
- Release trained models for each benchmark
- Prepare TextSpotter code based on TextBPN++
python >= 3.7
PyTorch >= 1.7.0
NumPy >= 1.2.0
CUDA >= 11.1
GCC >= 10.0
opencv-python < 4.5.0
NVIDIA GPU (e.g., 2080 or 3080)
NOTE: We tested the code on Arch Linux + Python 3.9 and Ubuntu 20.04 + Python 3.8. In other environments the code may need slight adjustments, but the difference in performance is negligible. Most of our experiments were run on Arch Linux + Python 3.9.
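A quick way to verify that an environment satisfies the requirements above is a small sanity-check script (a minimal sketch, not part of this repository; adjust the bounds to your setup):

```python
# check_env.py -- illustrative environment sanity check (not part of this repo)
import sys

def version_tuple(v):
    # '1.7.0+cu110' -> (1, 7, 0): drop local build suffixes and non-numeric parts
    parts = v.split("+")[0].split(".")
    return tuple(int(p) for p in parts[:3] if p.isdigit())

def check(name, actual, minimum=None, below=None):
    # Report whether version string `actual` satisfies the given bounds
    ok = True
    if minimum is not None and version_tuple(actual) < minimum:
        ok = False
    if below is not None and version_tuple(actual) >= below:
        ok = False
    print(f"{name} {actual}: {'OK' if ok else 'NOT OK'}")
    return ok

assert sys.version_info >= (3, 7), "Python >= 3.7 is required"

try:
    import torch, numpy, cv2
    check("PyTorch", torch.__version__, minimum=(1, 7, 0))
    check("NumPy", numpy.__version__, minimum=(1, 2, 0))
    check("opencv-python", cv2.__version__, below=(4, 5, 0))
    print("CUDA available:", torch.cuda.is_available())
except ImportError as exc:
    print("missing dependency:", exc.name)
```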
If DCN is used, some CUDA files need to be compiled:
# Make sure the CUDA path in setup.py is set properly for your environment
# Set the GPU you need in setup.py if you have different GPUs in your machine
cd network/backbone/assets/dcn
sh Makefile.sh
# setup.py
import os
PATH ="{}:{}".format(os.environ['PATH'], "/opt/cuda/bin")
# os.environ['CUDA_VISIBLE_DEVICES'] = "1"
os.environ['PATH'] = PATH
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension
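For reference, a setup.py built on torch's C++ extension utilities is typically completed along these lines. This is a hedged sketch only: the package name and source paths below are placeholders, not the repository's actual file list, so check the real setup.py in network/backbone/assets/dcn.

```python
# Illustrative continuation of setup.py (names and paths are placeholders)
import os
os.environ['PATH'] = "{}:{}".format(os.environ['PATH'], "/opt/cuda/bin")

from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="dcn",  # placeholder package name
    ext_modules=[
        CUDAExtension(
            name="deform_conv_cuda",  # placeholder extension name
            sources=["src/deform_conv_cuda.cpp",
                     "src/deform_conv_cuda_kernel.cu"],  # placeholder sources
        ),
    ],
    # BuildExtension invokes nvcc/gcc for the .cu and .cpp sources
    cmdclass={"build_ext": BuildExtension},
)
```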
NOTE: The images of each dataset can be obtained from their official website.
We provide a simple example for each dataset in data, such as Total-Text, CTW-1500, and ArT ...
We provide some pre-training models on SynText (Baidu Drive (download code: r1ff), Google Drive) and MLT-2017 (Baidu Drive (download code: srym), Google Drive)
├── pretrain
│   ├── Syn
│   │   ├── TextBPN_resnet50_0.pth              # 1s
│   │   ├── TextBPN_resnet18_0.pth              # 4s
│   │   └── TextBPN_deformable_resnet50_0.pth   # 1s
│   └── MLT
│       ├── TextBPN_resnet50_300.pth            # 1s
│       ├── TextBPN_resnet18_300.pth            # 4s
│       └── TextBPN_deformable_resnet50_300.pth # 1s
NOTE: we also provide the pre-training scripts for SynText.
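Once downloaded, a checkpoint can be inspected in the usual PyTorch way before training or evaluation. A minimal sketch (the path is one of the files listed above; the `summarize` helper and the `"model"` wrapper key are illustrative assumptions, not this repository's API):

```python
import os

ckpt_path = "model/pretrain/MLT/TextBPN_resnet50_300.pth"  # downloaded weights

def summarize(state, n=5):
    """List (key, shape) for the first n tensors in a checkpoint dict.

    Some checkpoints wrap the state_dict under a "model" key (an assumption
    here); fall back to the dict itself otherwise.
    """
    sd = state.get("model", state) if isinstance(state, dict) else state
    return [(k, tuple(v.shape)) for k, v in list(sd.items())[:n]]

if os.path.exists(ckpt_path):
    import torch
    # map_location="cpu" lets you inspect the file without a GPU
    state = torch.load(ckpt_path, map_location="cpu")
    for key, shape in summarize(state):
        print(key, shape)
else:
    print("checkpoint not found:", ckpt_path)
```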
We provide training scripts for each dataset in scripts-train, such as Total-Text, CTW-1500, and ArT ...
# train_Totaltext_res50_1s.sh
#!/bin/bash
cd ../
CUDA_LAUNCH_BLOCKING=1 python3 train_textBPN.py --exp_name Totaltext --net resnet50 --scale 1 --max_epoch 660 --batch_size 12 --gpu 0 --input_size 640 --optim Adam --lr 0.001 --num_workers 30
# train_Totaltext_res50_1s_fine_mlt.sh
#!/bin/bash
cd ../
CUDA_LAUNCH_BLOCKING=1 python3 train_textBPN.py --exp_name Totaltext --net resnet50 --scale 1 --max_epoch 660 --batch_size 12 --gpu 0 --input_size 640 --optim Adam --lr 0.0001 --num_workers 30 --load_memory True --resume model/pretrain/MLT/TextBPN_resnet50_300.pth
We provide testing scripts for each dataset in scripts-eval, such as Total-Text, CTW-1500, and ArT ...
# Eval_ArT.sh
#!/bin/bash
cd ../
##################### eval for ArT with ResNet50 1s ###################################
#CUDA_LAUNCH_BLOCKING=1 python3 eval_textBPN.py --net resnet50 --scale 1 --exp_name ArT --checkepoch 605 --test_size 960 2880 --dis_threshold 0.4 --cls_threshold 0.4 --gpu 0;
##################### eval for ArT with ResNet50-DCN 1s ###################################
CUDA_LAUNCH_BLOCKING=1 python3 eval_textBPN.py --net deformable_resnet50 --scale 1 --exp_name ArT --checkepoch 480 --test_size 960 2880 --dis_threshold 0.4 --cls_threshold 0.8 --gpu 0;
##################### batch eval for ArT ###################################
#for ((i=660; i>=300; i=i-5));
#do
#CUDA_LAUNCH_BLOCKING=1 python3 eval_textBPN.py --exp_name ArT --net deformable_resnet50 --checkepoch $i --test_size 960 2880 --dis_threshold 0.45 --cls_threshold 0.8 --gpu 0;
#done
NOTE: If you want to save the visualization results, enable "--viz". Here is an example:
CUDA_LAUNCH_BLOCKING=1 python3 eval_textBPN.py --net resnet18 --scale 4 --exp_name TD500 --checkepoch 1135 --test_size 640 960 --dis_threshold 0.35 --cls_threshold 0.9 --gpu 0 --viz;
You can also run prediction on your own dataset without annotations. Here is an example:
#demo.sh
#!/bin/bash
CUDA_LAUNCH_BLOCKING=1 python3 demo.py --net resnet18 --scale 4 --exp_name TD500 --checkepoch 1135 --test_size 640 960 --dis_threshold 0.35 --cls_threshold 0.9 --gpu 0 --viz --img_root /path/to/image
Note that we provide the evaluation protocols for the benchmarks (Total-Text, CTW-1500, MSRA-TD500, ICDAR2015). The protocols embedded in the code are obtained from the official ones. You don't need to run them separately; our test code calls these scripts automatically (see "util/eval.py").
We follow the speed-testing scheme of DB: speed is evaluated by running inference on a single test image 50 times, which excludes extra I/O time.
CUDA_LAUNCH_BLOCKING=1 python3 eval_textBPN_speed.py --net resnet18 --scale 4 --exp_name TD500 --checkepoch 1135 --test_size 640 960 --dis_threshold 0.35 --cls_threshold 0.9 --gpu 1;
Note that the speed depends on both the GPU and the CPU.
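The timing scheme borrowed from DB can be sketched as follows: call the model on the same pre-loaded image repeatedly and report the mean throughput, so file I/O never enters the measurement. The `infer` callable below is a placeholder for the real forward pass:

```python
import time

def measure_fps(infer, image, runs=50, warmup=5):
    """Average throughput of `infer(image)` over `runs` calls, after warmup."""
    for _ in range(warmup):      # warm up caches / lazily-initialized kernels
        infer(image)
    start = time.perf_counter()
    for _ in range(runs):
        infer(image)
    elapsed = time.perf_counter() - start
    return runs / elapsed        # frames per second

# Example with a dummy "model" (replace with the real detector forward pass):
fps = measure_fps(lambda img: img, image=None)
print(f"{fps:.1f} FPS")
```

When timing a CUDA model, call torch.cuda.synchronize() before reading the clock, since GPU kernels launch asynchronously.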
The results are reported in our paper as follows:
Datasets | Model | Recall | Precision | F-measure | FPS |
---|---|---|---|---|---|
Total-Text | Res18-4s | 81.90 | 89.88 | 85.70 | 32.5 |
Total-Text | Res50-1s | 85.34 | 91.81 | 88.46 | 13.3 |
Total-Text | Res50-1s+DCN | 87.93 | 92.44 | 90.13 | 13.2 |
CTW-1500 | Res18-4s | 81.62 | 87.55 | 84.48 | 35.3 |
CTW-1500 | Res50-1s | 83.77 | 87.30 | 85.50 | 14.1 |
CTW-1500 | Res50-1s+DCN | 84.71 | 88.34 | 86.49 | 16.5 |
MSRA-TD500 | Res18-4s | 87.46 | 92.38 | 89.85 | 38.5 |
MSRA-TD500 | Res50-1s | 85.40 | 89.23 | 87.27 | 15.2 |
MSRA-TD500 | Res50-1s+DCN | 86.77 | 93.69 | 90.10 | 15.3 |
MLT-2017 | Res50-1s | 65.67 | 80.49 | 72.33 | - |
MLT-2017 | Res50-1s+DCN | 72.10 | 83.74 | 77.48 | - |
ICDAR-ArT | Res50-1s | 71.07 | 81.14 | 75.77 | - |
ICDAR-ArT | Res50-1s+DCN | 77.05 | 84.48 | 80.59 | - |
NOTE: The results on ICDAR-ArT and MLT-2017 can also be found on the official competition websites (ICDAR-ArT and MLT-2017). You can also download the trained models for each benchmark from Baidu Drive (download code: wct1)
Qualitative comparisons with TextRay, ABCNet, and FCENet on selected challenging samples from CTW-1500. The images (a)-(d) are borrowed from FCENet.
Please cite the related works in your publications if they help your research:
@inproceedings{DBLP:conf/iccv/Zhang0YWY21,
author = {Shi{-}Xue Zhang and
Xiaobin Zhu and
Chun Yang and
Hongfa Wang and
Xu{-}Cheng Yin},
title = {Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection},
booktitle = {2021 {IEEE/CVF} International Conference on Computer Vision, {ICCV} 2021, Montreal, QC, Canada, October 10-17, 2021},
pages = {1285--1294},
publisher = {{IEEE}},
year = {2021},
}
@article{Zhang2022ArbitraryST,
title = {Arbitrary Shape Text Detection via Boundary Transformer},
author = {Shi{-}Xue Zhang and Xiaobin Zhu and Chun Yang and Xu{-}Cheng Yin},
journal = {arXiv preprint},
year = {2022}
}
This project is licensed under the MIT License - see the LICENSE.md file for details