liviust/TCM

Turning a CLIP Model into a Scene Text Detector

This repository is built upon mmocr 0.4.0.

NightTime-ArT Dataset

NightTime-ArT dataset, collected from ArT, can be downloaded from here.

Usage

Environment

cuda 11.1
torch=1.8.0
torchvision=0.9.0
timm=0.4.12
mmcv-full=1.3.17
mmseg=0.20.2
mmdet=2.19.1
mmocr=0.4.0

The code is based on mmocr. Please first install mmcv-full and mmocr by following the official guidelines (mmocr).
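Version mismatches among these packages are a common source of import errors, so it can help to compare the installed versions with the pins above before training. The following is a minimal sketch, not part of the repository: the helper name `check_environment` is hypothetical, the PyPI distribution names (for example `mmsegmentation` for mmseg) are assumptions, and it assumes Python 3.8 or newer for `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the environment list above. The keys are the assumed
# PyPI distribution names, which may differ from the import names.
EXPECTED = {
    "torch": "1.8.0",
    "torchvision": "0.9.0",
    "timm": "0.4.12",
    "mmcv-full": "1.3.17",
    "mmsegmentation": "0.20.2",
    "mmdet": "2.19.1",
    "mmocr": "0.4.0",
}


def check_environment(expected=EXPECTED):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, want in expected.items():
        try:
            have = version(name)
        except PackageNotFoundError:
            have = None  # package is not installed at all
        # Local build tags such as "1.8.0+cu111" still count as a match.
        if have is None or have.split("+")[0] != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    for name, (want, have) in check_environment().items():
        print(f"{name}: expected {want}, found {have}")
```

An empty result means every listed package is installed at the pinned version.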

Dataset

Please follow the official mmocr guidelines to prepare the datasets.
Configure the dataset path in ocrclip/configs/_base_/det_datasets.
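For orientation, a dataset file in that folder typically follows the mmocr 0.x pattern sketched below. This is an assumption based on the upstream mmocr layout, not a copy of a file from this repository: the dataset type, the `data/icdar2015` root, and the annotation file names are illustrative and should be checked against the actual configs.

```python
# Illustrative dataset config in the mmocr 0.x style (all paths are assumed).
dataset_type = 'IcdarDataset'
data_root = 'data/icdar2015'  # change this to where the dataset is stored

train = dict(
    type=dataset_type,
    ann_file=f'{data_root}/instances_training.json',
    img_prefix=f'{data_root}/imgs',
    pipeline=None)  # the pipeline is filled in by the model config

test = dict(
    type=dataset_type,
    ann_file=f'{data_root}/instances_test.json',
    img_prefix=f'{data_root}/imgs',
    pipeline=None)

train_list = [train]
test_list = [test]
```

In this pattern, only `data_root` usually needs to change when the data lives elsewhere.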

Pre-trained CLIP Models

Download the pre-trained CLIP models (RN50.pt) and save them to the pretrained folder.
Configure the path to the pre-trained CLIP model in the config file as follows:

model = dict(
    pretrained='xxx/ocrclip/pretrained/RN50.pt',
    )
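A wrong weights path may only surface once training starts, so it can be worth checking it up front. The helper below is a hypothetical sketch (the name `clip_model_cfg` is not part of this repository); it only verifies that the file exists and returns a `model` fragment shaped like the one above.

```python
from pathlib import Path


def clip_model_cfg(weights_path):
    """Return a `model` config fragment after checking the weights exist."""
    path = Path(weights_path).expanduser()
    if not path.is_file():
        # Fail early with a clear message instead of deep inside training.
        raise FileNotFoundError(f"CLIP weights not found: {path}")
    return dict(pretrained=str(path))
```

The returned dict can be merged into the config, or the function can simply be used as a pre-flight check before launching a run.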

Pretraining & Training & Evaluation

To pretrain the TCM model on SynthText/Synth150k, please configure the corresponding dataset path, then run:

bash dist_train.sh configs/textdet/xxnet/xxx.py 8

To fine-tune the TCM model from a pretrained model, set load_from in the config to the pretrained checkpoint path, then run:

bash dist_train.sh configs/textdet/xxnet/xxx.py 8
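As a sketch of the fine-tuning setup described above, the relevant lines in the config might look as follows. `load_from` and `resume_from` are standard top-level keys in mmocr 0.x / mmdet 2.x configs; the checkpoint path is a placeholder, not a file shipped with this repository.

```python
# Hypothetical excerpt from a fine-tuning config.
load_from = '/path/to/pretrained_checkpoint.pth'  # initialize weights from pretraining
resume_from = None  # leave unset so the optimizer state and epoch counter start fresh
```

Using `load_from` rather than `resume_from` loads only the model weights, which is usually what fine-tuning on a new dataset calls for.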

To evaluate a checkpoint, run:

bash dist_test.sh configs/textdet/xxnet/xxx.py /path/to/checkpoint 1 --eval hmean-iou
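The hmean reported by this evaluation is the harmonic mean of precision and recall, i.e. the F-measure. The snippet below is a minimal sketch of that arithmetic only, assuming detections have already been matched to ground-truth boxes by an IoU threshold; the function names and the counts in the example are hypothetical and do not reproduce mmocr's evaluation code.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall (the F-measure)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing matched
    return 2 * precision * recall / (precision + recall)


def hmean_from_counts(num_matched, num_detections, num_ground_truth):
    """Compute precision, recall and hmean from match counts."""
    precision = num_matched / num_detections if num_detections else 0.0
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    return precision, recall, hmean(precision, recall)


if __name__ == "__main__":
    # Illustrative counts: 80 matches among 100 detections and 90 ground-truth boxes.
    p, r, h = hmean_from_counts(80, 100, 90)
    print(f"precision={p:.4f} recall={r:.4f} hmean={h:.4f}")
```

Because it is a harmonic mean, the score is pulled toward the lower of the two values, so a detector cannot reach a high F-measure by excelling at precision or recall alone.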

Results

Method	Data	F-measure	Model
TCM-DB	TD	88.8%	config weights
TCM-DB	IC15	88.8%	config weights
TCM-DB	CTW	85.1%	config
TCM-DB	TT	85.9%	config

TODO

Add FastTCM
Migration from mmocr 0.4.0 to mmocr 1.0.0
Refactor and clean code
Release domain adaptation setting

Citation

If you find this project helpful for your research, please consider citing the paper:

@inproceedings{Yu2023TurningAC,
  title={Turning a CLIP Model into a Scene Text Detector},
  author={Wenwen Yu and Yuliang Liu and Wei Hua and Deqiang Jiang and Bo Ren and Xiang Bai},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
  year={2023}
}

Licence

This project is under the CC-BY-NC 4.0 license. See LICENSE for more details.

Acknowledgements

This project is partially based on MMOCR, CLIP, and DenseCLIP. Thanks for their great work.

About

Turning a CLIP Model into a Scene Text Detector (CVPR2023)

https://arxiv.org/abs/2302.14338
