
Adversarial Sequence-to-sequence Domain Adaptation (ASSDA)

Overview

We propose a novel Adversarial Sequence-to-sequence Domain Adaptation network (ASSDA) for robust text image recognition, which adaptively transfers both coarse global-level and fine-grained character-level knowledge across domains.
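
As a rough illustration (not the repository's actual code; all class and variable names below are hypothetical), domain-adversarial training of this kind is commonly implemented with a gradient reversal layer: a small domain classifier learns to separate source from target features, while the reversed gradient trains the encoder to make them indistinguishable:

    import torch
    import torch.nn as nn

    class GradReverse(torch.autograd.Function):
        # Identity in the forward pass; flips and scales the gradient in
        # the backward pass, so the encoder is trained to fool the
        # domain classifier.
        @staticmethod
        def forward(ctx, x, lambd):
            ctx.lambd = lambd
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.lambd * grad_output, None

    class DomainClassifier(nn.Module):
        # Binary source-vs-target classifier over a feature vector, e.g.
        # a global image feature (coarse alignment) or a per-character
        # attention feature (fine-grained alignment). Sizes are assumptions.
        def __init__(self, feat_dim, hidden_dim=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, 2))

        def forward(self, feat, lambd=1.0):
            return self.net(GradReverse.apply(feat, lambd))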

Install

  1. This code was tested with cuda==10.1 and python==3.6.8.

  2. Install the requirements:

pip3 install torch==1.2.0 pillow==6.2.1 torchvision==0.4.0 lmdb nltk natsort
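
Optionally, verify that the pinned versions installed and that PyTorch can see the GPU (a sanity check, not part of the repository):

    python3 -c "import torch, torchvision; print(torch.__version__, torchvision.__version__, torch.cuda.is_available())"
    # expected: 1.2.0 0.4.0 True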

Dataset

  • The prepared synthetic and real scene datasets, created by NAVER Corp., can be downloaded from here; they are stored as lmdb databases (see the snippet after this list).

  • The prepared handwritten text dataset can be downloaded from here.

    • Handwritten text: IAM
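
A minimal sketch for inspecting one of these lmdb datasets, assuming the format used by deep-text-recognition-benchmark (a num-samples count plus image-%09d/label-%09d entries, indexed from 1) and assuming the IAM test set sits at ./data/IAM/test:

    import lmdb

    env = lmdb.open('./data/IAM/test', readonly=True, lock=False)
    with env.begin() as txn:
        num_samples = int(txn.get('num-samples'.encode()))
        print('samples:', num_samples)
        # Labels are UTF-8 strings; images are stored as encoded bytes.
        print('first label:', txn.get('label-000000001'.encode()).decode())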

Training and evaluation

  • For a toy example, you can download the pretrained model from here.

    • Place the downloaded model files in data/
  • Training model

    CUDA_VISIBLE_DEVICES=1 python train_da_global_local_selected.py --Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction Attn \
    --src_train_data ./data/data_lmdb_release/training/ \
    --tar_train_data ./data/IAM/test --tar_select_data IAM --tar_batch_ratio 1 --valid_data ./data/IAM/test/ \
    --continue_model ./data/TPS-ResNet-BiLSTM-Attn.pth \
    --batch_size 128 --lr 1 \
    --experiment_name _adv_global_local_synth2iam_pc_0.1 --pc 0.1
    
  • Test model

    • Test the baseline model

      CUDA_VISIBLE_DEVICES=0 python test.py --Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction Attn \
      --eval_data ./data/IAM/test \
      --saved_model ./data/TPS-ResNet-BiLSTM-Attn.pth
      
    • Test the adaptation model

      CUDA_VISIBLE_DEVICES=0 python test.py --Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction Attn \
      --eval_data ./data/IAM/test \
      --saved_model saved_models/TPS-ResNet-BiLSTM-Attn-Seed1111_adv_global_local_selected/best_accuracy.pth
      

Citation

If you use this code in a paper, please cite:

@inproceedings{zhang2019sequence,
  title={Sequence-to-sequence domain adaptation network for robust text image recognition},
  author={Zhang, Yaping and Nie, Shuai and Liu, Wenju and Xu, Xing and Zhang, Dongxiang and Shen, Heng Tao},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={2740--2749},
  year={2019}
}

@article{zhang2021robust,
  title={Robust Text Image Recognition via Adversarial Sequence-to-Sequence Domain Adaptation},
  author={Zhang, Yaping and Nie, Shuai and Liang, Shan and Liu, Wenju},
  journal={IEEE Transactions on Image Processing},
  volume={30},
  pages={3922--3933},
  year={2021},
  publisher={IEEE}
}

Acknowledgement

This implementation is based on the repository deep-text-recognition-benchmark.
