EATEN: Entity-aware Attention for Single Shot Visual Text Extraction

Accepted to ICDAR 2019 arxiv
Authors: He Guo, Xiameng Qin, Jiaming Liu, Junyu Han, Jingtuo Liu and Errui Ding

Abstract

This repository is designed to provide an open-source dataset for Visual Text Extraction.

The dataset can be downloaded through the following link:
baiduyun, PASSWORD: e4z1

Some details:

scenes	number	size	Google Drive link
train ticket	300k synth + 1.9 real	13G	dataset_trainticket.tar
passport	100k synth	5.8G	dataset_passport.tar
business card	200k synth	19G	dataset_business.tar.0 dataset_business.tar.1 dataset_business.tar.2 dataset_business.tar.3

[A large of training data]
Todo:
1. Use CycleGan or domain adaptation to synth data to train EATEN.
2. Introduce datasets of STR to EATEN.
[Generalization on complex scenes]
Todo:
1. Add bounding box annotations of ToIs to EATEN, such as 2019-ICCV-oral Towards Unconstrained End-to-End Text Spotting.
[Engineering]
1. Merge server decoder to one.
2. parallel decoding.

EATEN: Entity-aware Attention for Single Shot Visual Text Extraction