Team sOcCeR's formula recognizer (OCR)

Project period

May 24 ~ June 15, 2021 (4 weeks)

Project overview

Everything we experimented with

Flow of what we applied

More Details

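For the detailed experiment process and additional information, see Project Details.

Introduction to the formula recognizer project

The formula recognizer project is an image-to-text task that converts images containing formulas into LaTeX expressions.
Training uses 50,000 handwritten and 50,000 printed formula images, and the task is to convert 12,000 formula images into LaTeX.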

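Evaluation method

Score = 0.9 * Sentence Accuracy + 0.1 * (1 - Word Error Rate)

- **Sentence Accuracy** (%): number of sentences that exactly match the ground truth / total number of sentences
- **Word Error Rate** (%): number of insertions, deletions, and substitutions required / total number of words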
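The score above can be sketched in a few lines of Python. This is a minimal illustration, not the project's `metrics.py`: it assumes tokens are separated by whitespace and that the word error rate is computed over the whole corpus (total edits / total reference tokens). The function names are hypothetical.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference token
                            curr[j - 1] + 1,      # insert a hypothesis token
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def sentence_accuracy(refs: List[str], hyps: List[str]) -> float:
    """Fraction of predictions whose token sequence equals the ground truth."""
    exact = sum(r.split() == h.split() for r, h in zip(refs, hyps))
    return exact / len(refs)


def word_error_rate(refs: List[str], hyps: List[str]) -> float:
    """Total edits over total reference tokens (corpus-level; an assumption)."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    total = sum(len(r.split()) for r in refs)
    return errors / total


def final_score(refs: List[str], hyps: List[str]) -> float:
    """0.9 * sentence accuracy + 0.1 * (1 - word error rate)."""
    return 0.9 * sentence_accuracy(refs, hyps) + 0.1 * (1 - word_error_rate(refs, hyps))


if __name__ == "__main__":
    refs = ["a + b", "x ^ 2"]
    hyps = ["a + b", "x ^ 3"]
    print(final_score(refs, hyps))
```

Because sentence accuracy carries 90% of the weight, a single wrong token in a long formula costs far more than its contribution to the word error rate.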

Environment

Hardware

CPU: Xeon Gold 5120
GPU: Tesla V100 32GB
Mem: > 90GB
Data is stored in remote server storage.

Software

System: Ubuntu 18.04.4 LTS with Linux 4.4.0-210-generic kernel.
Python: 3.7 distributed by Anaconda.
CUDA: 10.1
PyTorch: 1.4.0

Dependencies

scikit_image==0.14.1
opencv_python==3.4.4.19
tqdm==4.28.1
torch==1.4.0
scipy==1.2.0
numpy==1.15.4
torchvision==0.2.1
Pillow==8.1.1
tensorboardX==1.5
editdistance==0.5.3

$ pip install -r requirements.txt

Usage

Attention, SATRN

$ python train.py --c {your_model}.yaml

ViT

$ python train_ViT.py

Swin

$ python train_swin.py

Dataset

Training image example:

Ground Truth:
x = \frac { - b \pm \sqrt { b ^ 2 - 4 a c } } { 2 a } \ { \text { when } } \ {a x ^ 2 + b x + c = 0}
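The ground truth is stored as LaTeX tokens separated by spaces, so a vocabulary can be built by splitting on whitespace. The sketch below shows one way to do that. The helper names and special tokens are hypothetical and are not taken from `datatools/extract_tokens.py`.

```python
from typing import Dict, Iterable, List

# Hypothetical special tokens; the project's actual names may differ.
SPECIAL_TOKENS = ["<PAD>", "<SOS>", "<EOS>"]


def build_vocab(ground_truths: Iterable[str]) -> Dict[str, int]:
    """Map every whitespace-separated token to an integer id."""
    tokens = sorted({tok for gt in ground_truths for tok in gt.split()})
    return {tok: idx for idx, tok in enumerate(SPECIAL_TOKENS + tokens)}


def encode(ground_truth: str, vocab: Dict[str, int]) -> List[int]:
    """Convert one ground-truth string into ids wrapped in <SOS> ... <EOS>."""
    body = [vocab[tok] for tok in ground_truth.split()]
    return [vocab["<SOS>"]] + body + [vocab["<EOS>"]]


if __name__ == "__main__":
    sample = r"\frac { a } { b }"
    vocab = build_vocab([sample])
    print(encode(sample, vocab))
```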

File Structure

p4-fr-hatting-day/code/
│
├── configs
│    ├── Attention.yaml
│    ├── ...
│    └── EFFICIENT_SATRNv6.yaml
├── datatools
│    ├── extract_tokens.py
│    ├── parse_upstage.py
│    └── train_test_split.py
├── network
│    ├── Attention.py
│    ├── EFFICIENT_SATRN.py
│    ├── SATRN_extension.py
│    ├── ...
│    └── swin.py
├── submit
├── checkpoint.py
├── dataset.py
├── dataset_ViT.py
├── dataset_Swin.py
├── floags.py
├── inference.py
├── inference_ensemble.py
├── metrics.py
├── requirements.txt
├── requirements_2.txt
├── scheduler.py
├── submission.txt
├── train.py
├── train_ViT.py
├── train_swin.py
└── utils.py
p4-fr-hatting-day/pytorch-CycleGAN-and-pix2pix/
├── CycleGAN.ipynb # for tutorial
├── ...
└── cleaning_GAN.ipynb # for data cleansing

Models

ASTER

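The model consists of an Encoder built from a CNN and an LSTM, and a Decoder that applies attention over the Encoder output and the previous LSTM hidden state.
It is a basic model for scene text recognition.
We summed the BLSTM hidden states and passed the result to the decoder.
CNN backbone: EfficientNet V2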

SATRN

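Like ASTER, this model consists of an Encoder and a Decoder.
Its distinguishing features are A2DPE, which learns the relative importance of horizontal and vertical information in the image, and the locality-aware feedforward layer, which learns spatial information around each character.
We applied residual attention in the multi-head attention step to improve performance.
For weight initialization, we referred to the RealFormer paper.
CNN backbone: ResnetRS152 and EfficientNet v2.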
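The idea behind residual attention can be illustrated with a small, dependency-free sketch. It assumes the RealFormer-style formulation, in which the previous layer's pre-softmax attention scores are added to the current layer's scores before normalization. The function names are hypothetical and this is not the code in `network/`.

```python
import math
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def residual_attention_weights(
    scores: Matrix, prev_scores: Optional[Matrix] = None
) -> Tuple[Matrix, Matrix]:
    """Return (attention weights, combined scores to pass to the next layer).

    `scores` stands for the current layer's scaled query-key scores.
    """
    if prev_scores is not None:
        # Residual connection on the raw scores, applied before softmax.
        scores = [
            [cur + prev for cur, prev in zip(cur_row, prev_row)]
            for cur_row, prev_row in zip(scores, prev_scores)
        ]
    weights = [softmax(row) for row in scores]
    return weights, scores


if __name__ == "__main__":
    layer1_w, carried = residual_attention_weights([[1.0, 0.0]])
    layer2_w, _ = residual_attention_weights([[0.0, 0.0]], carried)
    print(layer1_w, layer2_w)
```

In the example, the second layer's own scores are uniform, yet its weights still favor the first position because the first layer's scores are carried forward.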

ViT

The model splits an image into patches and feeds them to a transformer as a single sequence.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (https://arxiv.org/abs/2010.11929)
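As a rough illustration of the patch step, the sketch below splits a single-channel image (a nested list) into non-overlapping square patches and flattens each one, producing the sequence that a transformer would consume. The function name `patchify` is hypothetical. Real implementations operate on tensors and add a learned linear projection and position embeddings, which are omitted here.

```python
from typing import List


def patchify(image: List[List[float]], patch: int) -> List[List[float]]:
    """Split an H x W image into flattened patch vectors in row-major order."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    sequence = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            flat = [
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ]
            sequence.append(flat)
    return sequence


if __name__ == "__main__":
    image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    for vector in patchify(image, patch=2):
        print(vector)
```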

Swin

The Patch Partition step is the same as in ViT above; the model improves performance by extending it with the Shifted Window concept.
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows (https://arxiv.org/abs/2103.14030)
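The shifted-window idea can be sketched on a small grid of patch ids: attention is computed inside fixed windows, and in alternating layers the grid is cyclically shifted by half a window so that neighboring windows exchange information. The code below is a simplified, dependency-free illustration with hypothetical function names. It omits the attention computation and the masking that Swin applies to wrapped-around regions.

```python
from typing import List

Grid = List[List[int]]


def cyclic_shift(grid: Grid, shift: int) -> Grid:
    """Roll the grid up and left by `shift` positions, wrapping around."""
    height, width = len(grid), len(grid[0])
    return [
        [grid[(r + shift) % height][(c + shift) % width] for c in range(width)]
        for r in range(height)
    ]


def window_partition(grid: Grid, window: int) -> List[List[int]]:
    """Group the grid into non-overlapping window x window blocks."""
    height, width = len(grid), len(grid[0])
    windows = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            windows.append([
                grid[r][c]
                for r in range(top, top + window)
                for c in range(left, left + window)
            ])
    return windows


if __name__ == "__main__":
    grid = [[r * 4 + c for c in range(4)] for r in range(4)]
    window = 2
    print(window_partition(grid, window))                             # regular windows
    print(window_partition(cyclic_shift(grid, window // 2), window))  # shifted windows
```

In the shifted layout, the first window contains patches 5, 6, 9, and 10, which belonged to four different windows in the regular layout.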

Data cleaning with GAN

How to use

Train & Test : cleaning_GAN.ipynb
Tutorial : CycleGAN.ipynb

Sample Results

Left: GAN output (Fake) / Right: original image (Real)

Contributors

이동빈 (Dongbin-Lee-git)
이근재 (GJ Lee)
이정환 (JeonghwanLee1)
조영민 (joqjoq966)
김수호 (Sooho-Kim)
신문종 (moon-jong)

Reference

Papers

Supported Data

