LIYHUI / AIFFEL_Hackathon



AIFFEL Hackathon

🎪 TT (Text Transformer) 🎪

A technology that converts the text inside an image into different content or another language while preserving its original style.

๐Ÿ“ Contents

๐Ÿ“ Description

TT๋Š” ์ด๋ฏธ์ง€ ๋‚ด๋ถ€์˜ ํ…์ŠคํŠธ๋ฅผ ๋‹ค๋ฅธ ๋‚ด์šฉ์œผ๋กœ ๋ฐ”๊พธ์–ด์ฃผ๋Š” ํ”„๋กœ์ ํŠธ๋กœ ๋‹จ์ˆœํžˆ ํ…์ŠคํŠธ๋ฅผ ๋ฐ”๊พธ๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๋ผ ๊ธฐ์กด์˜ ํ…์ŠคํŠธ ์Šคํƒ€์ผ์„ ์œ ์น˜ํ•œ ์ฑ„ ๋ณ€ํ˜• ์‹œ์ผœ์ค€๋‹ค.

์ด ํ”„๋กœ์ ํŠธ๋ฅผ ์‹คํ–‰์‹œํ‚ค๊ธฐ ์œ„ํ•ด End-to-End ๋ฐฉ์‹์œผ๋กœ Scene Text Editing์„ ํ•ด์ฃผ๋Š” clova ai์—์„œ ์ œ์•ˆํ•œ RewriteNet์„ ์‚ฌ์šฉํ–ˆ๋‹ค.

RewriteNet 📃

[Network]

  • Composed of four networks in total: Encoder, Generator, Recognizer, and Discriminator
    • Encoder : Pre-trained ResNet-18
      • Content Encoder : Bidirectional LSTM
    • Generator : U-Net
    • Recognizer : LSTM with Attention
    • Discriminator : Discriminator of PatchGAN
  • Training consists of two phases
    • Synthetic phase : training on synthetic images; because this phase includes the Recognizer, the model also learns how well it extracts the text content from an image
      • The synthetic data is generated with SynthTIGER
    • Real phase : training on real images, learning to regenerate a cropped image so that it matches the style of the original as closely as possible
  • At inference time, only the Encoder and Generator are used

[Loss]

๐Ÿ“ Environment

Python 3.9
PyTorch 1.11
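A minimal stdlib-only sketch for checking that the active interpreter matches these versions. It reads the installed `torch` version through `importlib.metadata` instead of importing PyTorch, and it compares only the major.minor part; treating the patch release and build suffix (such as `+cu113`) as irrelevant is an assumption, since only "1.11" is listed here. `major_minor` and `check_environment` are hypothetical helper names.

```python
import sys
from importlib import metadata

# Versions listed in this README's Environment section (major, minor).
EXPECTED_PYTHON = (3, 9)
EXPECTED_TORCH = (1, 11)


def major_minor(version: str) -> tuple:
    """Parse a version string such as '1.11.0+cu113' into (1, 11)."""
    parts = version.split("+")[0].split(".")
    return int(parts[0]), int(parts[1])


def check_environment() -> dict:
    """Report whether Python and the installed torch match the expected versions."""
    report = {"python_ok": tuple(sys.version_info[:2]) == EXPECTED_PYTHON}
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        torch_version = None  # PyTorch is not installed in this environment
    report["torch_version"] = torch_version
    report["torch_ok"] = (
        torch_version is not None and major_minor(torch_version) == EXPECTED_TORCH
    )
    return report


if __name__ == "__main__":
    print(check_environment())
```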

๐Ÿ“ Reference
