yukyunglee / transformers-resources

huggingface transformers tutorial, code, resources

🤗 huggingface transformers resources 🤗

An archive of materials I organized while working with huggingface transformers.

# Requirements
transformers==4.5.0

(Note: depending on your Python version, transformers 4.5.0 may not run; in that case, install a transformers version that is compatible with your Python version.)

00 Introduction

1) What is Transformers?

  • Huggingface Transformers is a high-level library built to create, train, and deploy models quickly and easily.
  • It lets you train NLP models quickly and easily, and provides a variety of pipelines and standardized code.
  • Transformers provides many features that make Transformer-based models quick and easy to use.

2) NLP Modeling with transformers

  • An NLP model is built as the following pipeline:

    Input -> Tokenization -> Model training/Inference -> Post-Processing (task dependent) -> Output

  • Implementing this entire pipeline from scratch is a very worthwhile exercise.

  • However, to quickly build a model suited to a given task, the repetitive parts of the work need to be standardized.

  • Using transformers makes these repetitive steps easy, so you can focus your development and research on the model itself.
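The pipeline above can be sketched end to end with transformers. This is a minimal sketch, not code from the notebooks: to run offline it uses a toy vocabulary written to a temporary file and a tiny, randomly initialised `BertForSequenceClassification`; the vocabulary, model sizes, and the NEGATIVE/POSITIVE labels are all hypothetical, so the predicted label is meaningless until the model is trained or a real checkpoint is loaded.

```python
import os
import tempfile

import torch
from transformers import BertConfig, BertForSequenceClassification, BertTokenizer

# Hypothetical toy vocabulary so the sketch runs without downloading anything.
VOCAB = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "i", "love", "hate", "this", "movie"]


def build_tokenizer() -> BertTokenizer:
    vocab_path = os.path.join(tempfile.mkdtemp(), "vocab.txt")
    with open(vocab_path, "w", encoding="utf-8") as f:
        f.write("\n".join(VOCAB) + "\n")
    return BertTokenizer(vocab_file=vocab_path)


def build_model() -> BertForSequenceClassification:
    # Tiny, randomly initialised model (hypothetical sizes and labels).
    config = BertConfig(
        vocab_size=len(VOCAB),
        hidden_size=16,
        num_hidden_layers=1,
        num_attention_heads=2,
        intermediate_size=32,
        id2label={0: "NEGATIVE", 1: "POSITIVE"},
        label2id={"NEGATIVE": 0, "POSITIVE": 1},
    )
    return BertForSequenceClassification(config).eval()


def run_pipeline(text, tokenizer, model):
    # 1) Tokenization: raw text -> input_ids / attention_mask tensors
    inputs = tokenizer(text, return_tensors="pt")
    # 2) Inference: forward pass without gradient tracking
    with torch.no_grad():
        logits = model(**inputs).logits
    # 3) Post-processing (task dependent): softmax + argmax -> label string
    probs = torch.softmax(logits, dim=-1)[0]
    label_id = int(probs.argmax())
    return {"label": model.config.id2label[label_id], "score": float(probs[label_id])}


tokenizer = build_tokenizer()
model = build_model()
print(run_pipeline("I love this movie", tokenizer, model))
```

For a different task only the model head and step 3 change, e.g. token classification would take an argmax per token instead of per sentence.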

3) etc.

  • The link below offers more detailed and thorough official tutorials.

    https://huggingface.co/transformers/notebooks.html

  • The link below hosts discussions about the libraries provided by huggingface.

    https://discuss.huggingface.co/

📂 tutorial

huggingface provides its own detailed documentation, along with related notebooks and a course. This tutorial, however, is designed for a quicker and easier tour of transformers, and also describes pitfalls I ran into during modeling.

01 Basic tutorial

  1. Loading a Tokenizer
  2. Loading a Config
  3. Loading a pretrained model
  4. Using the Huggingface Trainer
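The loading steps in items 1-3 can be sketched as follows. This is a minimal sketch, not the notebook's code: to run offline, it first saves a toy vocabulary and a tiny, randomly initialised BERT to a temporary directory (both hypothetical), then reloads them with the `from_pretrained` methods of `AutoTokenizer`, `AutoConfig`, and `AutoModel`. With a real checkpoint you would pass a Hub id such as `bert-base-cased` instead of `ckpt_dir`.

```python
import os
import tempfile

from transformers import AutoConfig, AutoModel, AutoTokenizer, BertConfig, BertModel, BertTokenizer

# --- Setup (assumption): write a tiny "pretrained" checkpoint to a local directory
# so the sketch runs offline.
vocab_path = os.path.join(tempfile.mkdtemp(), "vocab.txt")
with open(vocab_path, "w", encoding="utf-8") as f:
    f.write("\n".join(["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "hello", "world"]) + "\n")

ckpt_dir = tempfile.mkdtemp()
BertTokenizer(vocab_file=vocab_path).save_pretrained(ckpt_dir)
BertModel(BertConfig(vocab_size=7, hidden_size=16, num_hidden_layers=1,
                     num_attention_heads=2, intermediate_size=32)).save_pretrained(ckpt_dir)

# 1. Tokenizer: the Auto* class picks the tokenizer class that matches the checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt_dir)

# 2. Config: loaded on its own, so it can be inspected or edited before building the model
config = AutoConfig.from_pretrained(ckpt_dir)

# 3. Pretrained model: architecture from the config, weights from the checkpoint files
model = AutoModel.from_pretrained(ckpt_dir, config=config)

encoded = tokenizer("hello world", return_tensors="pt")
outputs = model(**encoded)
print(config.model_type, tuple(outputs.last_hidden_state.shape))
```

Loading the config separately is what makes edits such as a different `num_labels` possible before the model is instantiated.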

02 Advanced tutorial

  1. Adding tokens
    • Adding special tokens
    • Adding regular tokens
  2. Extracting the [CLS] output
    • Note: does the [CLS] token really represent the sentence?
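Adding tokens and extracting the [CLS] output might look like the sketch below. It assumes a tiny, randomly initialised `BertModel` and a toy vocabulary so it runs offline; the `[E1]` marker and the word `aspirin` are hypothetical examples. After adding tokens, `resize_token_embeddings` grows the embedding matrix so the new ids have rows. The [CLS] output is taken as the last hidden state at position 0, which is not the same vector as `pooler_output` (a dense layer plus tanh applied to that state).

```python
import os
import tempfile

import torch
from transformers import BertConfig, BertModel, BertTokenizer

# Hypothetical toy vocabulary (offline stand-in for a pretrained vocab).
vocab_path = os.path.join(tempfile.mkdtemp(), "vocab.txt")
with open(vocab_path, "w", encoding="utf-8") as f:
    f.write("\n".join(["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "the", "drug", "works"]) + "\n")
tokenizer = BertTokenizer(vocab_file=vocab_path)
model = BertModel(BertConfig(vocab_size=len(tokenizer), hidden_size=16, num_hidden_layers=1,
                             num_attention_heads=2, intermediate_size=32)).eval()

# 1-a. Special token: kept whole, and dropped by decode(..., skip_special_tokens=True)
n_special = tokenizer.add_special_tokens({"additional_special_tokens": ["[E1]"]})
# 1-b. Regular token: a new vocabulary entry (hypothetical domain word)
n_regular = tokenizer.add_tokens(["aspirin"])
# The new ids need embedding rows; without this the forward pass fails with an index error
model.resize_token_embeddings(len(tokenizer))

inputs = tokenizer("[E1] aspirin works", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# 2. [CLS] output: last-layer hidden state at position 0, shape (batch, hidden_size)
cls_vector = outputs.last_hidden_state[:, 0, :]
# pooler_output: tanh(Linear(cls_vector)), a different vector from the raw [CLS] state
pooled = outputs.pooler_output
print(n_special, n_regular, cls_vector.shape, pooled.shape)
```

The rows added by `resize_token_embeddings` are freshly initialised, so new tokens only become meaningful after fine-tuning.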

📂 note

Small but valuable notes I picked up while modeling with transformers.

01 Checkpoint loading

  • Things to watch out for when loading checkpoints

    • Loading only part of a model state_dict
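One common reason to load only part of a state_dict is a head whose shape no longer matches, e.g. reusing an encoder trained for 3 labels in a 5-label model. A minimal sketch, assuming tiny randomly initialised models stand in for the real checkpoint (the `checkpoint.pt` path in the comment is hypothetical): keep only the keys that exist in the target with the same shape, call `load_state_dict(..., strict=False)`, and inspect `missing_keys` so that silently skipped weights do not go unnoticed.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification


def tiny_config(num_labels: int) -> BertConfig:
    # Hypothetical tiny sizes so the sketch runs quickly and offline
    return BertConfig(vocab_size=30, hidden_size=16, num_hidden_layers=1,
                      num_attention_heads=2, intermediate_size=32, num_labels=num_labels)


# Stand-in for a saved checkpoint trained with 3 labels.
# In practice: checkpoint = torch.load("checkpoint.pt", map_location="cpu")  # hypothetical path
source = BertForSequenceClassification(tiny_config(3))
checkpoint = source.state_dict()

# Target model has a 5-label head, so the classifier.* shapes do not match
target = BertForSequenceClassification(tiny_config(5))
target_state = target.state_dict()

# Keep only entries whose name exists in the target AND whose shape matches
filtered = {
    k: v for k, v in checkpoint.items()
    if k in target_state and v.shape == target_state[k].shape
}
skipped = sorted(set(checkpoint) - set(filtered))

# strict=False tolerates the dropped keys; the return value reports what was not loaded
result = target.load_state_dict(filtered, strict=False)
print("skipped:", skipped)
print("missing:", result.missing_keys)
```

Here the encoder weights come from the checkpoint while the classifier keeps its fresh initialisation.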

02 Tokenizer

  • A few points worth considering when using the Tokenizer
    • tokenizer.encode vs. tokenizer.convert_tokens_to_ids
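The difference between the two methods can be seen with a toy vocabulary (hypothetical, written to a temporary file so the sketch runs offline). `encode` takes raw text, tokenizes it, and by default adds [CLS]/[SEP]. `convert_tokens_to_ids` expects already-tokenized tokens and adds nothing, so passing it raw text maps the whole string to a single id, here the [UNK] id.

```python
import os
import tempfile

from transformers import BertTokenizer

# Hypothetical toy vocabulary, including a WordPiece continuation piece
vocab_path = os.path.join(tempfile.mkdtemp(), "vocab.txt")
with open(vocab_path, "w", encoding="utf-8") as f:
    f.write("\n".join(["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "hug", "##ging", "face"]) + "\n")
tokenizer = BertTokenizer(vocab_file=vocab_path)

text = "Hugging Face"

# encode(): raw text -> tokenize -> ids, wrapped in [CLS] ... [SEP] by default
ids_encode = tokenizer.encode(text)

# convert_tokens_to_ids(): expects tokens (not raw text) and adds no special tokens
tokens = tokenizer.tokenize(text)
ids_convert = tokenizer.convert_tokens_to_ids(tokens)

# Passing raw text straight in treats the whole string as one (unknown) token
ids_raw = tokenizer.convert_tokens_to_ids(text)

print(tokens, ids_encode, ids_convert, ids_raw)
```

`tokenizer.encode(text, add_special_tokens=False)` gives the same ids as the tokenize-then-convert route.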

📂 modeling

Simple NLP models implemented with transformers.

01 Named Entity Recognition
  • Uses Trainer, modifies the config, and splits the code into separate functions
  • Modeling with CoNLL-2003 from datasets
  • eval_f1: 0.93, eval_acc: 0.98
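For NER, modifying the config mainly means giving the classification head one output per tag via `id2label`/`label2id`. A minimal sketch, not the notebook's code: it assumes a tiny, randomly initialised `BertForTokenClassification` (with a real checkpoint the same keyword arguments can be passed to `from_pretrained`) and the CoNLL-2003 tag order used by the `datasets` `conll2003` dataset; check `features["ner_tags"].feature.names` in your own environment. The input ids are hypothetical, so the predicted tags are meaningless.

```python
import torch
from transformers import BertConfig, BertForTokenClassification

# Assumption: CoNLL-2003 tag order as exposed by the `datasets` conll2003 dataset
label_list = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC", "B-MISC", "I-MISC"]
id2label = {i: label for i, label in enumerate(label_list)}
label2id = {label: i for i, label in enumerate(label_list)}

# Config modification: the head gets len(label_list) outputs and readable label names
# (tiny hypothetical sizes so the sketch runs offline)
config = BertConfig(vocab_size=100, hidden_size=16, num_hidden_layers=1,
                    num_attention_heads=2, intermediate_size=32,
                    id2label=id2label, label2id=label2id)
model = BertForTokenClassification(config).eval()

input_ids = torch.tensor([[2, 10, 11, 12, 3]])  # hypothetical token ids
with torch.no_grad():
    logits = model(input_ids=input_ids).logits  # (batch, seq_len, num_labels)

# One predicted tag per token position
pred_tags = [model.config.id2label[i] for i in logits.argmax(-1)[0].tolist()]
print(config.num_labels, tuple(logits.shape), pred_tags)
```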
