gaeun0112 / level2_klue-nlp-06

level2_klue-nlp-06 created by GitHub Classroom

🌱Members

서가은 김지현 김민호 김성은 홍영훈

📽️ Project Overview

🧶 Relation Extraction

  • Relation extraction (RE) is the task of predicting the attributes of, and relations between, words (entities) in a sentence. It is a core component of knowledge graph construction and matters in natural language processing applications such as structured search, sentiment analysis, question answering, and summarization.
  • The goal of the competition is to train a model that identifies the relation between two words (entities) in a sentence and classifies it into one of 30 labels.

📇 Data

  • train.csv : 32,470 examples in total
  • test_data.csv : 7,765 examples in total
  • Label : 30 classes in total

📑 Metric

  • The KLUE-RE evaluation metrics are used as-is; of the two, the micro F1 score takes priority.
    1. Micro F1 score excluding the no_relation class

      $\mathrm{Recall} = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}$

      $\mathrm{Precision} = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}$

      $\mathrm{F1 \ score} = 2 \times\frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}$

    2. Area under the precision-recall curve (AUPRC) over all classes
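The first metric can be sketched in plain Python as follows. This is a minimal illustration of the Recall, Precision, and F1 formulas above with `no_relation` excluded from the counts; it is not the code in `utils/metric.py`, and the function name `micro_f1_excluding` and the toy labels are assumptions made for the example.

```python
def micro_f1_excluding(y_true, y_pred, excluded="no_relation"):
    """Micro-averaged F1 over every class except `excluded` (hypothetical helper)."""
    tp = fp = fn = 0
    for gold, pred in zip(y_true, y_pred):
        if pred != excluded:
            if pred == gold:
                tp += 1  # correct prediction of a counted class
            else:
                fp += 1  # counted class predicted, but gold differs
        if gold != excluded and pred != gold:
            fn += 1      # counted gold class was missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: 1 true positive, 1 false positive, 2 false negatives.
gold = ["no_relation", "per:employee_of", "org:founded", "per:employee_of"]
pred = ["no_relation", "per:employee_of", "no_relation", "org:founded"]
score = micro_f1_excluding(gold, pred)
print(round(score, 4))
```

Correct `no_relation` predictions contribute nothing to the score, so a model cannot raise it simply by predicting the majority class.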

👨‍👩‍👧‍👦 Project Team and Roles

  • 김민호 : analysis of model architecture and loss functions
  • 김성은 : wrote the main execution code, data preprocessing, model customization
  • 김지현 : proposed preprocessing methods, analyzed model architecture, implemented custom models, contributed to the base setting, ensembling
  • 서가은 : hyperparameter tuning and experiments with various models
  • 홍영훈 : proposed preprocessing methods and analyzed model predictions

🗂️ File Structure

├── src
│   ├── dict_label_to_num.pkl
│   ├── dict_num_to_label.pkl
│   ├── train.py
│   ├── inference.py
│   ├── ensemble.py
│   └── hp_train.py
│
├── utils
│   ├── preprocessing.py : preprocessing functions applied before tokenizing
│   ├── tokenizing.py : functions covering the steps up to dataset construction
│   ├── metric.py : metric-related functions
│   └── load_data.py : functions for preprocessing and dataset construction
│
├── result
│   └── {run_name} : model results
│       └── best_model : where the model is saved
│
├── data
│   ├── test
│   │   └── test_data.csv
│   ├── train
│   │   ├── train_original.csv
│   │   ├── train.csv
│   │   └── dev.csv
│   └── prediction
│       └── sample_submission.csv
│
├── main.py
├── requirements.txt
├── README.md
└── config.yaml
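The two `.pkl` files under `src` suggest that label names are mapped to integer ids and back with pickled dictionaries. The sketch below shows how such a pair of mappings could be saved, loaded, and applied. The three labels and their ids are illustrative only, the helper names are hypothetical, and the files are written to a temporary directory rather than to `src`.

```python
import pickle
import tempfile
from pathlib import Path

# Illustrative subset of labels; the ids are assumed for this example.
label_to_num = {"no_relation": 0, "org:top_members/employees": 1, "per:employee_of": 2}
num_to_label = {num: label for label, num in label_to_num.items()}


def save_mapping(mapping, path):
    """Pickle a dictionary to `path`."""
    with open(path, "wb") as f:
        pickle.dump(mapping, f)


def load_mapping(path):
    """Load a pickled dictionary from `path`."""
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    to_num_path = Path(tmp) / "dict_label_to_num.pkl"
    to_label_path = Path(tmp) / "dict_num_to_label.pkl"
    save_mapping(label_to_num, to_num_path)
    save_mapping(num_to_label, to_label_path)

    to_num = load_mapping(to_num_path)
    to_label = load_mapping(to_label_path)

# Encode labels for training, then decode model outputs for submission.
encoded = [to_num[label] for label in ["per:employee_of", "no_relation"]]
decoded = [to_label[num] for num in encoded]
print(encoded, decoded)
```

Keeping both directions on disk means training and inference use the same id assignment without recomputing it from the data.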

👀 Wrap-up Report

https://eojjeol-stones.notion.site/REPORT-09253205d8864f7c8837cee868566702

✏️ Usage

install requirements

pip install -r requirements.txt

main.py

python main.py # run both train and inference
python main.py -r train # run train
python main.py -r inference # run inference
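A flag like `-r` can be handled with `argparse`. The following is a minimal sketch of one way to do it, not the actual contents of `main.py`: the long option name `--run`, the default value `all`, and the helper names are assumptions made for illustration.

```python
import argparse


def parse_run_mode(argv=None):
    """Return the requested run mode; with no flag, both stages are selected."""
    parser = argparse.ArgumentParser(description="Run training and/or inference.")
    parser.add_argument(
        "-r", "--run",                       # "--run" is an assumed long name
        choices=["all", "train", "inference"],
        default="all",                       # assumed default: run both stages
    )
    return parser.parse_args(argv).run


def stages_for(mode):
    """Map a run mode to the ordered list of stages to execute."""
    return {
        "all": ["train", "inference"],
        "train": ["train"],
        "inference": ["inference"],
    }[mode]


# Passing an explicit argument list mimics `python main.py -r train`.
print(stages_for(parse_run_mode(["-r", "train"])))
print(stages_for(parse_run_mode([])))
```

Restricting the flag with `choices` makes a mistyped mode fail at startup with a usage message instead of partway through a run.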

Languages

Language:Python 100.0%