zzhenxi / Semantic-Textual-Similarity-API

📌 An NLP project that computes text similarity

[Corporate Assignment 3] An API that reports semantic textual similarity (STS)

  • This API was built for Corporate Assignment 3 of the Wanted Pre-Onboarding course.
  • model: a fine-tuned version of koelectra-base-v3-discriminator, one of the KoELECTRA pre-trained models.
  • Fine-tuning used the KLUE-STS train data.
  • 🌟 For details on the fine-tuning part, see fine-tuning.ipynb and the Notion page!

Updates

March 23, 2022

  • initial commit

How to run

$ pip install -r requirements.txt
$ python main.py

Once the server is running, open http://127.0.0.1:5000/ (or http://localhost:5000/) in a browser.
Enter two sentences in the boxes and click submit.
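You may want to call the running server from a script instead of the browser form. The following is a minimal standard-library sketch of how that could look. It assumes the form posts to the root URL `/` with fields named `sentence1` and `sentence2`. Both the route and the field names are hypothetical, so check `main.py` and `templates/index.html` for the real ones.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical: the real route and form field names live in main.py / templates/index.html.
BASE_URL = "http://127.0.0.1:5000/"


def build_request(sentence1: str, sentence2: str, url: str = BASE_URL) -> Request:
    """Build a form-encoded POST request carrying the two sentences."""
    body = urlencode({"sentence1": sentence1, "sentence2": sentence2}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def submit(sentence1: str, sentence2: str) -> str:
    """Send the request to a running server and return the response body (the result page HTML)."""
    with urlopen(build_request(sentence1, sentence2), timeout=30) as resp:
        return resp.read().decode("utf-8")
```

With `python main.py` running, `submit("오늘 날씨가 좋다.", "오늘은 날씨가 맑다.")` would return the rendered result page. Keeping `build_request` separate lets you inspect the payload without a live server.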

Directory structure

├── images
├── running_model
│   ├── best_model
│   │   ├── config.json
│   │   └── pytorch_model.bin
│   ├── data_preprocessing.py
│   └── models.py
├── templates
│   ├── index.html
│   └── result.html
├── fine-tuning.ipynb
├── main.py
├── README.md
└── requirements.txt
  • running_model: contains the fine-tuned best_model and the modules needed to run it.
  • models.py: runs the model through its model function to compute the similarity between two sentences.
  • data_preprocessing.py: preprocesses the two input sentences.
  • fine-tuning.ipynb: loads the KoELECTRA model and walks through the fine-tuning process.
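The request flow between these modules can be sketched roughly as follows. This is a hypothetical outline, not the repository's code:

  • `preprocess` stands in for whatever data_preprocessing.py actually does. Here it only applies Unicode NFC normalization and collapses whitespace.
  • `predict` is a placeholder for the fine-tuned model's raw regression output.
  • Clamping the score to the KLUE-STS 0~5 range is an assumption.
  • The 3.0 "similar" cut-off is also an assumption. It follows the KLUE-STS binarization convention.

```python
import re
import unicodedata
from typing import Callable, Tuple


def preprocess(sentence: str) -> str:
    """Hypothetical cleanup: NFC-normalize (e.g. Hangul) and collapse runs of whitespace."""
    sentence = unicodedata.normalize("NFC", sentence)
    return re.sub(r"\s+", " ", sentence).strip()


def similarity(
    sentence1: str,
    sentence2: str,
    predict: Callable[[str, str], float],
) -> Tuple[float, bool]:
    """Preprocess the pair, score it with `predict`, clamp to [0, 5], and binarize at 3.0."""
    s1, s2 = preprocess(sentence1), preprocess(sentence2)
    raw = predict(s1, s2)                 # stand-in for the fine-tuned model's output
    score = min(max(raw, 0.0), 5.0)       # assumed: keep the output on the 0~5 scale
    return score, score >= 3.0            # assumed: KLUE-STS-style "similar" threshold
```

Passing the model in as a callable keeps the sketch testable without loading best_model. For example, `similarity("a", "b", predict=lambda x, y: 5.7)` gives `(5.0, True)`.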

Requirements

Flask==2.0.3
huggingface-hub==0.4.0
tokenizers==0.11.6
torch==1.11.0
transformers==4.17.0

Score (on the KLUE-STS dev set)

  • Pearson's r (continuous similarity, 0~5): 0.933
  • F1 score (binary classification): 0.867
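The two reported metrics can be computed from predicted and gold scores roughly as follows. This is a plain-Python sketch, not the evaluation code behind the numbers above. For the binary F1 it assumes the KLUE-STS convention that a score of 3.0 or higher counts as "similar".

```python
import math
from typing import Sequence

THRESHOLD = 3.0  # assumed KLUE-STS convention: score >= 3.0 means "similar"


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between predicted and gold 0~5 similarity scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("Pearson's r is undefined for constant input")
    return cov / (sx * sy)


def binary_f1(
    preds: Sequence[float], golds: Sequence[float], threshold: float = THRESHOLD
) -> float:
    """F1 of the positive ("similar") class after binarizing both score lists at `threshold`."""
    tp = fp = fn = 0
    for p, g in zip(preds, golds):
        p_pos, g_pos = p >= threshold, g >= threshold
        tp += int(p_pos and g_pos)
        fp += int(p_pos and not g_pos)
        fn += int(g_pos and not p_pos)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Take predictions `[0.5, 2.0, 2.8, 4.0, 4.9]` against gold scores `[0.0, 1.0, 3.0, 4.5, 5.0]`. There are 2 true positives, 0 false positives and 1 false negative, so the binary F1 is 0.8.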

Our team

  • 류제성: model research and data preprocessing
  • 원재성: model research and implementation, fine-tuning
  • 장진희: model research and implementation, REST API implementation

Languages: Jupyter Notebook 99.5%, Python 0.3%, HTML 0.2%