1MENU / Korean_ABSA_model

[🎖️ 1st place (Minister's Award) solution] 2022 National Institute of Korean Language AI Language Ability Evaluation (Aspect-Based Sentiment Analysis on shopping-mall review data)

Korean_ABSA_model

2022 National Institute of Korean Language AI Language Ability Evaluation (Aspect-Based Sentiment Analysis: ABSA)

[Gachon University team '1인 1메뉴' wins the grand prize in the AI Language Ability Evaluation] News article

(http://www.edujin.co.kr/news/articleView.html?idxno=40816)

Result presentation video and award ceremony

Task

Aspect-Based Sentiment Analysis (ABSA)

Task description: https://corpus.korean.go.kr/task/taskList.do?taskId=8&clCd=END_TASK&subMenuId=sub01

Sentiment analysis quantifies the sentiment polarity of sentences that express a speaker's opinion or a positive/negative attitude. It has mostly been evaluated at the sentence level, as the presence or degree of positive or negative sentiment, and it is widely used for product and travel review analysis and in recommender systems.

Recently, sentiment analysis researchers have become interested in going beyond extracting a single overall polarity (positive or negative) from a product review and extracting sentiment information at a finer, more specific level. Aspect-based sentiment analysis takes into account the entities and aspects that appear in the text, so it yields more detailed information than ordinary sentiment analysis. For example, for the restaurant-domain review "콩국수가 싸서 좋다" ("The kongguksu is cheap, so I like it"), ordinary sentiment analysis returns "positive", whereas aspect-based sentiment analysis returns entity: {food (kongguksu)}, aspect: {price}, sentiment: {positive}, which is richer and more specific.
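As a small illustration of the difference described above, the two kinds of output for the example review could be represented as follows. The class and field names are hypothetical and chosen only for this sketch; they are not taken from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AspectSentiment:
    """One aspect-level annotation: which entity, which aspect, what polarity."""
    entity: str
    aspect: str
    sentiment: str


review = "콩국수가 싸서 좋다"  # "The kongguksu is cheap, so I like it."

# Sentence-level sentiment analysis yields a single polarity for the sentence.
sentence_level = "positive"

# ABSA additionally records the entity and the aspect the polarity refers to.
aspect_level = AspectSentiment(
    entity="food (kongguksu)", aspect="price", sentiment="positive"
)
```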

Solution

Dataset Preprocessing

  1. Distribution of aspect-category and sentiment labels in the train and dev data (data imbalance)

  2. Unrefined data (ungrammatical sentences, abbreviations, typos, text emoticons, etc.)

Because kykim/electra-kor-base was pretrained on data that includes reviews, it copes well with typos, ungrammatical sentences, and abbreviations, but its vocabulary does not contain emoji, Latin characters, special characters, and the like. Preprocessing was designed around these characteristics.

  3. Low accuracy of the data labels
  • Removed sentences with ambiguous labels from the roughly 5,800 train and dev examples (about 15 sentences)
  • Removed or corrected cases where the same aspect category carried conflicting sentiment annotations
  • Applied label smoothing to compensate for inaccurate labels and to reduce the model's over-confidence
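Two of the ideas in this list can be sketched in plain Python. The first function is one possible reading of the character-level cleanup: the exact set of characters the authors kept or removed is not stated, so the whitelist below is an assumption. The second shows the standard uniform label-smoothing formulation of cross-entropy; the smoothing factor `epsilon=0.1` is also an assumed value, not one taken from the repository.

```python
import math
import re
from typing import List, Sequence

# Assumption: keep Hangul syllables, Hangul compatibility jamo, digits,
# whitespace, and a few basic punctuation marks; everything else is dropped.
_DISALLOWED = re.compile(r"[^\uAC00-\uD7A3\u3131-\u318E0-9\s.,!?~]")
_WHITESPACE = re.compile(r"\s+")


def clean_review(text: str) -> str:
    """Replace out-of-whitelist characters with spaces, then collapse spaces."""
    return _WHITESPACE.sub(" ", _DISALLOWED.sub(" ", text)).strip()


def smooth_targets(label: int, num_classes: int, epsilon: float = 0.1) -> List[float]:
    """Move `epsilon` of the probability mass from the gold label to a uniform prior."""
    uniform = epsilon / num_classes
    return [(1.0 - epsilon if i == label else 0.0) + uniform for i in range(num_classes)]


def smoothed_cross_entropy(logits: Sequence[float], label: int, epsilon: float = 0.1) -> float:
    """Cross-entropy between the smoothed target and softmax(logits)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    targets = smooth_targets(label, len(logits), epsilon)
    return -sum(t * (x - log_z) for t, x in zip(targets, logits))
```

With smoothing, a very confident correct prediction still incurs some loss, which is what discourages over-confidence on labels that may be wrong.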

ACD & ASC

We chained an Aspect Category Detection (ACD) model and an Aspect Sentiment Classification (ASC) model. Because the training data was limited, we used transfer learning: we added our own classification head to an existing pretrained model and then fine-tuned it.

  1. After preprocessing and tokenizing, the tokens are fed to the pretrained model in the form "[CLS] sentence [SEP] aspect category [SEP]".
  2. The [CLS] tokens output by the 12 internal self-attention layers are attention-pooled into a 256-dimensional vector.
  3. Only the aspect-category token positions of the last hidden layer are extracted to form a 768-dimensional vector.
  4. Each vector passes through its own FC layer [dropout(0.1), hyperbolic tangent activation, Linear].
  5. The two vectors are concatenated into a new 1024-dimensional vector.
  6. This is used as the classification vector, and a 1024 -> 2 linear layer is applied to it.

If the result is 0 (False), the aspect category is not extracted; if it is 1 (True), the aspect category is extracted.
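The attention-pooling step in the list above can be illustrated with a small, dependency-free sketch. It assumes the per-layer [CLS] vectors are scored by a dot product with a learned query vector and combined by a softmax-weighted sum; the source does not say how the scores are computed or where the projection to 256 dimensions happens, so the function name, the `query` parameter, and the toy sizes are all assumptions.

```python
import math
from typing import List, Sequence


def attention_pool(vectors: Sequence[Sequence[float]], query: Sequence[float]) -> List[float]:
    """Softmax-weighted sum of `vectors`, scored by dot product with `query`."""
    scores = [sum(q * x for q, x in zip(query, vec)) for vec in vectors]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]


# Toy example: 3 "layers" with 2-dimensional [CLS] vectors instead of 12 layers.
cls_per_layer = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled = attention_pool(cls_per_layer, query=[0.0, 0.0])  # zero query -> plain average

# Dimension bookkeeping for the concat step: [CLS] branch + category branch.
FUSED_DIM = 256 + 768  # 1024, the input size of the final linear layer
```

A non-zero query shifts weight toward the layers whose [CLS] vector aligns with it, which is the point of pooling with attention rather than taking a plain average.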

ASC has the same model architecture. The only difference is that its class labels are 0 (positive), 1 (negative), and 2 (neutral), so a 1024 -> 3 linear layer is applied to the classification vector.
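The way the two models are chained can be sketched as follows. This is only an outline under stated assumptions: `acd` and `asc` stand in for the fine-tuned models as plain callables, the category names in the usage example are hypothetical, and the input is shown as a string for readability (in practice a tokenizer would normally insert the special tokens itself).

```python
from typing import Callable, List, Sequence, Tuple

# Label order for ASC as stated in the text: 0 positive, 1 negative, 2 neutral.
SENTIMENTS = ["positive", "negative", "neutral"]


def build_input(sentence: str, category: str) -> str:
    """Format one (sentence, aspect category) pair as described in step 1."""
    return f"[CLS] {sentence} [SEP] {category} [SEP]"


def run_absa(
    sentence: str,
    categories: Sequence[str],
    acd: Callable[[str], int],  # returns 1 if the category is present, else 0
    asc: Callable[[str], int],  # returns 0, 1, or 2
) -> List[Tuple[str, str]]:
    """Run ACD for every candidate category, then ASC only on detected ones."""
    results = []
    for category in categories:
        pair = build_input(sentence, category)
        if acd(pair) == 1:
            results.append((category, SENTIMENTS[asc(pair)]))
    return results
```

For example, with stub models `acd = lambda text: 1 if "가격" in text else 0` and `asc = lambda text: 0`, calling `run_absa("콩국수가 싸서 좋다", ["제품#가격", "제품#품질"], acd, asc)` keeps only the first category and labels it positive.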

Below are the results of this model.

Implement

  1. git clone (https://github.com/1MENU/Korean_ABSA_model.git)
  2. Download the model (https://huggingface.co/juicyjung/Korean_ABSA_model_1MENU)

Unzip the files downloaded from Hugging Face and place them under the materials/saved_model folder: files from the CD folder go under the CD folder, and files from the SC folder go under the SC folder.

  3. Prepare the dataset to feed to the model
  • Before running, put the data to run inference on under the dataset folder.
  • The file name must match the value of --test_file in run_together.sh (the default is "nikluge-sa-2022-test.jsonl").
  4. Build the environment with the Dockerfile
docker build -t gcu-1menu:1.0 .
docker run -it --name team1 gcu-1menu:1.0
  5. Run run_together.sh
bash run_together.sh
  • The result file is created under the material/submission folder. It is currently written as final.json.
  • Note that the result file exists only inside the Docker container. To copy it out of the container, use the command below.
docker cp <container name>:<file path inside the container> <destination path>
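Before building the image, it may help to check that the files are where the steps above expect them. The helper below is a hypothetical convenience, not part of the repository; it only assumes the folder names mentioned in the steps (materials/saved_model/CD, materials/saved_model/SC, and the dataset folder) and the default test file name.

```python
from pathlib import Path
from typing import List


def missing_paths(repo_root: str, test_file: str = "nikluge-sa-2022-test.jsonl") -> List[str]:
    """Return the expected paths (relative to the repo root) that do not exist."""
    root = Path(repo_root)
    expected = [
        root / "materials" / "saved_model" / "CD",
        root / "materials" / "saved_model" / "SC",
        root / "dataset" / test_file,
    ]
    return [str(path.relative_to(root)) for path in expected if not path.exists()]


if __name__ == "__main__":
    # Run from the cloned repository root; prints nothing if everything is in place.
    for path in missing_paths("."):
        print(f"missing: {path}")
```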

Members

Jiwoo Jung | travelandi01@gmail.com
Doyeon Hyun | 118ssun@naver.com
Seonghyun Kang | manomono0610@gmail.com
Heejin Jang | heejin00628@gmail.com
Hajeong Lee | hjpurege@gachon.ac.kr

About

License: GNU Lesser General Public License v2.1


Languages

Python 97.7%, Shell 2.1%, Dockerfile 0.2%