Author : Minku Koo
Project Period : Dec/2020 ~ Jan/2021
Contact : corleone@kakao.com
Main Library : tensorflow, keras, KoNLPy
Keyword : "Sentiment Analysis", "Machine Learning", "Korean", "Deep Learning"
- Introduction
- Data Scraping
- Data Labeling
- Data Preprocessing
- Build Deep Learning Network
- Predict Data Sentiments
- Result
- Python Crawler : ./python-code/comment_crawling.py
- Target Sites : Naver and Daum news comments
- Scraped Data : Comment, Reply, Article Date (+ Title, Content)
- News Search Keywords : "기독교" (Christianity), "불교" (Buddhism), "천주교" (Catholicism), "신천지" (Shincheonji), "종교" (Religion)
- Storage : Database (MariaDB)
- Database export to text files, path : ./comment/raw-comment/
Search keyword | Collection start | Reference date | Collection end |
---|---|---|---|
Shincheonji | 19.09.17 | 20.02.17 | 20.07.18 |
Christianity | 19.08.20 | 20.01.20 | 20.10.20 |
Catholicism | 19.08.20 | 20.01.20 | 20.08.20 |
Buddhism | 19.08.20 | 20.01.20 | 20.08.20 |
Religion | 19.08.20 | 20.01.20 | 20.10.10 |
Search keyword | Articles (before reference date) | Comments (before reference date) | Articles (after reference date) | Comments (after reference date) |
---|---|---|---|---|
Shincheonji | 211 | 22,658 | 2,974 | 262,840 |
Christianity | 1,771 | 94,405 | 1,186 | 85,443 |
Catholicism | 1,899 | 37,010 | 1,685 | 56,881 |
Buddhism | 833 | 6,465 | 420 | 7,585 |
Religion | 1,939 | 52,527 | 2,373 | 122,206 |
- path : ./train-data/
- Manually labeled comments (human inspection) : ./train-data/comment-labeling.csv
- Naver Movie Review Data : naver-ratings.csv
- ( Data from Here )
- Tag each comment with KoNLPy's Okt tagger : okt.pos(comment)
- Remove tokens tagged 'Josa' (particle), 'Punctuation', 'Number'
- Save path : ./comment/after-okt-comment/
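
The filtering step above can be sketched as follows. This is a minimal illustration, not the project's actual code: `filter_tokens`, `preprocess`, and the sample data are hypothetical names, and joining the surviving tokens with spaces is an assumption. The only thing taken from KoNLPy is the shape of `okt.pos()` output, a list of `(token, tag)` pairs.

```python
# Tags to drop, as listed above; the names follow KoNLPy's Okt tag set.
DROP_TAGS = {"Josa", "Punctuation", "Number"}


def filter_tokens(tagged):
    """Keep the surface form of every token whose POS tag is not dropped.

    `tagged` is a list of (token, tag) pairs, the shape returned by okt.pos().
    """
    return [token for token, tag in tagged if tag not in DROP_TAGS]


def preprocess(comment, tagger):
    """Tag one comment with a KoNLPy-style tagger and filter the result.

    Joining the remaining tokens with spaces is an assumption.
    """
    return " ".join(filter_tokens(tagger.pos(comment)))


# Hand-written sample in the (token, tag) shape; not real tagger output.
sample = [
    ("영화", "Noun"),
    ("가", "Josa"),
    ("재밌다", "Adjective"),
    ("!", "Punctuation"),
    ("10", "Number"),
]
print(filter_tokens(sample))
```

With KoNLPy and a Java runtime installed, the same function could be called as `preprocess(comment, Okt())` after `from konlpy.tag import Okt`.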
- Python File Name : ./python-code/make_rnn_model.py
- Train Data path : ./train-data/
- Crawled comments + Naver movie reviews => transfer learning
- Comment text converted to vectors (using TextVectorization)
- Training accuracy : 0.95
- Validation accuracy : 0.83
- Build a JSON file : dict[date][article] = [[comment list], []]
- Label every comment with the deep learning model
- Update the JSON file : dict[date][article] = [[comment list], [sentiment value list]] (path: ./comment/json-okt-comment)
- Calculate the sentiment value per date
- Article sentiment : weighted average (weight = article comment count / date comment count)
- Date sentiment : computed using IMDb's rating system
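
The data layout and the aggregation above might look roughly like the sketch below. All function names, the sample data, and the stand-in `predict` callable are hypothetical. The IMDb-style formula is the widely published weighted rating `WR = v/(v+m) * R + m/(v+m) * C`; how the project maps its own quantities onto `R`, `v`, `m`, and `C` is not stated, so the mapping used here (date mean, date comment count, a minimum-count threshold, global mean) is an assumption.

```python
import json


def attach_sentiments(data, predict):
    """Fill dict[date][article][1] with one sentiment score per comment."""
    for articles in data.values():
        for pair in articles.values():
            pair[1] = [predict(comment) for comment in pair[0]]
    return data


def article_scores(articles):
    """Return {article: (mean sentiment, weight)} for one date.

    weight = article comment count / date comment count.
    """
    total = sum(len(pair[0]) for pair in articles.values())
    result = {}
    for name, (comments, scores) in articles.items():
        if comments and scores:
            result[name] = (sum(scores) / len(scores), len(comments) / total)
    return result


def weighted_date_sentiment(articles):
    """Weighted average of article sentiments for one date."""
    return sum(mean * weight for mean, weight in article_scores(articles).values())


def imdb_weighted_rating(R, v, C, m):
    """IMDb-style weighted rating: pulls R toward the prior C when v is small."""
    return (v / (v + m)) * R + (m / (v + m)) * C


# Hypothetical data in the dict[date][article] = [[comments], [scores]] layout.
data = {"20.01.20": {"article-1": [["a", "b", "c"], []], "article-2": [["d"], []]}}
fake_scores = {"a": 1.0, "b": 0.0, "c": 0.5, "d": 1.0}  # stand-in for the model
attach_sentiments(data, fake_scores.get)

day = data["20.01.20"]
print(json.dumps(data, ensure_ascii=False))
print(weighted_date_sentiment(day))
# Assumed mapping: v = comments on this date, m = threshold, C = global mean.
print(imdb_weighted_rating(R=weighted_date_sentiment(day), v=4, C=0.3, m=4))
```

The weighted rating keeps dates with very few comments from swinging the trend line: as `v` shrinks relative to `m`, the result moves toward the overall mean `C`.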
Search keyword | Mean (before reference date) | Std. dev. (before reference date) | Mean (after reference date) | Std. dev. (after reference date) |
---|---|---|---|---|
Shincheonji | 0.381 | 0.412 | 0.313 | 0.388 |
Christianity | 0.310 | 0.372 | 0.276 | 0.371 |
Catholicism | 0.375 | 0.405 | 0.284 | 0.377 |
Buddhism | 0.356 | 0.392 | 0.272 | 0.369 |
Religion | 0.313 | 0.376 | 0.271 | 0.367 |
(path : ./result-graph/emotion-average-stick/)
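
A before/after summary like the table above could be computed along these lines. This is only a sketch: the function names and sample rows are hypothetical, and several choices are assumptions (dates in `YY.MM.DD` form, the reference date counted as "after", population rather than sample standard deviation, and statistics taken over individual sentiment scores).

```python
from datetime import datetime
from statistics import mean, pstdev


def parse_date(text):
    """Parse a YY.MM.DD string such as '20.01.20'."""
    return datetime.strptime(text, "%y.%m.%d").date()


def split_stats(rows, reference):
    """Return {'before': (mean, std), 'after': (mean, std)}.

    `rows` is an iterable of (date string, sentiment score) pairs.
    Scores dated on or after the reference date count as 'after' (assumption).
    Each side must contain at least one score.
    """
    ref = parse_date(reference)
    before = [score for day, score in rows if parse_date(day) < ref]
    after = [score for day, score in rows if parse_date(day) >= ref]
    return {
        "before": (mean(before), pstdev(before)),
        "after": (mean(after), pstdev(after)),
    }


# Hypothetical scores; not taken from the project's data.
rows = [("20.01.19", 0.2), ("20.01.19", 0.6), ("20.01.20", 0.1), ("20.01.21", 0.5)]
stats = split_stats(rows, "20.01.20")
print(stats["before"], stats["after"])
```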
(path : ./result-graph/emotion-flow/)
- Before COVID19 : green
- After COVID19 : red
- y axis
- close to 1 : Positive
- close to 0 : Negative
Catholicism
(path : ./result-graph/comment-count/)
(path : ./result-graph/word-cloud/)
Before COVID19, Christianity
After COVID19, Christianity